Posts

EA is about maximization, and maximization is perilous 2022-09-02T17:13:52.226Z
How might we align transformative AI if it’s developed very soon? 2022-08-29T15:48:13.744Z
AI strategy nearcasting 2022-08-26T16:25:24.411Z
The Track Record of Futurists Seems ... Fine 2022-07-04T15:47:16.231Z
Nonprofit Boards are Weird 2022-06-28T10:17:52.343Z
AI Could Defeat All Of Us Combined 2022-06-10T23:25:51.238Z
Useful Vices for Wicked Problems 2022-04-12T19:19:42.902Z
Ideal governance (for companies, countries and more) 2022-04-07T16:54:02.748Z
The Wicked Problem Experience 2022-03-02T17:43:00.747Z
Important, actionable research questions for the most important century 2022-02-24T16:34:29.061Z
Learning By Writing 2022-02-22T15:39:53.803Z
Future-proof ethics 2022-02-02T19:40:24.811Z
Other-centered ethics and Harsanyi's Aggregation Theorem 2022-02-02T03:21:46.054Z
Consider trying the ELK contest (I am) 2022-01-05T19:42:11.106Z
Bayesian Mindset 2021-12-21T19:54:28.391Z
Rowing, Steering, Anchoring, Equity, Mutiny 2021-11-30T21:11:46.378Z
Minimal-trust investigations 2021-11-23T18:02:46.511Z
“Biological anchors” is about bounding, not pinpointing, AI timelines 2021-11-18T21:03:56.695Z
Weak point in "most important century": lock-in 2021-11-11T22:02:44.392Z
Comments for shorter Cold Takes pieces 2021-11-03T12:48:41.645Z
Has Life Gotten Better? 2021-10-05T08:31:40.696Z
Summary of history (empowerment and well-being lens) 2021-09-28T17:48:19.417Z
Call to Vigilance 2021-09-15T18:46:09.068Z
How to make the best of the most important century? 2021-09-14T21:05:57.096Z
AI Timelines: Where the Arguments, and the "Experts," Stand 2021-09-07T17:35:12.431Z
The Most Important Century: Sequence Introduction 2021-09-03T08:10:53.657Z
Forecasting transformative AI: the "biological anchors" method in a nutshell 2021-08-31T18:17:03.013Z
Forecasting Transformative AI: Are we "trending toward" transformative AI? (How would we know?) 2021-08-24T17:15:18.742Z
Forecasting transformative AI: what's the burden of proof? 2021-08-17T17:14:37.482Z
Forecasting Transformative AI: What Kind of AI? 2021-08-10T21:38:46.178Z
This Can't Go On 2021-08-03T15:53:33.837Z
Digital People FAQ 2021-07-27T17:19:59.605Z
Digital People Would Be An Even Bigger Deal 2021-07-27T17:19:41.500Z
The Duplicator: Instant Cloning Would Make the World Economy Explode 2021-07-20T16:41:42.011Z
New blog: Cold Takes 2021-07-13T17:14:33.220Z
All Possible Views About Humanity's Future Are Wild 2021-07-13T16:57:28.414Z
My current impressions on career choice for longtermists 2021-06-04T17:07:29.979Z
History of Philanthropy Literature Review: Pugwash Conferences on Science and World Affairs 2019-04-18T08:52:22.827Z
History of Philanthropy Case Study: The Campaign for Marriage Equality 2018-09-20T08:58:12.938Z
Hi, I'm Holden Karnofsky. AMA about jobs at Open Philanthropy 2018-03-26T16:32:28.313Z
Update on Cause Prioritization at Open Philanthropy 2018-01-26T16:40:08.648Z
History of Philanthropy Case Study: Clinton Health Access Initiative’s Role in Global Price Drops for Antiretroviral Drugs 2018-01-10T16:44:28.840Z
Projects, People and Processes 2017-06-26T13:27:27.030Z
Some Thoughts on Public Discourse 2017-02-23T17:29:09.085Z
Radical Empathy 2017-02-16T12:41:39.017Z
Worldview Diversification 2016-12-13T14:15:41.581Z
Three Key Issues I’ve Changed My Mind About 2016-09-06T13:10:36.207Z
History of Philanthropy Case Study: The Founding of the Center for Global Development 2016-06-15T15:48:22.913Z
History of Philanthropy Case Study: The Founding of the Center on Budget and Policy Priorities 2016-05-20T15:30:25.853Z
Some Background on Open Philanthropy's Views Regarding Advanced Artificial Intelligence 2016-05-16T13:08:03.353Z

Comments

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-04-01T03:03:44.030Z · EA · GW

Hm. I contacted Nick and replaced it with another link - does that work?

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:10:08.822Z · EA · GW

I didn't make a claim that constant replacement occurs "empirically." As far as I can tell, it's not possible to empirically test whether it does or not. I think we are left deciding whether we choose to think of ourselves as being constantly replaced, or not - neither choice contradicts any empirical observations. My post was pointing out that if one does choose to think of things that way, a lot of other paradoxes seem to go away.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:09:52.574Z · EA · GW

I personally like Radiohead a lot, but I don't feel like my subjective opinions are generally important here; with Pet Sounds I tried to focus on what seemed like an unusually clear-cut case (not that the album has nothing interesting going on, but that it's an odd choice for #1 of all time, especially in light of coming out a year after A Love Supreme).

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-03-31T23:09:29.469Z · EA · GW

I think this is interesting and plausible, but I'm somewhat skeptical in light of the fact that there doesn't seem to have been much (or at least, very effective) outcry over the rollback of net neutrality.

Comment by Holden Karnofsky (HoldenKarnofsky) on The Wicked Problem Experience · 2022-03-31T23:09:09.905Z · EA · GW

I think this is often a good approach!

Comment by Holden Karnofsky (HoldenKarnofsky) on Important, actionable research questions for the most important century · 2022-03-31T23:08:51.228Z · EA · GW

I think "people can test their fit without much experience, but would get lots of value out of that experience for actually doing this work" is pretty valid, though I'll also comment that I think there are diminishing returns to direct experience - I think getting some experience (or at least exposure, e.g. via conversation with insiders) is important, but I don't think one necessarily needs several years inside key institutions in order to be helpful on problems like these.

Comment by Holden Karnofsky (HoldenKarnofsky) on Important, actionable research questions for the most important century · 2022-03-31T23:08:28.215Z · EA · GW

I don't have anything available for this offhand - I'd have to put serious thought into what questions are at the most productive intersection of "resolvable", "a good fit for Metaculus" and "capturing something important." Something about warning signs ("will an AI system steal at least $10 million?") could be good.

Comment by Holden Karnofsky (HoldenKarnofsky) on Consider trying the ELK contest (I am) · 2022-03-31T22:59:22.063Z · EA · GW

Thanks! I'd estimate another 10-15 hours on top of the above, so 20-30 hours total. A good amount of this felt like leisure time and could be done while not in front of a computer, which was nice. I didn't end up with "solutions" I'd be actually excited about for substantive progress on alignment, but I think I accomplished my goal of understanding the ELK writeup well enough to nitpick it.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:37:59.613Z · EA · GW

The link works for me in incognito mode (it is a Google Drive file).

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:37:42.029Z · EA · GW

Thanks, this is helpful! I wasn't aware of that usage of "moral quasi-realism."

Personally, I find the question of whether principles can be described as "true" unimportant, and don't have much of a take on it. My default take is that it's convenient to sometimes use "true" in this way, so I sometimes do, while being happy to taboo it anytime someone wants me to or I otherwise think it would be helpful to.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:49.991Z · EA · GW

I share a number of your intuitions as a starting point, but this dialogue (and previous ones) is intended to pose challenges to those intuitions. To follow up on those:

On Challenge 1A (and as a more general point) - if we take action against climate change, that presumably means making some sort of sacrifice today for the sake of future generations. Does your position imply that this is "simply better for some and worse for others, and not better or worse on the whole?" Does that imply that it is not particularly good or bad to take action on climate change, such that we may as well do what's best for our own generation?

Also on Challenge 1A - under your model, who specifically are the people it is "better for" to take action on climate change, if we presume that the set of people that exists conditional on taking action is completely distinct from the set of people that exists conditional on not taking action (due to chaotic effects as discussed in the dialogue)?

On Challenge 1B, are you saying there is no answer to how to ethically choose between those two worlds, if one is simply presented with a choice?

On Challenge 2, does your position imply that it is wrong to bring someone into existence, because there is a risk that they will suffer greatly (which will mean they've been wronged), and no way to "offset" this potential wrong?

Non-utilitarian Holden has a lot of consequentialist intuitions that he ideally would like to accommodate, but is not all-in on consequentialism.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:24.690Z · EA · GW

I think that's a fair point. These positions just pretty much end up in the same place when it comes to valuing existential risk.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:34:10.331Z · EA · GW

That seems reasonable re: sentientism. I agree that there's no knockdown argument against lexicographic preferences, though I find them unappealing for reasons gestured at in this dialogue.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:33:28.195Z · EA · GW

It's interesting that you have that intuition! I don't share it, and I think the intuition somewhat implies some of the "You shouldn't leave your house" type things alluded to in the dialogue.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:33:11.298Z · EA · GW

I agree with this argument for discount rates, but I think it is a practical rather than philosophical argument. That is, I don't think it undermines the idea that if we were to avert extinction, all of the future lives thereby enabled should be given "full weight."

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:55.521Z · EA · GW

You're right that I haven't comprehensively addressed risk aversion in this piece. I've just tried to give an intuition for why the pro-risk-aversion intuition might be misleading.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:40.241Z · EA · GW

I appreciate Kenny's comments pointing toward potentially relevant literature, and agree that you could be a utilitarian without fully biting this bullet ... but as far as I can tell, attempts to do so have enough weird consequences of their own that I'd rather just bite the bullet. This dialogue gives some of the intuition for being skeptical of some things being infinitely more valuable than others.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:32:19.110Z · EA · GW

I think you lose a lot when you give up additivity, as discussed here and here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:31:56.805Z · EA · GW

I think I agree with Tyler. Also see this follow-up piece - "future-proof" is supposed to mean "would still look good if we made progress, whatever that is." This is largely supposed to be a somewhat moral-realism-agnostic operationalization of what it means for object-level arguments to be right.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-31T22:31:29.534Z · EA · GW

I don't think we should assume future ethics are better than ours, and that's not the intent of the term. I discuss what I was trying to do more here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Other-centered ethics and Harsanyi's Aggregation Theorem · 2022-03-31T22:30:58.277Z · EA · GW

Good point, thanks! Edited.

Comment by Holden Karnofsky (HoldenKarnofsky) on Other-centered ethics and Harsanyi's Aggregation Theorem · 2022-03-31T22:30:25.286Z · EA · GW

Thanks, this is appreciated!

Comment by Holden Karnofsky (HoldenKarnofsky) on “Biological anchors” is about bounding, not pinpointing, AI timelines · 2022-03-31T21:04:25.947Z · EA · GW

Sorry for the long delay - I let a lot of comments I needed to respond to pile up!

APS seems like a category of systems that includes some of the others you listed (“Advanced capability: they outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation) … “). I still don’t feel clear on what you have in mind here in terms of specific transformative capabilities. If we condition on not having extreme capabilities for persuasion or research/engineering, I’m quite skeptical that something in the "business/military/political strategy" category is a great candidate to have transformative impact on its own.

Thanks for the links re: persuasion! This seems like a major theme for you and a big place where we currently disagree. I'm not sure what to make of your take, and I think I'd have to think a lot more to have stable views on it, but here are quick reactions:

  • If we made a chart of some number capturing "how easy it is to convince key parts of society to recognize and navigate a tricky novel problem" (which I'll abbreviate as "epistemic responsiveness") since the dawn of civilization, what would that chart look like? My guess is that it would be pretty chaotic; that it would sometimes go quite low and sometimes go quite high; and that it would be very hard to predict the impact of a given technology or other development on epistemic responsiveness. Maybe there have been one-off points in history when epistemic responsiveness was very high; maybe it is much lower today compared to peak, such that someone could already claim we have passed the "point of no return"; maybe "persuasion AI" will drive it lower or higher, depending partly on who you think will have access to the biggest and best persuasion AIs and how they will use them. So I think even if we grant a lot of your views about how much AI could change the "memetic environment," it's not clear how this relates to the "point of no return."
  • I think I feel a lot less impressed/scared than you with respect to today's "persuasion techniques."
    • I'd be interested in seeing literature on how big an effect size you can get out of things like focus groups and A/B testing. My guess is that going from completely incompetent at persuasion (e.g., basically modeling your audience as yourself, which is where most people start) to "empirically understanding and incorporating your audience's different-from-you characteristics" causes a big jump from a very low level of effectiveness, but that things flatten out quickly after that, and that pouring more effort into focus groups and testing leads to only moderate effects, such that "doubling effectiveness" on the margin shouldn't be a very impressive/scary idea.
    • I think most media is optimizing for engagement rather than persuasion, and that it's natural for things to continue this way as AI advances. Engagement is dramatically easier to measure than persuasion, so data-hungry AI should help more with engagement than persuasion; targeting engagement is in some sense "self-reinforcing" and "self-funding" in a way that targeting persuasion isn't (so persuasion targeters need some sort of subsidy to compete with engagement targeters); and there are norms against targeting persuasion as well. I do expect some people and institutions to invest a lot in persuasion targeting (as they do today), but my modal expectation does not involve it becoming pervasive on nearly all websites, the way yours seems to.
    • I feel like a lot of today's "persuasion" is either (a) extremely immersive (someone is raised in a social setting that is very committed to some set of views or practices); or (b) involves persuading previously-close-to-indifferent people to believe things that call for low-cost actions (in many cases this means voting and social media posting; in some cases it can mean more consequential, but still ultimately not-super-high-personal-cost, actions). (b) can lead over time to shifting coalitions and identities, but the transition from (b) to (a) seems long.
    • I particularly don't feel that today's "persuaders" have much ability to accomplish the things that you're pointing to with "chatbots," "coaches," "Imperius curses" and "drugs." (Are there cases of drugs being used to systematically cause people to make durable, sustained, action-relevant changes to their views, especially when not accompanied by broader social immersion?)
  • I'm not really all that sure what the special role of AI is here, if we assume (for the sake of your argument that AI need not do other things to be transformative or PONR-y) a lack of scientific/engineering ability. What has/had higher ex ante probability of leading to a dramatic change in the memetic environment: further development of AI language models that could be used to write more propaganda, or the recent (last 20 years) explosion in communication channels and data, or many other changes over the last few hundred years such as the advent of radio and television, or the change in business models for media that we're living through now? This comparison is intended to be an argument both that "your kind of reasoning would've led us to expect many previous persuasion-related PONRs without needing special AI advances" and that "if we condition on persuasion-related PONRs being the big thing to think about, we shouldn't necessarily be all that focused on AI."

I liked the story you wrote! A lot of it seems reasonably likely to be reasonably on point to me - I especially liked your bits about AIs confusing people when asked about their internal lives. However:

  • I think the story is missing a kind of quantification or "quantified attitude" that seems important if we want to be talking about whether this story playing out "would mean we're probably looking at transformative/PONR-AI in the following five years." For example, I do expect progress in digital assistants, but it matters an awful lot how much progress and economic impact there is. Same goes for just how effective the "pervasive persuasion targeting" is. I think this story could be consistent with worlds in which I've updated a lot toward shorter transformative AI timelines, and with worlds in which I haven't at all (or have updated toward longer ones.)
  • As my comments probably indicate, I'm not sold on this section.
    • I'll be pretty surprised if e.g. the NYT is using a lot of persuasion targeting, as opposed to engagement targeting.
    • I do expect "People who still remember 2021 think of it as the golden days, when conformism and censorship and polarization were noticeably less than they are now" will be true, but that's primarily because (a) I think people are just really quick to hallucinate declinist dynamics and call past times "golden ages"; (b) 2021 does seem to have extremely little conformism and censorship (and basically normal polarization) by historical standards, and actually does kinda seem like a sort of epistemic golden age to me.
      • For people who are strongly and genuinely interested in understanding the world, I think we are in the midst of an explosion in useful websites, tools, and blogs that will someday be seen nostalgically;* a number of these websites/tools/blogs are remarkably influential among powerful people; and while most people are taking a lot less advantage than they could and seem to have pretty poorly epistemically grounded views, I'm extremely unconvinced that things looked better on this front in the past - here's one post on that topic.

I do generally think that persuasion is an underexplored topic, and could have many implications for transformative AI strategy. Such implications could include something like "Today's data explosion is already causing dramatic improvements in the ability of websites and other media to convince people of arbitrary things; we should assign a reasonably high probability that language models will further speed this in a way that transforms the world." That just isn't my guess at the moment.

*To be clear, I don't think this will be because websites/tools/blogs will be less useful in the future. I just think people will be more impressed with those of our time, which are picking a lot of low-hanging fruit in terms of improving on the status quo, so they'll feel impressive to read to anyone who knows that the points they were making were novel at the time.

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-03-29T18:07:47.935Z · EA · GW

Comments on Debating myself on whether “extra lives lived” are as good as “deaths prevented” will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on The Wicked Problem Experience · 2022-03-02T23:40:30.346Z · EA · GW

Fixed, thanks!

Comment by Holden Karnofsky (HoldenKarnofsky) on Future-proof ethics · 2022-02-15T17:15:59.860Z · EA · GW

Comments on Defending One-Dimensional Ethics will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-02-03T05:57:21.933Z · EA · GW

(Placeholder for comments on "To Match the Greats, Don’t Follow In Their Footsteps")

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-02-03T03:10:25.992Z · EA · GW

Placeholder for comments on Beach Boys post

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-25T23:52:31.842Z · EA · GW

Comments for Cost disease and civilizational decline will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-25T18:35:28.147Z · EA · GW

Comments on Reader reactions and update on "Where's Today's Beethoven" will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-19T23:46:04.186Z · EA · GW

Comments for Book non-review: The Dawn of Everything will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on The Duplicator: Instant Cloning Would Make the World Economy Explode · 2022-01-18T20:46:27.100Z · EA · GW

The latter.

Comment by Holden Karnofsky (HoldenKarnofsky) on Forecasting transformative AI: the "biological anchors" method in a nutshell · 2022-01-18T20:46:05.113Z · EA · GW

This version of the mice analogy was better than mine, thanks!

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-18T20:45:44.397Z · EA · GW

I largely agree with this comment, and I didn't mean to say that different intellectual property norms would create more "Beethoven-like" figures critical-acclaim-wise. I more meant to say it would just be very beneficial to consumers. (And I do think music is in a noticeably better state (w/r/t the ease of finding a lot that one really likes) than film or books, though this could be for a number of reasons.)

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-18T20:45:06.331Z · EA · GW

Sorry, just saw this! This did not in fact work out on the hoped-for timeline, and I didn't have a grantee in mind - I think the right way to try to do something here would've been through direct dialogue with policymakers.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-18T20:44:41.336Z · EA · GW

In response to the paragraph starting "I see how ..." (which I can't copy-paste easily due to the subscripts):

I think there are good pragmatic arguments for taking actions that effectively hold H_t responsible for the actions of H_(t-1). For example, if H_(t-1) committed premeditated murder, this gives some argument that H_t is more likely to harm others than the average person, and should be accordingly restricted for their benefit. And it's possible that the general practice of punishing H_t for H_(t-1)'s actions would generally deter crime, while not creating other perverse effects (more effectively than punishing someone else for H_(t-1)'s actions).

In my view, that's enough - I generally don't buy into the idea that there is something fundamental to the idea of "what people deserve" beyond something like "how people should be treated as part of the functioning of a healthy society."

But if I didn't hold this view, I could still just insist on splitting the idea of "the same person" into two different things: it seems coherent to say that Ht-1 and Ht are the same person in one sense and different people in another sense. My main claim is that "myself 1 second from now" and "myself now" are different people in the same sense that "a copy of myself created on another planet" and "myself" are different people; we could simultaneously say that both pairs can be called the "same person" in a different sense, one used for responsibility. (And indeed, it does seem reasonable to me that a copy would be held responsible for actions that the original took before "forking.")

Comment by Holden Karnofsky (HoldenKarnofsky) on Rowing, Steering, Anchoring, Equity, Mutiny · 2022-01-18T20:42:00.113Z · EA · GW

I think this is a good point, but it doesn't totally knock me out of feeling sympathy for the "rowing" case.

It looks quite likely to me that factory farming is going to end up looking something like air pollution - something that got worse, then better, as capabilities/wealth improved. I expect the combination of improving "animal product alternatives" (Impossible, Beyond, eventually clean meat) with increasing wealth to lead this way.

Granted, this is no longer a "pure trend extrapolation," but I think the consistent and somewhat mysterious improvement in the lives of humans (the population that has been getting more empowered/capable) is still a major part of a case I have a lot of sympathy for: that by default, at least over the next few decades and bracketing some "table-flip" scenarios, we should expect further economic growth and technological advancement to result in better quality of life.

Comment by Holden Karnofsky (HoldenKarnofsky) on “Biological anchors” is about bounding, not pinpointing, AI timelines · 2022-01-18T20:35:10.573Z · EA · GW

Interesting, thanks! Yep, those probabilities definitely seem too high to me :) How much would you shade them down for 5 years instead of 15? It seems like if your 5-year probabilities are anywhere near your 15-year probabilities, then the next 5 years have a lot of potential to update you one way or the other (e.g., if none of the "paths to PONR" you're describing work out in that time, that seems like it should be a significant update).

I'm not going to comment comprehensively on the paths you laid out, but a few things:

  • I think EfficientZero is sample-efficient but not compute-efficient: it's compensating for its small number of data points by simulating a large number, and I don't think there are big surprises on how much compute it's using to do that. This doesn't seem to be competing with human "efficiency" in the most important (e.g., compute costs) sense.

  • I don't know what you mean by APS-AI.

  • I'm pretty skeptical that "Persuasion tools good enough to cause major ideological strife and/or major degradation of public epistemology" is a serious PONR candidate. (There's already a lot of ideological strife and public confusion ...) I think the level of persuasiveness needed here would need to be incredibly extreme - far beyond "can build a QAnon-like following" and more like "Can get more than half the population to take whatever actions one wants them to take." This probably requires reasoning about neuroscience or something, and doesn't seem to me to be adding much in the way of independent possibility relative to the R&D possibility.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-18T19:56:00.975Z · EA · GW

I generally put this comment up in advance of the post, so that I can link to it from the post. The post is up now!

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-18T04:24:04.464Z · EA · GW

Comments for Empowerment and Stakeholder Management will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-14T20:53:10.969Z · EA · GW

Comments for Jan. 14 Cold Links will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-11T03:09:40.200Z · EA · GW

[Placeholder for Why it matters if "ideas are getting harder to find" comments]

Comment by Holden Karnofsky (HoldenKarnofsky) on Consider trying the ELK contest (I am) · 2022-01-07T00:21:04.439Z · EA · GW

Hey Josh, I think this is a good point - it would be great to have some common knowledge of what sort of commitment this is.

Here's where I am so far:

  1. I read through the full report reasonably carefully (but only some of the appendices).

  2. I spent some time thinking about potential counterexamples. It's hard to say how much; this mostly wasn't time I carved out, more something I was thinking about while taking a walk or something.

  3. At times I would reread specific parts of the writeup that seemed important for thinking about whether a particular idea was viable. I wrote up one batch of rough ideas for ARC and got feedback on it.

I would guess that I spent several hours on #1, several hours on #2, and maybe another 2-3 hours on #3. So maybe something like 10-15 hours so far?

At this point I don't think I'm clearly on track to come up with anything that qualifies for a prize, but I think I understand the problem pretty well and why it's hard for me to think of solutions. If I fail to submit a successful entry, I think it will feel more like "I saw what was hard about this and wasn't able to overcome it" than like "I tried a bunch of random stuff, lacking understanding of the challenge, and none of it worked out." This is the main benefit that I wanted.

My background might unfortunately be hard to make much sense of, in terms of how it compares to someone else's. I have next to no formal technical education, but I have spent tons of time talking about AI timelines and AI safety, including with Paul (the head of ARC), and that has included asking questions and reading things about the aspects of machine learning I felt were important for these conversations. (I never wrote my own code or read through a textbook, though I did read Michael Nielsen's guide to neural networks a while ago.) My subjective feeling was that the ELK writeup didn't have a lot of prerequisites - mostly just a very basic understanding of what deep learning is about, and a vague understanding of what a Bayes net is. But I can't be confident in that. (In particular, Bayes nets are generally only used to make examples concrete, and I was generally pretty fine to just go with my rough impression of what was going on; I sometimes found the more detailed appendices, with pseudocode and a Conway's Game of Life analogy, clearer than the Bayes net diagrams anyway.)

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-06T18:49:25.940Z · EA · GW

[Placeholder for How artistic ideas could get harder to find comments]

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-05T20:12:17.464Z · EA · GW

Comments for AI alignment research links will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Forecasting transformative AI: the "biological anchors" method in a nutshell · 2022-01-04T22:25:58.963Z · EA · GW

My understanding is that "mixture of experts" essentially comes down to training multiple distinct models, and having some "meta" procedure for assigning problems (or pieces of problems) to them.

Since training expense grows with something like the square of model size, it's much more expensive to train one big model than N smaller models that are each 1/N as big (plus a procedure for choosing between the N smaller models).

A human brain is about 100x the "size" of a mouse brain. So for a metaphor, you can think of "mixture of experts" as though it's trying to use 100 "mouse brains" (all working together under one procedure, and referred to as a single model) in place of one "human brain." This should be a lot cheaper (see previous paragraph), and there are intuitive reasons we'd expect it to be less powerful as well (imagine trying to assign intellectual tasks to 100 mice in a way that mimics what a human can do).
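
To make the gating idea above concrete, here is a minimal, hypothetical sketch (not drawn from the original comment or from any particular system): a few small "expert" models plus a learned gate that scores them and mixes their outputs for each input. All names, sizes, and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_EXPERTS = 4   # stand-in for the "100 mouse brains" in the metaphor
DIM = 8         # toy input/output dimensionality (illustrative only)

# Rough cost intuition from the comment above (illustrative): if training cost
# scales ~quadratically with model size, one model of size S costs ~S**2, while
# N models of size S/N cost ~N * (S/N)**2 = S**2 / N -- cheaper by a factor of N.

# Each "expert" is just a small linear map here; real experts would be full networks.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]

# The gate is another small model that scores how relevant each expert is to an input.
gate_weights = rng.normal(size=(DIM, N_EXPERTS))

def moe_forward(x):
    """Route input x to a gate-weighted combination of the experts (soft gating)."""
    scores = x @ gate_weights                          # one score per expert
    probs = np.exp(scores) / np.exp(scores).sum()      # softmax over experts
    outputs = np.stack([x @ W for W in experts])       # each expert's output
    return (probs[:, None] * outputs).sum(axis=0)      # mixture of expert outputs

x = rng.normal(size=DIM)
print(moe_forward(x))   # a DIM-length vector produced by the combined "experts"
```

In practice, sparse variants run only the top-scoring expert(s) for each input, which is where most of the compute savings relative to one big dense model come from; the soft mixture above is just the simplest version to show in a few lines.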

Comment by Holden Karnofsky (HoldenKarnofsky) on Comments for shorter Cold Takes pieces · 2022-01-04T20:27:26.841Z · EA · GW

Comments for Where's Today's Beethoven? will go here.

Comment by Holden Karnofsky (HoldenKarnofsky) on Summary of history (empowerment and well-being lens) · 2022-01-03T19:57:56.822Z · EA · GW

I broadly agree that my summary has this issue. If there were causal stories I were confident in, I would try to include them in the summary; but in fact I feel very hazy on a lot of multiple-step causal stories about history, and have defaulted to leaving them out when I think the case is quite unclear. I'm sure this leaves my summary less informative than it would ideally be (and than it would be if I knew more about history and were more confident about some of these multiple-step causal stories).

Comment by Holden Karnofsky (HoldenKarnofsky) on Bayesian Mindset · 2022-01-03T19:57:34.194Z · EA · GW

I agree with what you say here, as a general matter. I'm not sure I want to make an edit, as I really do think there are some "bad" parts of myself that I'd prefer to "expose and downweight," but I agree that it's easy to get carried away with that sort of thing.