From the bullet list above, it sounds like the author will be the one responsible for publishing and publicising the work.
Those definitely help, thanks! Any additional answers are still useful and I don't want to discourage answers from people who haven't read the above. For example we may have learned some empirical things since these analyses came out.
I don't mean to imply that we'll build a sovereign AI (I doubt it too).
Corrigible is more what I meant. Corrigible but not necessarily limited. I.e. minimally intent-aligned AIs which won't kill you but which, by the strategy-stealing assumption, can still compete with unaligned AIs.
Re 1) this relates to the strategy stealing assumption: your aligned AI can use whatever strategy unaligned AIs use to maintain and grow their power. Killing the competition is one strategy but there are many others including defensive actions and earning money / resources.
Edit: I implicitly said that it's okay to have unaligned AIs as long as you have enough aligned ones around. For example we may not need aligned companies if we have (minimally) aligned government+law enforcement.
I agree that it's not trivial to assume everyone will use aligned AI.
Let's suppose the goal of alignment research is to make aligned AI equally easy/cheap to build as unaligned AI, i.e. no additional cost. If we then suppose aligned AI also has a nonzero benefit, people are incentivized to use it.
The above seems to be the perspective in this alignment research overview https://www.effectivealtruism.org/articles/paul-christiano-current-work-in-ai-alignment.
More ink could be spilled on whether aligning AI has a nonzero commercial benefit. I feel that efforts like prompting and InstructGPT are suggestive. But this may not apply to all alignment efforts.
Another framing on this: As an academic, if I magically worked more productive hours this month, I could just do the high-priority research I otherwise would've done next week/month/year, so I wouldn't do lower-priority work.
Thanks Aidan, I'll consider this model when doing any more thinking on this.
It seems to depend on your job. E.g. in academia there's a practically endless stream of high-priority research to do, since each field is way too big for one person to solve. Doing more work generates more ideas, which generate more work. There's also a practically endless list of high-priority things to learn.
CS professor Cal Newport says that if you can do Deep Work™ for 4h/day, you're hitting the mental speed limit
and:
the next hour worked at the 10h/week mark might have 10x as much impact as the hour worked after the 100h/week mark
Thanks Hauke that's helpful. Yes, the above would be mainly because you run out of steam at 100h/week. I want to clarify that I assume this effect doesn't exist. I'm not talking about working 20% less and then relaxing. The 20% of time lost would also go into work, but that work has no benefit for career capital or impact.
Thanks Lukas that's helpful. Some thoughts on when you'd expect diminishing returns to work: Probably this happens when you're in a job at a small-sized org or department where you have a limited amount to do. On the other hand, a sign that there's lots to do would be if your job requires more than one person (with roughly the same skills as you).
In this case here the career is academia or startup founder.
Acquire and repurpose new AI startups for AI safety
As ML performance has recently improved, a new wave of startups is coming. Some combine top talent, carefully engineered infrastructure, a promising product, and well-coordinated teams with existing workflows and management capacity. All of these are bottlenecks for AI safety R&D.
It should be possible to acquire some appropriate startups and middle-sized companies. Examples include HuggingFace, AI21, Cohere, and smaller, newer startups. The idea is to repurpose the mission of some select companies to align them more closely with socially beneficial and safety-oriented R&D. This is sometimes feasible since their missions are often broad, still in flux, and their product could benefit from improving safety and alignment.
Trying this could have very high information value. If it works, it has enormous potential upside as many new AI startups are being created now that could be acquired in the future. It could potentially more than double the size of AI alignment R&D.
Paying existing employees to do safety R&D seems easier than paying academics. Academics often like to follow their own ideas but employees are already doing what their superior tells them to. In fact, they may find alignment and safety R&D more motivating than their company's existing mission. Additionally, some founders may be more willing to sell to a non-profit org with a social-good mission than to Big Tech.
Big tech companies acquire small companies all the time. The reasons for this vary (e.g. killing competition), but overall it suggests that it can be feasible and even profitable.
Caveats:
1) A highly qualified replacement may be needed for the top-level management.
2) Some employees may leave after an acquisition. This seems more likely if the pivot towards safety is a big change to the skills and workflows. Or if the employees don't like the new mission. It seems possible to partially avoid both of these by acquiring the right companies and steering them towards a mission that is relevant to their existing work. For example, natural language generation startups would usually benefit from fine-tuning their models with alignment techniques.
I'm no expert in this area but I'm told that European think-tanks are often strapped for cash so that may explain why the funders get so much influence (which is promising for the funder of course but it may not generalize to the US).
IIRC Tristan Harris has also made this claim. Maybe his 80k podcast or The Social Dilemma has some clues.
Edit: maybe he just said something like 'YouTube's algorithm is trained to send users down a rabbit hole'
Re why AI isn't generating much revenue - have you considered the productivity paradox? It's historically normal that productivity slows down before steeply increasing when a new general-purpose technology arrives.
See "Why Future Technological Progress Is Consistent with Low Current Productivity Growth" in "Artificial Intelligence and the Modern Productivity Paradox"
Instructions for that: http://www.eccentrictraining.com/6.html
That's really interesting, thanks! Do you (or someone else) have a sense of how much variation in priorities can be explained by the big 5?
Makes sense. I guess then the question is if the work of everyone except the x-risk focused NGOs helps reduce x-risk much. I tend to think yes since much of pandemic preparedness also addresses the worst case scenarios. But that seems to be an open question.
Thanks, great analysis! Just registering that I still expect bio risk will be less neglected than in the past. The major consideration for me is institutional funding, due to its scale. Like you say:
We believe that an issue of the magnitude of COVID-19 will likely not be forgotten soon, and that funding for pandemic preparedness will likely be safe for much longer than in the aftermath of previous pandemics. In particular it may persist long enough to become institutionalised and therefore harder to cut.
Aside from future institutional funding, we also have to take into account the current funding and new experience because they contribute to our cumulative knowledge and preparedness.
Important question, and nicely researched!
A caveat is that some essential subareas of safety may be neglected. This is not a problem when subareas substitute each other: e.g. debate substitutes for amplification so it's okay if one of them is neglected. But there's a problem when subareas complement each other: e.g. alignment complements robustness so we probably need to solve both. See also When causes multiply.
It's ok when a subarea is neglected as long as there's a substitute for it. But so far it seems that some areas are necessary components of AI safety (perhaps both inner and outer alignment are).
This was also discussed on LessWrong:
Kudos btw for writing this. Consciousness is a topic where it can be really hard to make progress and I worry that people aren't posting enough about it for fear of saying something wrong.
I agree that physical theories of consciousness are panpsychist if they say that every recurrent net is conscious (or that everything that can be described as GWT is conscious). The main caveats for me are:
Does anyone really claim that every recurrent net is conscious? It seems so implausible. E.g. if I initialize my net with random parameters, it just computes garbage. Or if I have a net with 1 parameter it seems too simple. Or if the number of iterations is 1 (as you say), it's just a trivial case of recurrence. Or if it doesn't do any interesting task, such as prediction...
(Also, most recurrent nets in nature would be gerrymandered. I could imagine there are enough that aren't though, such as potentially your examples).
NB, recurrence doesn't necessarily imply recurrent processing (the term from recurrent processing theory). The 'processing' part could hide a bunch of complexity?
I like your description of how complex physical processes like global attention / GWT reduce to simple ones like feedforward nets.
But I don't see how this implies that e.g. GWT reduces to panpsychism. E.g. to describe a recurrent net as a feedforward net you need a ridiculous number of parameters (with the same parameter values in each layer). So that doesn't imply that the universe is full of recurrent nets (even if it were full of feedforward nets which it isn't).
To draw a caricature of your argument as I understand it: It turns out computers can be reduced to logic gates. Therefore, everything is a computer.
Or another caricature: Recurrent nets are a special case of {any arrangement of atoms}. Therefore any arrangement of atoms is an RNN.
edit: missing word
Your link goes to the UK version. Here's US:
Just as a data point, "eye clear" took off for the conference ICLR so people seem to find the "clear" pronunciation intuitive.
Thanks for writing this. I don't have a solution but I'm just registering that I would expect plenty of rejected applicants to feel alienated from the EA community despite this post.
It's just an informal way to say that we're probably typical observers. It's named after Copernicus because he found that the Earth isn't as special as people thought.
Very nice list!
Great work!!!
Hmmm isn't the argument still pretty broadly applicable and useful despite the exceptions?
If you want a single source, I find the 80000 hours key ideas page and everything it links to quite comprehensive and well written.
Like most commenters, I broadly agree with the empirical info here. It's sort of obvious, but telling others things like "don't go out of your way to use less plastic" or even just creating unnecessary waste in a social situation can be inconsiderate towards people's sensibilities. Of course, this post advocates no such thing but I want to be sure nobody goes away thinking these things are necessarily OK.
(I was recently reminded of a CEA research article about how considerateness is even more important than most people think, and EAs should be especially careful because their behavior reflects on the whole community.)
On second thoughts, I think it's worth clarifying that my claim is still true even though yours is important in its own right. On Gott's reasoning, P(high influence | world has 2^N times the # of people who've already lived) is still just 2^-N (that's 2^-(N-1) if summed over all k>=N). As you said, these tiny probabilities are balanced out by asymptotically infinite impact.
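Spelled out, the parenthetical is just a geometric series:

\[
\sum_{k=N}^{\infty} 2^{-k} = 2^{-N}\sum_{j=0}^{\infty} 2^{-j} = 2 \cdot 2^{-N} = 2^{-(N-1)}.
\]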
I'll write up a separate objection to that claim but first a clarifying question: Why do you call Gott's conditional probability a prior? Isn't it more of a likelihood? In my model it should be combined with a prior P(number of people the world has). The resulting posterior is then the prior for further enquiries.
Interesting point!
The diverging series seems to be a version of the St Petersburg paradox, which has fooled me before. In the original version, you have a 2^-k chance of winning 2^k for every positive integer k, which leads to infinite expected payoff. One way in which it's brittle is that, as you say, the payoff is quite limited if we have some upper bound on the size of the population. Two other mathematical ways are 1) if the payoff is just 1.99^k or 2) if it is 2^0.99k.
If you're just presenting a prior I agree that you've not conditioned on an observation "we're very early". But to the extent that your reasoning says there's a non-trivial probability of [we have extremely high influence over a big future], you do condition on some observation of that kind. In fact, it would seem weird if any Copernican prior could give non-trivial mass to that proposition without an additional observation.
I continue my response here because the rest is more suitable as a higher-level comment.
On your prior,
P(high influence) isn't tiny. But if I understand correctly, that's just because
P(high influence | short future) isn't tiny whereas
P(high influence | long future) is still tiny. (I haven't checked the math, correct me if I'm wrong).
So your argument doesn't seem to save existential risk work. The only way to get a non-trivial P(high influence | long future) with your prior seems to be by conditioning on an additional observation "we're extremely early". As I argued here, that's somewhat sketchy to do.
So your prior says, unlike Will’s, that there are non-trivial probabilities of very early lock-in. That seems plausible and important. But it seems to me that your analysis not only uses a different prior but also conditions on “we live extremely early” which I think is problematic.
Will argues that it’s very weird we seem to be at an extremely hingy time. So we should discount that possibility. You say that we’re living at an extremely early time and it’s not weird for early times to be hingy. I imagine Will’s response would be “it’s very weird we seem to be living at an extremely early time then” (and it’s doubly weird if it implies we live in an extremely hingy time).
If living at an early time implies something that is extremely unlikely a priori for a random person from the timeline, then there should be an explanation. These 3 explanations seem exhaustive:
1) We’re extremely lucky.
2) We aren’t actually early: E.g. we’re in a simulation or the future is short. (The latter doesn’t necessarily imply that xrisk work doesn’t have much impact because the future might just be short in terms of people in our anthropic reference class).
3) Early people don’t actually have outsized influence: E.g. the hazard/hinge rate in your model is low (perhaps 1/N where N is the length of the future). In a Bayesian graphical model, there should be a strong update in favor of low hinge rates after observing that we live very early (unless another explanation is likely a priori).
Both 2) and 3) seem somewhat plausible a priori so it seems we don’t need to assume that a big coincidence explains how early we live.
This sounds really cool. Will have to read properly later. How would you recommend a time pressured reader to go through this? Are you planning a summary?
Just registering that I'm not convinced this justifies the title.
Yep, see reply to Lukas.
Agreed, I was assuming that the prior for the simulation hypothesis isn't very low because people seem to put credence in it even before Will's argument.
But I found it worth noting that Will's inequality only follows from mine (the likelihood ratio) plus having a reasonably even prior odds ratio.
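In odds form this is just

\[
\frac{P(\text{sim} \mid \text{seems like HoH})}{P(\neg\text{sim} \mid \text{seems like HoH})}
= \frac{P(\text{seems like HoH} \mid \text{sim})}{P(\text{seems like HoH} \mid \neg\text{sim})}
\cdot \frac{P(\text{sim})}{P(\neg\text{sim})},
\]

so a large likelihood ratio only yields a strong posterior conclusion if the prior odds aren't extremely lopsided.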
2.
For me, the HoH update is big enough to make the simulation hypothesis a pretty likely explanation. It also makes it less likely that there are alternative explanations for "HoH seems likely". See my old post here (probably better to read this comment though).
Imagine a Bayesian model with a variable S="HoH seems likely" (to us) and 3 variables pointing towards it: "HoH" (prior: 0.001), "simulation" (prior=0.1), and "other wrong but convincing arguments" (prior=0.01). Note that it seems pretty unlikely there will be convincing but wrong arguments a priori (I used 0.01) because we haven't updated on the outside view yet.
Further, assume that all three causes, if true, are equally likely to cause "HoH seems likely" (say with probability 1, but the probability doesn't affect the posterior).
Apply Bayes rule: We've observed "HoH seems likely". The denominator in Bayes rule is P(HoH seems likely) ≈ 0.111 (roughly the sum of the three priors because the priors are small). The numerator for each hypothesis H equals 1 * P(H).
Bayes rule gives an equal update (ca 1/0.111x = 9x) in favor of every hypothesis, bringing up the probability of "simulation" to nearly 90%.
Note that this probability decreases if we find, or think there are better explanations for "HoH seems likely". This is plausible but not overwhelmingly likely because we already have a decent explanation with prior 0.1. If we didn't have one, we would still have a lot of pressure to explain "HoH seems likely". The existence of the plausible explanation "simulation" with prior 0.1 "explains away" the need for other explanations such as those falling under "wrong but convincing argument".
This is just an example, feel free to plug in your numbers, or critique the model.
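To make the arithmetic easy to check, here's a minimal sketch in Python using the illustrative numbers above (the priors and the likelihood of 1 are just the stated assumptions, not anything more principled):

```python
# Toy Bayesian update over three candidate explanations of "HoH seems likely".
# The priors below are the illustrative numbers from the comment; plug in your own.
priors = {
    "HoH": 0.001,
    "simulation": 0.1,
    "wrong but convincing arguments": 0.01,
}
likelihood = 1.0  # assume each cause, if true, produces "HoH seems likely" with prob 1

# Approximate P("HoH seems likely") by the sum of the priors (fine since they're small).
evidence = sum(likelihood * p for p in priors.values())  # ~0.111

posteriors = {h: likelihood * p / evidence for h, p in priors.items()}
for h, post in posteriors.items():
    print(f"{h}: prior {priors[h]:.3f} -> posterior {post:.3f} ({post / priors[h]:.1f}x update)")
# Every hypothesis gets the same ~9x update, and "simulation" ends up at roughly 0.9.
```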
Both seem true and relevant. You could in fact write P(seems like HoH | simulation) >> P(seems like HoH | not simulation), which leads to the other two via Bayes theorem.
Important post!
I like your simulation update against HoH. I was meaning to write a post about this. Brian Tomasik has a great paper that quantitatively models the ratio of our influence on the short vs long-term. Though you've linked it, I think it's worth highlighting it more.
How the Simulation Argument Dampens Future Fanaticism
The paper cleverly argues that the simulation argument combined with anthropics either strongly dampens the expected impact of far-future altruism or strongly increases the impact of short-term altruism. That conclusion seems fairly robust to the choice of decision theory and anthropic theory, and to uncertainty over some empirical parameters. He doesn't directly discuss how the "seems like HoH" observation affects his conclusions, but I think it makes them stronger. (I recommend Brian's simplified calculations here).
I assume this paper didn't get as much discussion as it deserves because Brian posted it in the dark days of LW.
That's fair, I made a mathematical error there. The cluster headache math convinces me that a large chunk of the total suffering there goes to a small number of people due to lopsided frequencies. Do you have other examples? I particularly felt that the relative frequency of extreme compared to less extreme pain wasn't well supported.
Your 4 cluster headache groups contribute about equally to the total number of cluster headaches if you multiply group size by # of CH's. (The top 2% actually contribute a bit less). That's my entire point. I'm not sure if you disagree?
To the second half of your comment, I agree that extreme suffering can be very extreme and I think this is an important contribution. Maybe we have a misunderstanding about what 'the bulk' of suffering refers to. To me it means something like 75-99% and to you it means something like 45% as stated above? I should also clarify that by frequency I mean the product of 'how many people have it', 'how often' and 'for how long'.
"the people in the top 10% of sufferers will have 10X the amount, and people in the 99% [I assume you mean top 1%?] will have 100X the amount"
I'm confused, you seem to be suggesting that every level of pain accounts for the _same_ amount of total suffering here.
To elaborate, you seem to be saying that at any level of pain, 10x worse pain is also 10x less frequent. That's a power law with exponent 1. I.e. the levels of pain have an extreme distribution, but the frequencies do too (mild pains are extremely common). I'm not saying you're wrong - just that what I've seen also seems consistent with extreme pain being less than 10% of the total. I'm excited to see more data :)
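As a toy calculation of that "same amount at every level" point: if n_k is the number of experiences at intensity around 10^k and each 10x-worse level is 10x rarer, then

\[
n_k \propto 10^{-k} \;\Rightarrow\; n_k \cdot 10^{k} \propto 1,
\]

i.e. each 10x band of intensity contributes the same total. Extreme pain dominates the total only if frequency falls off more slowly than intensity rises.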
Aside from my concern about extreme pain being rarer than ordinary pain, I also would find the conclusion that
"...the bulk of suffering is concentrated in a small percentage of experiences..."
very surprising. Standard computational neuroscience decision-making views such as RL models would say that if this is true, animals would have to spend most of their everyday effort trying to avoid extreme pain. But that seems wrong. E.g. we seek food to relieve mild hunger and get a nice taste, and not because we once had an extreme hunger experience that we learned from.
You could argue that the learning from extreme pain doesn't track the subjective intensity of pain. But then people would be choosing e.g. a subjectively 10x worse pain over a <10x longer pain. In this case I'd probably say that the subjective impression is misguided or ethically irrelevant, though that's an ethical judgment.