I'd add another benefit that I've not seen in the other answers: deciding on the curriculum and facilitating yourself gets you to engage (critically) with a lot of EA material. Especially for the former, you have to think about the EA idea-space and work out a path through it all for fellows.
I helped create a fellowship curriculum (mostly a hybrid of two existing curricula iirc) before there were virtual programs, and this definitely got me more involved with EA. Of course, there may be a trade-off in quality.
I agree with what you say, though would note
(1) maybe doom should be disambiguated between "the short-lived simulation that I am in is turned off"-type doom (which I can't really observe) and "the basement reality Earth I am in is turned into paperclips by an unaligned AGI"-type doom.
(2) conditioning on me being in at least one short-lived simulation, if the multiverse is sufficiently large and the simulation containing me is sufficiently 'lawful' then I may also expect there to be basement reality copies of me too. In this case, doom is implied for (what I would guess is) most exact copies of me.
Thanks for this post! I've been meaning to write something similar, and am glad you have :-)
I agree with your claim that most observers like us (who believe they are at the hinge of history) are in (short-lived) simulations. Brian Tomasik discusses how this marginally makes one value interventions with short-term effects.
In particular, if you think the simulations won't include other moral patients simulated to a high resolution (e.g. Tomasik suggests this may be the case for wild animals in remote places), you would instrumentally care less about their welfare (since when you act to increase their welfare, this may only have effects in basement reality as well as the more expensive simulations that do simulate such wild animals). At the extreme is your suggestion, where you are the only person in the simulation and so you may act as a hedonist! Given some uncertainty over the distribution of "resolution of simulations", it seems likely that one should still act altruistically.
I disagree with the claim that if we do not pursue longtermism, then no simulations of observers like us will be created. For example, I think an Earth-originating unaligned AGI would still have instrumental reasons to run simulations of 21st century Earth. Further, alien civilizations may have an interest in learning about other civilizations.
Under your assumptions, I don't think this is a Newcomb-like problem. I think CDT & EDT would agree on the decision,[1] which I think depends on the number of simulations and the degree to which the existence of a good longterm future hinges on your decisions. Supposing humanity only survives if you act as a longtermist and simulations of you are only created if humanity survives, then you can't both act hedonistically and be in a simulation.
This tool is impressive, thanks! I like the framing you use of safety as a race against capabilities, though I don't really know what it would look like to have "solved" AGI safety 20 years before AGI. I also appreciate all the assumptions being listed at the end of the page.
Some minor notes
- the GitHub link in the webpage footer points to the wrong page
- I think two of the prompts "How likely is it to work?" and "How much do you speed it up?" would be made clearer if "it" was replaced by AGI safety (if that is what it is referring to).
Thanks for this post! I used to do some voluntary university community building, and some of your insights definitely ring true to me, particularly the Alice example - I'm worried that I might have been the sort of facilitator to not return to the assumptions in fellowships I've facilitated.
A small note:
Well, the most obvious place to look is the most recent Leader Forum, which gives the following talent gaps (in order):
This EA Leaders Forum was nearly 3 years ago, and so talent gaps have possibly changed. There was a Meta Coordination Forum last year run by CEA, but I haven't seen any similar write-ups. This doesn't seem to be an important crux for most of your points, but I thought it would be worth mentioning.
This definitely sounds like a better approach than mine, thanks for sharing! This will be useful for me for any future projects.
Thanks for your questions and comments! I really appreciate someone reading through in such detail :-)
- What is the highest probability of encountering aliens in the next 1000 years according to reasonable choices one could make in your model?
SIA (with no simulations) gives the nearest and most numerous aliens.
My bullish prior (which a priori has 80% credence in us not being alone) with SIA and the assumption that grabby aliens are hiding gives a median of ~ chance of a grabby civilization reaching us in the next 1000 years.
I don't condition on us not having any ICs in our past light cone. When conditioning on not being inside a GC, SIA is pretty confident (~80% certain) that we have at least one IC (origin planet) in our past light cone. When conditioning on not seeing any GCs, SIA thinks ~50% that there's at least one IC in our past light cone. Even if their origin planet is in our light cone, they may already be dead.
- Sometimes you just give a prior, e.g., your prior on d, where I don't really know where it comes from. If it wouldn't take too much time, it might be worth it to quickly motivate them (e.g., "I think that any interval between x and y would be reasonable because of such and such, and I fitted a lognormal"). It's possible I'm just missing something obvious to those familiar with the literature.
Thanks for the suggestion, this was definitely an oversight. I'll add in some text to motivate each prior.
- My prior for , the sum of delay and fuse steps: by definition it is bounded above by the time until now and bounded below by zero.
- I set the median to ~0.5 Gy. The median is chosen both to account for the potential delay in the Earth first becoming habitable (since the range of estimates around the first life appearing is ~600 My) and to be roughly in line with estimates for the time that plants took to oxygenate the atmosphere (a potential delay/fuse step).
- My prior, , roughly fits these criteria
- My prior for is pretty arbitrarily chosen. Here's a post-hoc (motivated) semi-justification for the prior. Wikipedia discusses ~8 possible factors for Rare Earths. If there are necessary Rare-Earth like factors for life, each with fraction of planets having the property, then my prior on isn't awfully off.

- If one thinks that between 0.1 and 1 fraction of all planets have each of the eight factors (and they are independent) something roughly similar to my prior distribution follows.
- My prior for u, the early universe habitability factor, was mostly chosen arbitrarily. My prior implies a median time of ~10 Gy for the universe to be 50% habitable (i.e. the earliest time when habitable planets are in fact habitable due to the absence of gamma ray bursts). In hindsight, I'd probably choose a prior for u that implied a smaller median.

- My prior for , the fraction of ICs that become GCs:
- It is bounded below by 0.01, mostly to improve the Monte Carlo reliability in cases where smaller values are greatly preferred.
- It has a median of ~0.5. A Twitter poll that Robin Hanson ran gave [I can't find the reference right now].
Lots of the priors aren't super well founded. Fortunately, if you think my bounds on each parameter are reasonable, I get the same conclusions when taking a joint prior that is uniform on and log-uniform in all other parameters.
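As a rough illustration of the eight-factor argument above, here is a minimal Monte Carlo sketch. It assumes each of the eight Rare-Earth-like factors is independent and log-uniform between 0.1 and 1; the comment only says "between 0.1 and 1", so the log-uniform shape is an assumption, and the function names are hypothetical. It is not the model's actual code.

```python
import math
import random
import statistics


def sample_log10_product(rng, n_factors=8, low=0.1, high=1.0):
    """Return log10 of a product of independent factors.

    ASSUMPTION: each factor is log-uniform on [low, high], so its log10
    is uniform on [log10(low), log10(high)].
    """
    log_low, log_high = math.log10(low), math.log10(high)
    return sum(rng.uniform(log_low, log_high) for _ in range(n_factors))


def summarise(n_samples=20000, seed=0):
    """Summarise the implied distribution of the product (in log10 units)."""
    rng = random.Random(seed)
    logs = [sample_log10_product(rng) for _ in range(n_samples)]
    # quantiles(n=20) returns 19 cut points: index 0 is 5%, index 18 is 95%.
    cuts = statistics.quantiles(logs, n=20)
    return {
        "median_log10": statistics.median(logs),
        "p05_log10": cuts[0],
        "p95_log10": cuts[18],
    }


if __name__ == "__main__":
    for name, value in summarise().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the product is bounded between 1e-8 and 1 and is spread over several orders of magnitude, which is the qualitative shape the semi-justification relies on. Swapping in a different per-factor distribution only requires changing `sample_log10_product`.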
- Do you think your conclusion (e.g., around likelihood of observing GCs) would change significantly if "non-terrestrial" planets were habitable?
Good question. In a hack-y and unsatisfactory way, my model does allow for this:
If the ratio of non-terrestrial (habitable) planets to terrestrial (habitable) planets is , they replace the product of try-once steps with to account for the extra planets. (My prior on is bounded above by 1, but this could be easily changed). This approach would also suppose that non-terrestrial planets had the same distribution of habitable lifetimes as terrestrial ones.
Having said that, I don't think a better approach would change the results for the SIA and ADT updates. For SSA, the habitability of non-terrestrial planets makes civs like us more atypical (since we are on a terrestrial planet). If this atypicality applies equally in worlds with many GCs and worlds with very few GCs, then I doubt it would change the results. All the anthropic theories would update strongly against the habitability of non-terrestrial planets.
Typos:
Thanks!
Great to see this work!
Thanks!
Re the SIA Doomsday argument, I think that is self-undermining for reasons I've argued elsewhere.
I agree. When I model the existence of simulations like us, SIA does not imply doom (as seen in the marginalised posteriors for in the appendix here).
Further, in the simulation case, SIA would prefer human civilization to be atypically likely to become a grabby civilization (this does not happen in my model as I suppose all civs have the same transition chance to become grabby).
Re the habitability of planets, I would not just model that as lifetimes, but would also consider variations in habitability/energy throughput at a given time
...
Smaller stars may have longer habitable windows but also smaller values for V and M. This sort of consideration limits the plausibility of red dwarf stars being dominant, and also allows for more smearing out of ICs over stars with different lifetimes as both positive and negative factors can get taken to the same power.
I'd definitely like to see this included in future models (I'm surprised Hanson didn't write about this in his Loud aliens paper). My intuition is that this changes little for the conclusions of SIA or anthropic decision theory with total utilitarianism, and that this weakens the case for many aliens for SSA, since our atypicality (or earliness) is decreased if we expect habitable planets around longer lived stars to have smaller volumes and/or lower metabolisms.
I'd also add, per Snyder-Beattie, catastrophes as a factor affecting probability of the emergence of life and affecting times of IC emergence.
I hadn't seen this before, thanks for sharing! I've skimmed through and found it interesting, though I'm suspicious that at times it uses SSA-type reasoning (with a reference class of observers on planets that are habitable for as long as Earth).
Thanks, glad to hear it!
I wrote it in Google Docs, primarily for the ease of getting comments. I then copied it into the EA Forum editor and spent a few hours fixing the formatting - all the maths had to be rewritten, all footnotes added back in, tables fixed, image captions added - which was a bit of a hassle.
I sadly don't have any neat tricks. I tried this Google Docs tool to convert to Markdown but it didn't work well.
The EA Forum editor now has the ability to share drafts and allow comments and collaborative editing, which I think I'll try for my next project. I'm also hoping Google Docs will add a better maths editor.
This looks great, thanks for creating it! I could see it becoming a great 'default' place for EAs to meet for coworking or social things.
+1
I think Hanson et al. mention something like this too
Thanks! I've considered it but have not decided whether I will. I'm unsure whether the decision relevant parts (which I see as most important) or weirder stuff (like simulations) would need to be cut.
Thanks for this post! I hadn't heard of Dysonian SETI before.
I'm wondering what your thoughts are on how one would promote Dysonian SETI? On the margin, is this just scaling back existing 'active' SETI? Beyond attempts at xenoarchaeology in our solar system (which I think are practically certain to not turn up anything), I'm wondering what else is in this space.
A side note: this idea reminds me of the plot of the Mass Effect games!
The link to your post isn't working for me
A diagram to show possible definitions of existential risks (x-risks) and suffering risks (s-risks)
The (expected) value & disvalue of the entire world’s past and future can be placed on the below axes (assuming both are finite).

By these definitions:
- Some x-risks are s-risks
- Not all s-risks are x-risks
I find the framing of "experience slices" definitely pushes my intuitions in the same direction.
One question I like to think about is whether I'd choose to gain either
(a) a neutral experience
or
(b) flipping a coin and reliving all the positive experience slices of my life if heads, and reliving all the negative ones if tails
My life feels highly net positive but I'd almost certainly not take option (b). I'd guess there's likely a risk-aversion intuition being snuck in here too, though.
Thanks for the post!
I'd recommend Daniel Kestenholz's energy log post for a system and template for tracking energy throughout the day.
The link in 1 ("the same ballpark as murder") is dead, but the Internet Archive has it saved here
The link in 3 ("in the same ballpark as walking past a child drowning in a shallow pond") is also dead, but is in the Internet Archive here
Edit: the link in 2 is also archived here
Not 128kb (Slack resized it for me) but this worked for me

Both links to Catalyst are broken (I think they're missing https://)
I really liked this post and it made me think! Here are some stray thoughts which I'm not super confident in:
- Something similar to Linear Tolerance and No Significant Tolerance are called negative-leaning utilitarianism (or weak negative utilitarianism) and lexical-threshold negative utilitarianism (see here or here)
- It seems like logarithmic trade-offs are just linear tolerance where we've scaled (exponentially) all original suffering values. I'm not sure if it's easier just to think the suffering values were already this value and then use linear tolerance?
- I'm confused by your use of and for amounts of suffering and happiness for an individual. I'm guessing you're also factoring in intensity?
The blogger gwern has many posts on self-experiments here.
Thanks for such a detailed and insightful response Gregory.
Your archetypal classical utilitarian is also committed to the OC as 'large increase in suffering for one individual' can be outweighed by a large enough number of smaller decreases in suffering for others - aggregation still applies to negative numbers for classical utilitarians. So the negative view fares better as the classical one has to bite one extra bullet.
Thanks for pointing this out. I think I realised this extra bullet biting after making the post.
There's also the worry in a pairwise comparison one might inadvertently pick a counterexample for one 'side' that turns the screws less than the counterexample for the other one.
This makes a lot of sense. It's not something I’d considered at all, and it seems pretty important when playing counterexample-intuition-tennis.
By my lights, it seems better to have some procedure for picking and comparing cases which isolates the principle being evaluated. Ideally, the putative counterexamples share counterintuitive features both theories endorse, but differ in one is trying to explore the worst case that can be constructed which the principle would avoid, whilst the other the worst case that can be constructed with its inclusion.
Again, this feels really useful and something I want to think about further.
The typical worry of the (absolute) negative view itself is it fails to price happiness at all - yet often we're inclined to say enduring some suffering (or accepting some risk of suffering) is a good deal at least at some extreme of 'upside'.
I think my slight negative intuition comes from the fact that although I may be willing to endure some suffering for some upside, I wouldn’t endorse inflicting suffering (or risk of suffering) on person A for some upside for person B. I don't know how much work the differences of fairness and personal identity (i.e. the being that suffered gets the upside) between the examples are doing, and in what direction my intuition is 'less' biased.
Yet with this procedure, we can construct a much worse counterexample to the negative view than the OC - by my lights, far more intuitively toxic than the already costly vRC. (Owed to Carl Shulman). Suppose A is a vast but trivially-imperfect utopia - Trillions (or googleplexes, or TREE(TREE(3))) lives lives of all-but-perfect bliss, but for each enduring an episode of trivial discomfort or suffering (e.g. a pin-prick, waiting a queue for an hour). Suppose Z is a world with a (relatively) much smaller number of people (e.g. a billion) living like the child in Omelas
I like this example a lot, and definitely lean A > Z.
Reframing the situation, my intuition becomes less clear: consider A’, in which TREE(TREE(3)) lives are in perfect bliss, but there are also TREE(TREE(3)) beings that momentarily experience a single pinprick before ceasing to exist. This is clearly equivalent to A in the axiology but my intuition is less clear (if at all) that A’ > Z. As above, I’m unsure how much work personal identity is doing. In my mind, I find population ethics easier to think about by considering ‘experienced moments’ rather than individuals.
(This axiology is also anti-egalitarian (consider replacing half the people in A with half the people in Z) ...
Thanks for pointing out the error. I think I’m right in saying that the ‘welfare capped by 0’ axiology is non-anti-egalitarian, which I conflated with absolute NU in my post (which is anti-egalitarian as you say). The axiologies are much more distinct than I originally thought.
Suppose you think only suffering counts* (absolute negative utilitarian), then the 'negative totalism' population axiology seems pretty reasonable to me.
The axiology does entail the 'Omelas Conclusion' (OC), an analogue of the Repugnant Conclusion (RC), which states that for any state of affairs there is a better state in which a single life is hellish and everyone else's life is free from suffering. As a form of totalism, the axiology does not lead to an analogue of the sadistic conclusion and is non-anti-egalitarian.
The OC (supposing absolute negative utilitarianism) seems more palatable to me than the RC (supposing classical utilitarianism). I'm curious to what extent, if at all, this intuition is shared.
Further, given a (debatable) meta-intuition for robustness of one's ethical theory, does such a preference suggest one should update slightly towards absolute negative utilitarianism or vice versa?
*or that individual utility is bounded above by 0
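To make the OC concrete, here is a toy sketch of one instance of it. It assumes 'negative totalism' scores a world as the sum of each life's welfare capped above at 0 (the footnoted reading); the function name and the welfare numbers are hypothetical, chosen only for illustration.

```python
def negative_total_value(welfares):
    """Score a world by summing only the negative part of each life's welfare.

    ASSUMPTION: positive welfare counts as 0, so only suffering lowers the score.
    """
    return sum(min(w, 0.0) for w in welfares)


# World A: a million lives, each with mild suffering (welfare -1).
world_a = [-1.0] * 1_000_000

# World B: same population size, but one hellish life (welfare -500,000)
# and everyone else free from suffering (welfare 0).
world_b = [-500_000.0] + [0.0] * 999_999

value_a = negative_total_value(world_a)
value_b = negative_total_value(world_b)

# Higher (less negative) is better under this axiology.
print("A:", value_a, "B:", value_b, "B better than A:", value_b > value_a)
```

Here world B is ranked above world A even though one life in B is hellish, because total suffering is lower. Adding happiness to any life in either world would not change the ranking under this scoring rule.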
Hello! I'm a maths master's student at Cambridge and have been involved with student groups for the last few years. I've been lurking on the forum for a long time and want to become more active. Hopefully this is the first comment of many!