Thanks for the thoughtful reply!
Beyond the instrumental convergence thesis, though, I do think that some bits of the classic arguments are awkward to fit onto concrete and plausible ML-based development scenarios: for example, the focus on recursive self-improvement, and the use of thought experiments in which natural language commands, when interpreted literally and single-mindedly, lead to unforeseen bad behaviors. I think that Reframing Superintelligence does a good job of pointing out some of the tensions between classic ways of thinking and talking about AI risk and current/plausible ML engineering practices.
Would you say that the treacherous turn argument can also be mapped onto contemporary ML methods (as the instrumental convergence thesis can), given that it is a fairly abstract principle?
Also, why is "recursive self-improvement" awkward to fit onto concrete and plausible ML-based development scenarios? (Setting aside the incorrect usage of the word "recursive" here: the concept should have been called "iterative self-improvement".) Consider the work that has been done on neural architecture search via reinforcement learning (a 2016 paper on that topic currently has 1,775 citations on Google Scholar, including 560 new citations from 2020). It doesn't seem extremely unlikely that such a technique will be used, at some point in the future, in some iterative self-improvement setup, in a way that may cause an existential catastrophe.
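To make the shape of such a setup concrete, here is a minimal toy sketch of an architecture-search-based self-improvement loop. Everything here is invented for illustration: Controller, propose_architecture, and evaluate are hypothetical stand-ins, not the API of the 2016 NAS paper or of any real system, and the "evaluation" is a trivial placeholder rather than actual network training.

```python
import random

class Controller:
    """Toy stand-in for an RL-trained architecture-search controller."""
    def __init__(self, skill=0.0):
        self.skill = skill  # crude proxy for how capable this controller is

    def propose_architecture(self):
        # A real controller (e.g. an RNN trained with REINFORCE) would emit
        # a description of a child network; here we just sample a score
        # whose expected value grows with the controller's skill.
        return random.gauss(self.skill, 1.0)

def evaluate(architecture):
    # Stand-in for training the proposed child network and measuring its
    # validation performance; in this toy, the proposal is its own score.
    return architecture

controller = Controller()
for generation in range(10):
    candidates = [controller.propose_architecture() for _ in range(20)]
    best_score = max(evaluate(a) for a in candidates)
    # The self-improvement step: the best design found in this round is
    # used to build the next, somewhat more capable, search process.
    controller = Controller(skill=controller.skill + max(0.0, best_score))
    print(f"generation {generation}: controller skill = {controller.skill:.2f}")
```

The point of the sketch is just that each round's output feeds the next round's search, i.e. the improvement is iterative rather than "recursive" in any precise sense.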
Regarding the example with the agent that creates the feed of each FB user:
the system wouldn't, in any meaningful sense, have long-run objectives (due to the shortness of sessions).
I agree that the specified time horizon (and discount factor) is important, and that a shorter time horizon seems safer. But note that FB is incentivized to specify a long time horizon. For example, suppose the feed-creation-agent shows a user a horrible post by some troll, which causes the user to spend many hours in a heated back-and-forth with said troll. Consequently, the user decides FB sucks and ends up getting off FB for many months. If the specified time horizon is sufficiently short (or the discount factor is sufficiently small), then from the perspective of the training process the agent did well when it showed the user that post, and the agent's policy network will be updated in a way that makes such decisions more likely. FB doesn't want that. FB's actual discount factor for users' engagement time may be very close to 1 (i.e. having a user spend an hour on FB today is not 100x more valuable to FB than having that user spend an hour on FB next month). This situation is not unique to FB: many companies that use RL agents that act in the real world have long-term preferences with respect to how their RL agents act.
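To put toy numbers on this (all invented for illustration; the discounted_return helper and the reward sequences are my own, not anything from FB's actual systems), compare the troll post against a benign post under different discount factors:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of per-day rewards."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Hypothetical daily engagement (hours/day) under the two decisions:
troll  = [5.0] + [0.0] * 180  # 5 hours of arguing today, then 6 months away
benign = [1.0] * 181          # 1 hour per day, and the user keeps coming back

for gamma in (0.5, 0.99, 0.999):
    print(f"gamma={gamma}: troll={discounted_return(troll, gamma):.1f}, "
          f"benign={discounted_return(benign, gamma):.1f}")
```

With gamma = 0.5 the troll post looks better to the training process (5.0 vs. roughly 2.0); with gamma = 0.99 or higher the benign post dominates (5.0 vs. roughly 84 and 166), which is the version that matches FB's actual long-run preferences.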
It also probably wouldn't have the ability or inclination to manipulate the external world in the pursuit of complex schemes.
Regarding the "inclination" part: Manipulating the "external world" (what other environment does the feed-creation-agent model?) in the pursuit of certain complex schemes is very useful for maximizing the user engagement metric (that by assumption corresponds to the specified reward function). I also don't see how the "wouldn't have the ability" part is justified in the limit as the amount of training compute (and architecture size) and data grows to infinity.
Figuring out how to manipulate the external world in precise ways would require a huge amount of very weird exploration, deep in a section of the space of possible policies where most of the policies are terrible at maximizing reward
We expect the training process to update the policy network in a way that makes the agent more intelligent (i.e. better at modeling the world and causal chains therein, better at planning, etc.), because that is useful for maximizing the sum of discounted rewards. So I don't understand how your above argument works, unless you're arguing that there's some upper bound on the level of intelligence that we can expect deep RL algorithms to yield, and that upper bound is below the minimum level for an agent to pose existential risk due to instrumental convergence.
in the unlikely event that the necessary exploration happened, and the policy started moving in this direction, I think it would be conspicuous before the newsfeed selection algorithm does something like kill everyone to prevent ongoing FB sessions from ending
We should expect a sufficiently intelligent unaligned agent to refrain from behaving in a way that is both unacceptable and conspicuous, as long as we can turn it off (that's the treacherous turn argument). The question is whether the agent will do something sufficiently alarming and conspicuous before the point where it is intelligent enough to realize it should not cause alarm. I don't think we can be very confident either way.

aidangoth on The 80,000 Hours job board is the skeleton of effective altruism stripped of all misleading ideologies
An important difference between overall budgets and job boards is that budgets tell you how all the resources are spent whereas job boards just tell you how (some of) the resources are spent on the margin. EA could spend a lot of money on some area and/or employ lots of people to work in that area without actively hiring new people. We'd miss that by just looking at the job board.
I think this is a nice suggestion for getting a rough idea of EA priorities, but because of this, plus Habryka's observation that the 80k job board is not representative of new jobs in and around EA, I'd caution against putting much weight on it.

nunosempere on What are some low-information priors that you find practically useful for thinking about the world?
Yep, exactly right.

habryka on What is the increase in expected value of effective altruist Wayne Hsiung being mayor of Berkeley instead of its current incumbent?
This was posted to a relatively large (> 100 people) but private FB group where various people who were active in EA and animal activism were talking to each other. I can confirm that it is accurate (since I am still part of the group).

linch on Politics on the EA Forum
Where does a post on vote-trading/vote-pairing [EA · GW] fit in? On the one hand, it's about electoral politics tactics rather than an object-level discussion of which candidates or political parties are better.
On the other hand, such arrangements are usually structured to implicitly or explicitly benefit some candidates at the cost of others.

davidjanku on Donor Lottery Debrief
Effective Thesis [EA · GW] is looking for funding. I believe the downside risk is very small, and we could likely find a way to ensure U.S. tax deductibility. Since this is a small meta project, it's not that easy to find institutional support, and individual donations might thus have quite a large impact on the continuation of this project.

linch on What is the increase in expected value of effective altruist Wayne Hsiung being mayor of Berkeley instead of its current incumbent?
I can't think of many (any?) other EA leaders who want to become elected leaders.
While this was before contemporary EA, Peter Singer has run for office before:
In 1992, he became a founding member of the Victorian Greens. He has run for political office twice for the Greens: in 1994 he received 28% of the vote in the Kooyong by-election, and in 1996 he received 3% of the vote when running for the Senate (elected by proportional representation). Before the 1996 election, he co-authored a book The Greens with Bob Brown.
Of course, some of our even earlier predecessors, like the old school English utilitarians or the Chinese Mohists, were substantially more interested in direct politics (rather than precursors to think-tank style policy analysis) than we are.

aarongertler on What is the increase in expected value of effective altruist Wayne Hsiung being mayor of Berkeley instead of its current incumbent?
Where does this quote come from?

aarongertler on avacyn's Shortform
That's the kind of source I was looking for; thanks for letting me know when it came up.

bshumway on Addressing Global Poverty as a Strategy to Improve the Long-Term Future
Yes, the US pandemic response in particular is evidence that the wealth of a country does not seem to be the most important factor in effective response to threats. Also, the “boring apocalypse” scenario seems much more probable to me than any sort of “bang” or rapid extinction event, and I think there is a lot that could be done in the realm of global development to help create a world more robust to that kind of slow burn.