(I work at Open Phil, speaking for myself)
FWIW, I think this could also make a lot of sense. I don't think Holden would be an individual contributor writing code forever, but skilling up in ML and completing concrete research projects seems like a good foundation for ultimately building a team doing something in AI safety.
I'm really sorry that you and so many others have this experience in the EA community. I don't have anything particularly helpful or insightful to say -- the way you're feeling is understandable, and it really sucks :(
I just wanted to say I'm flattered and grateful that you found some inspiration in that intro talk I gave. These days I'm working on pretty esoteric things, and can feel unmoored from the simple and powerful motivations which brought me here in the first place -- it's touching and encouraging to get some evidence that I've had a tangible impact on people.
Congratulations Jeff! Very exciting
I can give a sense of my investment, though I'm obviously an unusual case in multiple ways. I'm a coauthor on the report but I'm not an ARC researcher, and my role as a coauthor was primarily to try to make it more likely that the report would be accessible to a broader audience, which involved making sure my own "dumb questions" were answered in the report.
I kept time logs, and the whole project of coauthoring the report took me ~100 hours. By the end I had one "seed" of an ELK idea but unfortunately didn't flesh it out because other work/life things were pretty hectic. Getting to this "seed" was <30 min of investment.
I think if I had started with the report in hand, it would have taken me ~10 hours to read it carefully enough and ask enough "dumb questions" to get to the point of having the seed of an idea about as good as that idea, and then another ~10 hours to flesh it out into an algorithm + counterexample. I think the probability I'd have won the $5000 prize after that investment is ~50%, making the expected investment ~40 hours per $5000 prize won. I think there's a non-trivial but not super high chance I'd have won larger prizes, so the $/hour ratio is significantly better in expectation than $125/hr (since the ceiling for the larger prizes is so much higher).
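The arithmetic above can be written out as a small expected-value calculation. This is only a sketch: the 20 hours and the 50% chance come from the estimate above, while the "larger prize" outcome (a 10% chance of an extra $10,000) is a purely hypothetical number, chosen to show why any upside from larger prizes pushes the ratio above $125/hr.

```python
def expected_dollars_per_hour(outcomes, hours):
    """outcomes: list of (probability, payout in $) pairs; hours: time invested."""
    expected_payout = sum(p * payout for p, payout in outcomes)
    return expected_payout / hours

hours = 10 + 10  # ~10h reading and asking questions + ~10h fleshing out the idea
base = [(0.5, 5000)]  # ~50% chance of the $5000 prize
print(expected_dollars_per_hour(base, hours))  # 125.0
print(hours / 0.5)  # 40.0 expected hours per $5000 prize won

# Hypothetical upside: 10% chance of an extra $10,000 (illustrative numbers only)
with_upside = base + [(0.1, 10000)]
print(expected_dollars_per_hour(with_upside, hours))  # 175.0
```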
- I have a fairly technical background, though I think the right way to categorize me is as "semi-technical" or "technically literate." I did computer science in undergrad and enjoyed it / did well, but my day-to-day work mainly involves writing. I can do simple Python scripting. I can sometimes, slowly and painfully, do the kinds of algorithms problem sets I did quickly in undergrad.
- Four years ago I wrote this to explain what I understood of Paul's research agenda at the time.
- I've been thinking about AI alignment a lot over the last year, and in particular have had the unfair advantage of getting to talk to Paul a lot. With that said, I didn't really know or think much about ELK specifically (which I consider pretty self-contained) until I started writing the report, which was late Nov / early Dec.
ARC would be excited for you to send a short email to email@example.com with a few bullet points describing your high level ideas, if you want to get a sense for whether you're on the right track / whether fleshing them out would be likely to win a prize.
I was imagining Sycophants as an outer alignment failure, assuming the model is trained with naive RL from human feedback.
Not intended to express a significantly shorter timeline; 15-30 years was supposed to be a range of "plausible/significant probability," which the previous model also implied (probability on 15 years was >10% and probability on 30 years was 50%). Sorry that wasn't clear!
(JTBC I think you could train a brain-sized model sooner than my median estimate for TAI, because you could train it on shorter horizon tasks.)
Ah yeah, that makes sense -- I agree that a lot of the reason for low commercialization is local optima, and also agree that there are lots of cool/fun applications that are left undone right now.
To clarify, we are planning to seek more feedback from people outside the EA community on our views about TAI timelines, but we're seeing that as a separate project from this report (and may gather feedback from outside the EA community without necessarily publicizing the report more widely).
Finally, have you talked much to people outside the alignment/effective altruism communities about your report? How have reactions varied by background? Are you reluctant to publish work like this broadly? If so, why? Do you see risks of increasing awareness of these issues pushing unsafe capabilities work?
I haven't engaged much with people outside the EA and AI alignment communities, and I'd guess that very few people outside these communities have heard about the report. I don't personally feel sold that the risks of publishing this type of analysis more broadly (in terms of potentially increasing capabilities work) outweigh the benefits of helping people better understand what to expect with AI and giving us a better chance of figuring out if our views are wrong. However, some other people in the AI risk reduction community who we consulted (TBC, not my manager or Open Phil as an institution) were more concerned about this, and I respect their judgment, so I chose to publish the draft report on LessWrong and avoid doing things that could result in it being shared much more widely, especially in a "low-bandwidth" way (e.g. just the "headline graph" being shared on social media).
Thanks! I'll answer your cluster of questions about takeoff speeds and commercialization in this comment and leave another comment responding to your questions about sharing my report outside the EA community.
Broadly speaking, I do expect that transformative AI will be foreshadowed by incremental economic gains; I generally expect gradual takeoff, meaning I would bet that at some point growth will be ~10% per year before it hits 30% per year (which was the arbitrary cut-off for "transformative" used in my report). I don't think it's necessarily the case; I just think it'll probably work this way. On the outside view, that's how most technologies seem to have worked. And on the inside view, it seems like there are lots of valuable-but-not-transformative applications of existing models on the horizon, and industry giants + startups are already on the move trying to capitalize.
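To make the "10% before 30%" criterion concrete, here's a minimal sketch that checks it against a series of annual growth rates, along with the doubling times those two rates imply. The growth paths below are hypothetical illustrations, not forecasts, and the check assumes the series is in chronological order and that the intermediate threshold is below the transformative one.

```python
import math

def doubling_time_years(annual_growth):
    """Years for output to double at a constant annual growth rate (e.g. 0.10 for 10%)."""
    return math.log(2) / math.log(1 + annual_growth)

def first_year_at_or_above(growth_by_year, threshold):
    """Return the first year whose growth rate meets the threshold, or None if none does."""
    for year, growth in growth_by_year:  # assumes chronological order
        if growth >= threshold:
            return year
    return None

def is_gradual_takeoff(growth_by_year, intermediate=0.10, transformative=0.30):
    """True if growth first hits `intermediate` in an earlier year than it first hits
    `transformative`; None if the series never reaches `transformative` at all."""
    t_transformative = first_year_at_or_above(growth_by_year, transformative)
    if t_transformative is None:
        return None
    t_intermediate = first_year_at_or_above(growth_by_year, intermediate)
    return t_intermediate < t_transformative

print(round(doubling_time_years(0.10), 1))  # 7.3
print(round(doubling_time_years(0.30), 1))  # 2.6

# Hypothetical growth paths as (year, annual growth rate) pairs
gradual = [(2030, 0.03), (2035, 0.11), (2040, 0.32)]
sudden = [(2030, 0.03), (2031, 0.35)]
print(is_gradual_takeoff(gradual))  # True
print(is_gradual_takeoff(sudden))   # False
```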
My views imply a roughly 10% probability that the compute to train transformative AI would be affordable in 10 years or less, which wouldn't really leave time for this kind of gradual takeoff. One reason it's a pretty low number is that it would imply sudden takeoff, and I'm skeptical of that implication (though it's not the only reason -- I think there are separate reasons to be skeptical of the Lifetime Anchor and the Short Horizon Neural Network anchor, which drive short timelines in my model).
I don't expect that several generations of more powerful successors to GPT-3 will be developed before we see significant commercial applications of GPT-3; I expect commercialization of existing models and scaleup to larger models to be happening in parallel. There are already various applications online, e.g. AI Dungeon (based on GPT-3), TabNine (based on GPT-2), and this list of other apps. I don't think that evidence that OpenAI was productizing GPT-3 would shift my timelines much either way, since I already expect them to be investing pretty heavily in this.
Relative to the present, I expect the machine learning industry to invest a larger share of its resources going forward into commercialization, as opposed to pure R&D: before this point a lot of the models studied in an R&D setting just weren't very useful (with the major exception of vision models underlying self-driving cars), and now they're starting to be pretty useful. But at least over the next 5-10 years I don't think that would slow down scaling / R&D much in an absolute sense, since the industry as a whole will probably grow, and there will be more resources for both scaling R&D and commercialization.
I haven't thought very deeply about this, but my first intuition is that the most compelling reason to expect to have an impact that predictably lasts longer than several hundred years without being washed out is because of the possibility of some sort of "lock-in" -- technology that allows values and preferences to be more stably transmitted into the very long-term future than current technology allows. For example, the ability to program space probes with instructions for creating the type of "digital life" we would morally value, with error-correcting measures to prevent drift, would count as a technology that allows for effective lock-in in my mind.
A lot of people may act as if we can't impact anything post-transformative AI because they believe technology that enables lock-in will be built very soon after transformative AI (since TAI would likely cause R&D towards these types of tech to be greatly accelerated).
- I think "major insights" is potentially a somewhat loaded framing; it seems to imply that only highly conceptual considerations that change our minds about previously-accepted big picture claims count as significant progress. I think very early on, EA produced a number of arguments and considerations which felt like "major insights" in that they caused major swings in the consensus of what cause areas to prioritize at a very high level; I think that probably reflected that the question was relatively new and there was low-hanging fruit. I think we shouldn't expect future progress to take the form of "major insights" that wildly swing views about a basic, high-level question as much (although I still think that's possible).
- Since 2015, I think we've seen good analysis and discussion of AI timelines and takeoff speeds, discussion of specific AI risks that go beyond the classic scenario presented in Superintelligence, better characterization of multipolar and distributed AI scenarios, some interesting and more quantitative debates on giving now vs giving later and "hinge of history" vs "patient" long-termism, etc. None of these have provided definitive / authoritative answers, but they all feel useful to me as someone trying to prioritize where Open Phil dollars should go.
- I'm not sure how to answer this; I think taking into account the expected low-hanging fruit effect, and the relatively low investment in this research, progress has probably been pretty good, but I'm very uncertain about the degree of progress I "should have expected" on priors.
- I think ideally the world as a whole would be investing much more in this type of work than it is now. A lot of the bottleneck to this is that the work is not very well-scoped or broken into tractable sub-problems, which makes it hard for a large number of people to be quickly on-boarded to it.
- Related to the above, I'd love for the work to become better-scoped over time -- this is one thing we prioritize highly at Open Phil.
My answer to this one is going to be a pretty boring "it depends" unfortunately. I was speaking to my own experience in responding to the top level question, and since I do a pretty "generalist"-y job, improving at general reasoning is likely to be more important for me. At least when restricting to areas that seem highly promising from a long-termist perspective, I think questions of personal fit and comparative advantage will end up determining the degree to which someone should be specialized in a particular topic like machine learning or biology.
I also think that often someone who is a generalist in terms of topic areas still specializes in a certain kind of methodology, e.g. researchers at Open Phil will often do "back of the envelope calculations" (BOTECs) in several different domains, effectively "specializing" in the BOTEC skillset.
Yes, I meant that the version of long-termism we think about at Open Phil is animal-inclusive.
Personally, I don't do much explicit, dedicated practice or learning of either general reasoning skills (like forecasting) or content knowledge (like Anki decks); virtually all of my development on these axes comes from "just doing my job." However, I don't feel strongly that this is how everyone should be -- I've just found that this sort of explicit practice holds my attention less and subjectively feels like a less rewarding and efficient way to learn, so I don't invest in it much. I know lots of folks who feel differently, and do things like Anki decks, forecasting practice, or both.
My approach to thinking about algorithmic progress has been to try to extrapolate the rate of past progress forward; I rely on two sources for this, a paper by Katja Grace and a paper by Danny Hernandez and Tom Brown. One question I'd think about when forming a view on this is whether arguments like the ones you make should lead you to expect algorithmic progress to be significantly faster than the trendline, or whether those considerations are already "priced in" to the existing trendline.
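One simple way to "extrapolate the rate of past progress forward" is to fit an exponential trend to past efficiency gains and project it. The sketch below uses a hypothetical efficiency series (the factor by which the compute needed for a fixed level of performance has fallen, relative to a baseline year) that happens to double exactly every two years; it is not data from either paper. In this framing, asking whether other considerations are already "priced in" amounts to asking whether to use the fitted slope `b` as-is or to adjust it.

```python
import math

def fit_log_linear(years, multipliers):
    """Ordinary least squares fit of log2(multiplier) = a + b * year; returns (a, b)."""
    n = len(years)
    logs = [math.log2(m) for m in multipliers]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx  # doublings of efficiency per year
    return mean_y - b * mean_x, b

def extrapolate(a, b, year):
    """Projected efficiency multiplier in `year` if the fitted trend simply continues."""
    return 2 ** (a + b * year)

# Hypothetical data: efficiency multiplier relative to 2012
years = [2012, 2014, 2016, 2018]
multipliers = [1, 2, 4, 8]
a, b = fit_log_linear(years, multipliers)
print(1 / b)                    # 2.0 years per doubling of efficiency
print(extrapolate(a, b, 2024))  # 64.0
```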
Thanks, I'm glad you enjoyed it!
- I haven't put a lot of energy into thinking about personal implications, and don't have very worked-out views right now.
- I don't have a citation off the top of my head for fairness agreements specifically, but they're closely related to "variance normalization" approaches to moral uncertainty, which are described here (that blog post links to a few papers).
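For readers unfamiliar with variance normalization, here's a minimal sketch of the basic idea (sometimes called "variance voting"): rescale each moral theory's scores over the available options to have mean 0 and variance 1, then take a credence-weighted sum. The options, scores, and credences are hypothetical, and this illustrates variance normalization in general rather than fairness agreements specifically.

```python
import statistics

def variance_normalize(scores):
    """Rescale one theory's choice-worthiness scores to mean 0 and variance 1 across options.
    Assumes the theory is not indifferent among all options (nonzero spread)."""
    mu = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mu) / sd for s in scores]

def variance_voting(theories):
    """theories: list of (credence, scores) pairs, each scoring the same list of options."""
    totals = [0.0] * len(theories[0][1])
    for credence, scores in theories:
        for i, z in enumerate(variance_normalize(scores)):
            totals[i] += credence * z
    return totals

options = ["X", "Y", "Z"]
theory_a = (0.6, [0, 0, 1000])  # hypothetical theory that states its stakes in big numbers
theory_b = (0.4, [1, 0, 0])     # hypothetical theory that states its stakes in small numbers
totals = variance_voting([theory_a, theory_b])
print(options[max(range(len(options)), key=totals.__getitem__)])  # Z
```

Multiplying a theory's scores by any positive constant leaves its normalized scores unchanged, so no theory gets extra say just by expressing its stakes in bigger units.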
I'm pretty excited about economic modeling-based approaches, either:
- Estimating the value-added from machine learning historically and extrapolating it into the future, or
- Doing a takeoff analysis that takes into account how AI progress relates to inputs such as hardware and software effort, and the extent to which AI of a certain quality level can allow hardware to substitute for software effort, similar to the "Intelligence Explosion Microeconomics" paper.
Most people at Open Phil aren't 100% bought into utilitarianism, but utilitarian thinking has an outsized impact on cause selection and prioritization because under a lot of other ethical perspectives, philanthropy is supererogatory, so those other ethical perspectives are not as "opinionated" about how best to do philanthropy. It seems that the non-utilitarian perspectives we take most seriously usually don't provide explicit cause prioritization input such as "Fund biosecurity rather than farm animal welfare", but rather provide input about what rules or constraints we should be operating under, such as "Don't misrepresent what you believe even if it would increase expected impact in utilitarian terms."
- I agree that your prior would need to have an infinite expectation for the size of the universe for this argument to go through.
- I agree with the generalized statement that your prior over "value-I-can-affect" needs to have an infinite expectation, but I don't think I agree with the operationalization of "value-I-can-affect" as V/n. It seems possible to me that even if there is a high density of value-maximizing civilizations out there, each one could have an infinite impact through e.g. acausal trade. I'm not sure what a crisp operationalization of "value-I-can-affect" would be.
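As a concrete illustration of the first point, a prior can be perfectly proper (it sums to 1) and still have an infinite expectation. A standard example, for a hypothetical size parameter N taking values 1, 2, 3, and so on:

```latex
P(N = n) = \frac{6}{\pi^2 n^2}, \qquad
\sum_{n=1}^{\infty} P(N = n) = \frac{6}{\pi^2} \cdot \frac{\pi^2}{6} = 1, \qquad
\mathbb{E}[N] = \sum_{n=1}^{\infty} n \cdot \frac{6}{\pi^2 n^2} = \frac{6}{\pi^2} \sum_{n=1}^{\infty} \frac{1}{n} = \infty.
```

Any normalizable prior whose tail falls off like 1/n^2 or more slowly behaves this way.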
Thanks, I'm glad you liked it so much!
- I reasonably often do things like make models in Python, but the actual coding is a pretty small part of my work -- something like 5%-10% of my time. I've never done a coding project for work that was more complicated than the notebook accompanying my timelines report, and most models I make are considerably simpler (usually implemented in spreadsheets rather than in code).
- I'm not familiar with the public interest tech movement unfortunately, so I'm not sure what I think about that research project idea.
I don't have an easily-summarizable worldview that ties together the different parts of my life. In my career, effective altruism (something like "Try to do as much good as possible, and think deeply about what that means and be open to counterintuitive answers") is definitely dominant. In my personal life, I try to be "agenty" about getting what I want, and to be open to trying unusually hard or being "weird" when that's what works for me and makes me happy. I think these are both evolving a lot in the specifics.
In some sense I agree with gwern that the reason ML hasn't generated a lot of value is because people haven't put in the work (both coding and otherwise) needed to roll it out to different domains, but (I think unlike gwern) the main inference I make from that is that it wouldn't have been hugely profitable to put in the work to create ML-based applications (or else more people would have been diverted from other coding tasks to the task of rolling out ML applications).
Thanks Michael! I agree space colonization may not be strictly required for achieving a stable state of low x-risk. But because it's the "canonical" vision of the stable low-risk future, I would feel significantly more uncertain if we were to rule out the possibility of expansion into space, and I would be inclined to be skeptical by default, particularly if we are picturing biological humans. There seem to be a large number of possible ways the environmental conditions needed for survival might be destroyed, and it intuitively seems like "offense" would have an advantage over "defense" there. That said, I haven't thought deeply about the technology that would be needed to preserve a state of low x-risk entirely on Earth, and I'd expect my views to change a lot with only a few hours of thinking on this.
There aren't $10B worth of giving opportunities that I'd be excited about supporting now, for essentially the same reasons why Open Phil isn't giving everything away over the next few years. Basically, we expect (and I agree) that there will be more, better giving opportunities in the medium-term future and so it makes sense to save the marginal dollar for future giving, at least right now. There would likely be some differences between what I would fund and what Open Phil is currently funding due to different intuitions about the most promising interventions to investigate with scarce capacity, but I don't expect them to be large.
I'm not very familiar with these open source implementations; they seem interesting! So far, I haven't explicitly broken out different possible sources of algorithmic progress in my model, since I'm thinking about it in a very zoomed-out way (extrapolating big-picture quantitative trends in algorithmic progress). I'm not sure how much of the progress captured in these trends comes from traditional industry/academia sources vs open source projects like these.
Thanks, I'm glad you found that explanation helpful!
I think I broadly agree with you that SIA is somewhat less "suspicious" than SSA, with the small caveat that I think most of the weirdness can be preserved with a finite-but-sufficiently-giant world rather than a literally infinite world.
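A toy version of the standard "coin toss" thought experiment shows why a finite-but-giant world is enough. This is a sketch under simplifying assumptions: a fair coin creates one observer on heads and a hypothetical 10^9 observers on tails; SIA weights each world by its number of observers; and SSA, when the only evidence is "I exist," leaves the prior unchanged because every world contains at least one observer.

```python
from fractions import Fraction

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def sia_posterior(priors, observer_counts):
    """SIA: weight each possible world by prior probability times its number of observers."""
    return normalize([p * n for p, n in zip(priors, observer_counts)])

def ssa_posterior(priors, observer_counts):
    """Simplified SSA when the only evidence is 'I exist': rule out worlds with no observers."""
    return normalize([p if n > 0 else Fraction(0) for p, n in zip(priors, observer_counts)])

priors = [Fraction(1, 2), Fraction(1, 2)]  # heads, tails
observers = [1, 10**9]                     # hypothetical, finite observer counts
print(ssa_posterior(priors, observers))     # [Fraction(1, 2), Fraction(1, 2)]
print(sia_posterior(priors, observers)[1])  # 1000000000/1000000001
```

SIA ends up nearly certain of tails without any infinity involved, which is the sense in which most of the weirdness survives in a merely giant finite world.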
Thanks, I'm glad you enjoyed it!
- This is fairly basic, but EA community building is definitely another cause I'd add to that list. I'm less confident in other potential areas, but I would also be curious about exploring some aspects of improving institutional decision-making as well.
- The decision to open a hiring round is usually made at the level of individual focus areas and sub-teams, and we don't have an organization-wide growth plan, so it's fairly difficult to estimate exact numbers; with that said, I expect we'll be doing some hiring of both generalists and program specialists over the next few years. (We have a new open positions page here.)
I would love to see more stories of this form, and think that writing stories like this is a good area of research to be pursuing for its own sake that could help inform strategy at Open Phil and elsewhere. With that said, I don't think I'd advise everyone who is trying to do technical AI alignment to determine what questions they're going to pursue based on an exercise like this -- doing this can be very laborious, and the technical research route it makes the most sense for you to pursue will probably be affected by a lot of considerations not captured in the exercise, such as your existing background, your native research intuitions and aesthetic (which can often determine what approaches you'll be able to find any purchase on), what mentorship opportunities you have available to you and what your potential mentors are interested in, etc.
In my work, I've gotten better at resisting the urge to investigate sub-questions more deeply and instead pulling back and trying to find short-cuts to answering the high-level question. In my personal life, I've gotten better at setting up my schedule so I'm having fun in the evenings and weekends instead of mindlessly browsing social media. (I have a long way to go on both of these though.)
Also, I got a university degree :)
I don't think I have a satisfying general answer to this question; in practice, the approaches I pursue first are heavily influenced by which approaches I happen to find some purchase on, since many theoretically appealing reference classes or high-level approaches to the question may be difficult to make progress on for whatever reason.
Decisions about the size of the basic science budget are made within the "near-termist" worldview bucket, since we see the primary case for this funding as the potential for scientific breakthroughs to improve health and welfare over the next several decades; I'm not involved with that since my research focus is on cause prioritization within the "long-termist" worldview.
In terms of high-level principles, the decision would be made by comparing an estimate of the value of marginal science funding against an estimate of the value of the near-termist "last dollar", but I'm not familiar with the specific numbers myself.
Like Linch says, some of the reason the Metaculus median is lower than mine is probably because they have a weaker definition; 2035 seems like a reasonable median for "fully general AI" as they define it, and my best guess may even be sooner.
With that said, I've definitely had a number of conversations with people who have shorter timelines than me for truly transformative AI; Daniel Kokotajlo articulates a view in this space here. Disagreements tend to be around the following points:
- People with shorter timelines than me tend to feel that the notion of "effective horizon length" either doesn't make sense, or that training time scales sub-linearly rather than linearly with effective horizon length, or that models with short effective horizon lengths will be transformative despite being "myopic." They generally prefer a model where a scaled-up GPT-3 constitutes transformative AI. Since I published my draft report, Guille Costa (an intern at Open Philanthropy) released a version of the model that explicitly breaks out "scaled up GPT-3" as a hypothesis, which would imply a median of 2040 if all my other assumptions are kept intact.
- They also tend to feel that extrapolations of when existing model architectures will reach human-level performance on certain benchmarks, e.g. a recently-created multitask language learning benchmark, imply that "human-level" capability would be reached at ~1e13 or 1e14 FLOP/subj sec rather than ~1e16 FLOP/subj sec as I guessed in my report. I'm more skeptical of extrapolation from benchmarks because my guess is that the benchmarks we have right now were selected to be hard-but-doable for our current generation of models; once models start doing extremely well at these benchmarks, we will likely generate harder benchmarks with more work, and there may be multiple rounds of this process.
- They tend to be less skeptical on priors of sudden takeoff, which leads them to put less weight on considerations like "If transformative AI is going to be developed in only 5-10 years, why aren't we seeing much more economic impacts from AI today?"
- Some of them also feel that I underestimate the algorithmic progress that will be made over the next ~5 years: they may not disagree with my characterization of the current scaling behavior of ML systems, but they place more weight than I would on an influx of researchers (potentially working with ML-enabled tools) making new discoveries that shift us to a different scaling regime, e.g. one more like the "Lifetime Anchor" hypothesis.
- Finally, some people with shorter timelines than me tend to expect that rollout of AI technologies will be faster and smoother than I do, and expect there to be less delay from "working out kinks" or making systems robust enough to deploy.
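Several of these disagreements enter a compute estimate multiplicatively, so they compound in orders of magnitude. Here's a deliberately simplified, hypothetical sketch: assume training compute scales as (FLOP per subjective second) × (number of training samples) × (effective horizon length)^alpha, where alpha = 1 means linear scaling in horizon length and alpha < 1 means sub-linear. This is not the report's actual model; the horizon length and sample count are made-up round numbers, and only the 1e16 vs 1e13 FLOP/subj sec figures come from the discussion above.

```python
import math

def training_flop(flop_per_subj_sec, horizon_subj_sec, n_samples, alpha=1.0):
    """Hypothetical toy model: FLOP/subj-sec x samples x horizon**alpha (alpha=1 is linear)."""
    return flop_per_subj_sec * n_samples * horizon_subj_sec ** alpha

# Made-up round inputs, chosen only to show the size of each disagreement
base = training_flop(1e16, 1e3, 1e12)                 # linear in horizon length
sublinear = training_flop(1e16, 1e3, 1e12, alpha=0.5)  # sub-linear in horizon length
benchmark_view = training_flop(1e13, 1e3, 1e12)        # benchmark-extrapolation view

print(round(math.log10(base / sublinear), 2))       # 1.5 orders of magnitude
print(round(math.log10(base / benchmark_view), 2))  # 3.0 orders of magnitude
```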
We don't have firmly-articulated "worldview divisions" beyond the three laid out in that post, though as I mention towards the end of this section in my podcast, different giving opportunities within a particular worldview can perform differently on important but hard-to-quantify axes such as the strength of feedback loops, the risk of self-delusion, or the extent to which it feels like a "Pascal's mugging", and these types of considerations can affect how much we give to particular opportunities.
I'd say that a "cause" is something analogous to an academic field (like "machine learning theory" or "marine biology") or an industry (like "car manufacturing" or "corporate law"), organized around a problem or opportunity to improve the world. The motivating problem or opportunity needs to be specific enough and clear enough that it pays off to specialize in it by developing particular skills, reading up on a body of work related to the problem, trying to join particular organizations that also work on the problem, etc.
Like fields and industries, the boundaries around what exactly a "cause" is can be fuzzy, and a cause can have sub-causes (e.g. "marine biology" is a sub-field of "biology" and "car manufacturing" is a sub-industry within "manufacturing"). But some things are clearly too broad to be a cause: "doing good" is not a cause in the same way that "learning stuff" is not an academic field and "making money" is not an industry. Right now, the cause areas that long-termist EAs support are in their infancy, so they're pretty broad and "generalist"; over time I expect sub-causes to become more clearly defined and deeper specialized expertise to develop within them (e.g. I think it's fairly recently that most people in the community started thinking of "AI governance and policy" as a distinct sub-cause within "AI risk reduction").
Both within Open Phil and outside it, I think "cause prioritization" is a type of intellectual inquiry trying to figure out how many resources (often money but sometimes time / human resources) we would want going into different causes within some set, given some normative assumptions (e.g. utilitarianism of some kind).
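One common way to formalize the "how many resources into each cause" question is to allocate a budget so that the marginal value of the last unit is roughly equal across causes. The sketch below assumes, purely hypothetically, that each cause has diminishing returns of the form w × log(1 + x) for x units of funding, and gives each unit to whichever cause currently has the highest marginal value; the weights stand in for whatever the chosen normative and empirical assumptions imply. It's an illustration of the general idea, not how Open Phil actually allocates.

```python
import math

def allocate(budget_units, weights):
    """Greedily give each unit of budget to the cause with the highest marginal value,
    assuming hypothetical diminishing returns value_i(x) = w_i * log(1 + x)."""
    alloc = [0] * len(weights)
    for _ in range(budget_units):
        # Marginal value of one more unit for each cause at its current allocation
        gains = [w * (math.log(2 + x) - math.log(1 + x)) for w, x in zip(weights, alloc)]
        best = max(range(len(weights)), key=gains.__getitem__)
        alloc[best] += 1
    return alloc

print(allocate(10, [3, 1]))  # [8, 2]
```

For a separable, concave model like this one, the greedy rule finds the best allocation; here it matches the continuous optimum where w/(1 + x) is equal across causes (3/9 = 1/3).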
- The thing I most love about my work is my relationships with my coworkers and manager; they are all deeply thoughtful, perceptive, and compassionate people who help me improve along lots of dimensions.
- Like I discussed in the podcast, a demoralizing aspect of my work is that we're often pursuing questions where deeply satisfying answers are functionally impossible and it's extremely unclear when something is "done." It's easy to spend much longer on a project than you hoped, and to feel that you put in a lot of work to end up with an answer that's still hopelessly subjective and extremely easy to disagree with.
- I think I would do significantly better in my role if I were less sensitive about the possibility that someone (especially experts or fancy people) would think I'm dumb for missing some consideration, not having an excellent response to an objection, not knowing everything about a technical sub-topic, making a mistake, etc. It would allow me to make better judgment calls about when it's actually worth digging into something more, and to write more freely without getting bogged down in figuring out exactly how to caveat something.
- I think the most important thing I did before joining Open Phil was to follow GiveWell's research closely and to attempt to digest EA concepts well enough to teach them to others; I think this helped me notice when there was a job opportunity at GiveWell and to perform well in the interview process. Once at Open Phil, I think it was good that I asked a lot of questions about everything and pretty consistently said yes to opportunities to work on something harder than what I had done before.
- I'd say that we're interested in all three of preventing outright extinction, preventing some other kind of existential catastrophe, and in trajectory changes such as moving probability mass from "okay" worlds to "very good" worlds; I would expect some non-trivial fraction of our impact to come from all of those channels. However, I'm unsure how much weight each of these scenarios should get -- that depends on various complicated empirical and philosophical questions we haven't fully investigated (e.g. "What is the probability civilization would recover from collapse of various types?" and "How morally valuable should we think it is if the culture which arises after a recovery from collapse is very different from our current culture, and that culture is the one which gets to determine the long-term future?"). In practice our grantmaking isn't making fine-grained distinctions between these or premised on one particular channel of impact: biosecurity and pandemic preparedness grantmaking may help prevent both outright extinction and civilizational collapse scenarios, AI alignment grantmaking may help prevent outright extinction or help make an "ok" future into a "great" one, etc.
- I'd say that long-termism as a view is inherently animal-inclusive (just as the animal-inclusive view inherently also cares about humans); the view places weight on humans and animals today, and humans / animals / other types of moral patients in the distant future. Often the fact that it's animal-inclusive is less salient though, because it is concerned with the potential for creating large numbers of thriving digital minds in the future, which we often picture as more human-like than animal-like.
- I think the total view on population ethics is one important route to long-termism but others are possible. For example, you could be very uncertain what you value, but reason that it would be easier to figure out what we value and realize our values if we are safer, wiser, and have access to more resources.
I'm most interested in forecasting work that could help us figure out how much to prioritize AI risk over other x-risks, for example estimating transformative AI timelines, trying to characterize what the world would look like in between now and transformative AI, and trying to estimate the magnitude of risk from AI.
It occurred to me that another way to try to move someone on complicated category 3 disagreements might be to put together a well-constructed survey of a population that the person is inclined to defer to. This approach is definitely still tricky: you'd have to convince the person that the relevant population was provided with the strongest arguments for that person's view in addition to your counterarguments, and that the individuals surveyed were thinking about it reasonably hard. But if done well, it could be pretty powerful.
I'm afraid I don't have crisp enough models of the simulation hypothesis and related sub-questions to have a top n list. My biggest question is something more like "This seems like a pretty fishy argument, and I find myself not fully getting or buying it despite not being able to write down a simple flaw. What's up with that? Can somebody explain away my intuition that it's fishy in a more satisfying way and convince me to buy it more wholeheartedly, or else can someone pinpoint the fishiness more precisely?" My second biggest question is something like "Does this actually have any actionable implications for altruists/philanthropists? What are they, and can you justify them in a way that feels more robust and concrete and satisfying than earlier attempts, like Robin Hanson's How to Live in a Simulation?"
Hm, I think I'd say progress at this stage largely looks like being better able to cash out disagreements about big-picture and long-term questions in terms of disagreements about more narrow, empirical, or near-term questions, and then trying to further break down and ultimately answer these sub-questions to try to figure out which big picture view(s) are most correct. I think given the relatively small amount of effort put into it so far and the intrinsic difficulty of this project, returns have been pretty good on that front -- it feels like people are having somewhat narrower and more tractable arguments as time goes on.
I'm not sure about what exact skillsets the field most needs. I think the field right now is still in a very early stage and could use a lot of disentanglement research, and it's often pretty chaotic and contingent what "qualifies" someone for this kind of work. Deep familiarity with the existing discourse and previous arguments/attempts at disentanglement is often useful, and some sort of quantitative background (e.g. economics or computer science or math) or mindset is often useful, and subject matter expertise (in this case machine learning and AI more broadly) is often useful, but none of these things are obviously necessary or sufficient. Often it's just that someone happens to strike upon an approach to the question that has some purchase, they write it up on the EA Forum or LessWrong, and it strikes a chord with others and results in more progress along those lines.
In my reply to Linch, I said that most of my errors were probably in some sense "general reasoning" errors, and a lot of what I'm improving over the course of doing my job is general reasoning. But at the same time, I don't think that most EAs should spend a large fraction of their time doing things that look like explicitly practicing general reasoning in an isolated or artificial way (for example, re-reading the Sequences, studying probability theory, doing calibration training, etc.). I think it's good to spend most of your time trying to accomplish something straightforwardly valuable, which will often incidentally require building up some content expertise. It's just that a lot of the benefit of those things will probably come through improving your general skills.
The first time I really thought about TAI timelines was in 2016, when I read Holden's blog post. That got me to take the possibility of TAI soonish seriously for the first time (I hadn't been explicitly convinced of long timelines earlier or anything, I just hadn't thought about it).
Then I talked more with Holden and technical advisors over the next few years, and formed the impression that many technical advisors believed in a relatively simple and tight argument: if a brain-sized model could be transformative, then it would take X FLOP to train it, which would become affordable in the next couple decades. That meant that if we had a moderate probability on the first premise, we should have a moderate probability on TAI in the next couple decades. This made me take short timelines even more seriously, because I found the biological analogy intuitively appealing and I didn't think that people who confidently disagreed had strong arguments against it.
Then I started digging into those arguments in mid-2019 for the project that ultimately became the report, and I started to be more skeptical again, because it seemed that even assuming a brain-sized model would constitute TAI, there were many different hypotheses you could have about how much computation it would take to train it (what eventually became the biological anchors), and different technical advisors believed in different versions of this. In particular, it felt like the notion of a horizon length made sense, and incorporating it into the argument(s) made timelines seem longer.
Then, after I wrote up an earlier draft of the report, a number of people (including some who had longish timelines) felt that I was underweighting short and medium horizon lengths, which caused me to upweight those views somewhat.
That's fair, and I do try to think about this sort of thing when choosing e.g. how wide to make my probability distributions and where to center them; I often make them wider than feels reasonable to me. I didn't mean to imply that I explicitly avoid incorporating such outside-view considerations, just that returns to further thinking about them are often lower by their nature (since they're often about unknown unknowns).
I primarily do research rather than grantmaking, but I can give my speculations about what grant opportunities people on the grantmaking side of the organization would be excited about. In general, I think it's exciting when there is an opportunity to fund a relatively senior person with a strong track record who can manage or mentor a number of earlier-career people, because that provides an opportunity for exponential growth in the pool of people who are working on these issues. For example, this could look like funding a new professor who is aligned with our priorities in a sub-area and wants to mentor students to work on problems we are excited about in that sub-area.
In terms of why more people and projects don't get funded: at least at Open Phil, grantmakers generally try not to evaluate large numbers of applications or inquiries from earlier-career people individually, because each evaluation can be fairly intensive but the grant size is often relatively small; grantmakers at Open Phil prefer to focus on investigations that could lead to larger grants. Open Phil does offer some scholarships for early career researchers (e.g. here and here), but in general we prefer that this sort of grantmaking be handled by organizations like EA Funds.
I think the inclusion of "in principle" makes the answer kind of boring -- when we're not thinking about practicality at all, I think I'd definitely prefer to know more facts (about e.g. the future of AI or what would happen in the world if we pursued strategy A vs strategy B) than to have better reasoning skills, but that's not a very interesting answer.
In practice, I'm usually investing a lot more in general reasoning, because I'm operating in a domain (AI forecasting and futurism more generally) where it's pretty expensive to collect new knowledge/facts, it's pretty difficult to figure out how to connect facts about the present to beliefs about the distant future, and facts you could gather in 2021 are fairly likely to be obsoleted by new developments in 2022. So I would say most of my importance-weighted errors are going to be in the general reasoning domain. I think it's fairly similar for most people at Open Phil, and for most EAs trying to do global priorities research or cause prioritization, especially within long-termism. I think the more object-level your work is, the more likely it is that your big mistakes will involve being unaware of empirical details.
However, investing in general reasoning doesn't often look like "explicitly practicing general reasoning" (e.g. doing calibration training, studying probability theory or analytic philosophy, etc.). It's usually incidental improvement that happens over the course of a particular project (which will often involve developing plenty of content knowledge too).
I generally spend most of my energy looking for inside-view considerations that might be wrong, because they are more likely to suggest a particular directional update (although I'm not focused only on inside view arguments specifically from ML researchers, and place a lot of weight on inside view arguments from generalists too).
It's often hard to incorporate most outside-view considerations into bottom-line estimates, because it's not clear what their implication should be. For example, the outside-view argument "it's difficult to forecast the future and you should be very uncertain" may imply spreading probability out more widely, but that would involve assigning higher probabilities to TAI very soon, which is in tension with another outside-view argument along the lines of "Predicting something extraordinary will happen very soon has a bad track record."
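To make that tension concrete, here is a minimal Python sketch under a loudly hypothetical assumption: the TAI arrival year is modeled as a normal distribution with a median of 2050 (an illustrative number, not the report's actual distribution), and a narrower and a wider spread are compared. Holding the median fixed, widening the distribution necessarily moves probability mass toward the near term.

```python
import math


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Hypothetical illustration only: these numbers are made up for the sketch.
MEDIAN_YEAR = 2050  # assumed median arrival year
CUTOFF_YEAR = 2030  # stand-in for "very soon"

for sd in (10, 20):  # a narrower and a wider spread, in years
    p_soon = normal_cdf(CUTOFF_YEAR, MEDIAN_YEAR, sd)
    print(f"sd={sd:>2} years: P(TAI by {CUTOFF_YEAR}) = {p_soon:.3f}")
```

With these made-up numbers, doubling the spread raises the probability of TAI by 2030 from roughly 2% to roughly 16%, which is exactly the kind of "something extraordinary very soon" prediction the second outside-view argument warns against.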
Murphyjitsu: Conditional on TAI being built in 2025, what happened? (i.e. how was it built, what parts of your model were wrong, what do the next 5 years look like, what do the 5 years after 2025 look like?)
On the object level, I think it would probably turn out to be the case that a) I was wrong about horizon length and something more like ~1 token was sufficient, and b) I was wrong about model size and something more like ~10T parameters was sufficient. On a deeper level, it would mean I was wrong about the plausibility of ultra-sudden takeoff and shouldn't have placed as much weight as I did on the observation that AI isn't generating a lot of annual revenue right now and its value-added seems to have been increasing relatively smoothly so far.
I would guess that the model looks like a scaled-up predictive model (natural language and/or code), perhaps combined with simple planning or search. Maybe a coding model rapidly trains more-powerful successors in a pretty classically Bostromian / Yudkowskian way.
Since this is a pretty Bostromian scenario, and I haven't thought deeply about those scenarios, I would default to guessing that the world after looks fairly Bostromian, with risks involving the AI forcibly taking control of most of the world's resources, and the positive scenario involving cooperatively using the AI to prevent other x-risks (including risks from other AI projects).
Thanks so much, that's great to hear! I'll answer your first question in this comment and leave a separate reply for your Murphyjitsu question.
First of all, I definitely agree that the difference between 2050 and 2032 is a big deal and worth getting to the bottom of; it would make a difference to Open Phil's prioritization (and internally we're trying to do projects that could convince us of timelines significantly shorter than in my report). You may be right that it could have a counterintuitively small impact on many individual people's career choices, for the reasons you say, but I think many others (especially early career people) would and should change their actions substantially.
I think there are roughly three types of reasons why Bob might disagree with Alice about a bottom line conclusion like TAI timelines, which correspond to three types of research or discourse contributions Bob could make in this space:
1. Disagreements can come from Bob knowing more facts than Alice about a key parameter, which can allow Bob to make "straightforward corrections" to Alice's proposed value for that parameter. E.g., "You didn't think much about hardware, but I did a solid research project into hardware and I think experts would agree that because of optical computing progress will be faster than you assumed; changing to the better values makes timelines shorter." If Bob does a good enough job with this empirical investigation, Alice will often just say "Great, thanks!" and adopt Bob's number.
2. Disagreements can come from Bob modeling out a part of the world in more mechanistic detail that Alice fudged or simplified, which can allow Bob to propose a better structure than Alice's model. E.g., "You agree that earlier AI systems can generate revenue which can be reinvested into AI research, but you didn't explicitly model that and just made a guess about spending trajectory; I'll show that accounting for this properly would make timelines shorter." Alice may feel some hesitance about adopting Bob's model wholesale here, because Alice's model may fudge/elide one thing in an overly-conservative direction which she feels is counterbalanced by fudging/eliding another thing in an overly-aggressive direction, but it will often be tractable to argue that the new model is better, and Alice will often be happy to adopt it (perhaps changing some other fudged parameters a little to preserve intuitions that seemed important to her).
3. Finally, disagreements can come from differences in intuition about the subjective weight different considerations should get when coming up with values for the more debatable parameters (such as the different biological anchor hypotheses). It's more difficult for Bob to make a contribution toward changing Alice's bottom line here, because a lot of the action is in hard-to-access mental alchemy going on in Alice and Bob's minds when they make difficult judgment calls. Bob can try to reframe things, offer intuition pumps, trace disagreements about one topic back to a deeper disagreement about another topic and argue about that, and so on, but he should expect it to be slow going and expect Alice to be pretty hard to move.
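A toy version of the category 2 example, as a minimal Python sketch: every number and name here (`THRESHOLD`, `BASE_GROWTH`, `REINVEST_RATE`, `years_to_threshold`) is hypothetical and chosen only for illustration. Alice's model uses a guessed growth rate for spending; Bob's adds the mechanism that revenue from earlier AI systems is partly reinvested, which raises effective growth and brings forward the year spending crosses the threshold.

```python
# Hypothetical toy model: all parameters are made up for illustration only.
THRESHOLD = 1_000.0   # spending needed to train a transformative model (arbitrary units)
START_SPEND = 1.0     # spending on AI training in year 0 (same units)
BASE_GROWTH = 0.20    # Alice's guessed annual growth rate of spending
REINVEST_RATE = 0.10  # Bob's extra term: reinvested revenue as a fraction of current spend


def years_to_threshold(reinvest_rate: float, max_years: int = 500) -> int:
    """Count whole years until spending first reaches THRESHOLD."""
    spend = START_SPEND
    for year in range(max_years):
        if spend >= THRESHOLD:
            return year
        # Spending grows at the guessed base rate, plus revenue from earlier
        # systems that gets reinvested (assumed proportional to current spend).
        spend *= 1.0 + BASE_GROWTH + reinvest_rate
    raise ValueError("threshold not reached within max_years")


alice = years_to_threshold(reinvest_rate=0.0)
bob = years_to_threshold(reinvest_rate=REINVEST_RATE)
print(f"Alice's model: {alice} years; Bob's model: {bob} years")
```

Assuming reinvested revenue is proportional to current spending is the simplest choice and just collapses into a higher growth rate; with these made-up numbers the crossing moves from year 38 to year 27. A real contribution of this kind would still have to argue for the mechanism and its parameter values.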
In my experience, large and persistent disagreements between people about big-picture questions like TAI timelines or the magnitude of risk from AI are mostly of the third kind, and these disagreements can be entangled with dozens of other differences in background assumptions / outlook / worldview. My sense is that your biggest disagreements with me fall into the third category: you think that I'm overweighting the hypothesis that we'd need to do meta-learning in which the "inner loop" takes a long subjective time; you may also think that I'm underweighting the possibility of sudden takeoff or overweighting the efficiency of markets in a certain way, which leads me to lend too much credence to considerations like "Well, if the low end of the compute range is actually right, we should probably be seeing more economic impact from the slightly-smaller AI systems right now." If you were to change my mind on this, it might not even be from doing "timelines research": maybe you do "takeoff speeds research" that convinces me to take sudden takeoff more seriously, which in turn causes me to take shorter timelines (which would imply more sudden takeoff) more seriously.
I'd say tackling category 3 disagreements is high risk and effort but has the possibility of high reward, and tackling category 1 disagreements is lower risk and effort with more moderate reward. My subjective impression is that EAs tend to under-invest in tackling categories 1 and 2 because they perceive category 3 as where the real action is -- in some sense they're right about that, but they may underestimate how hard it'll be to change people's minds there. For example, changing someone's mind about a category 3 disagreement often greatly benefits from having a lot of face time with them, which isn't very scalable, and arguments may be more particular to individuals: what finally convinces Alice may not move Charlie.
I think one potential way to get at a category 3 disagreement about a long-term forecast is by proposing bets about nearer-term forecasts, although I think this is often a lot harder than it sounds, because people are sensitive to the possibility of "losing on a technicality": they were right about the big picture but wrong about how that big picture actually translates to a near-term prediction. Even making short-term bets often benefits from having a lot of face time to hash out the terms.