Red-teaming Holden Karnofsky's AI timelines

post by Vasco Grilo (vascoamaralgrilo), Simon Holm · 2022-06-25T14:24:42.910Z · EA · GW · 2 comments

Contents

  Summary
    Author contributions
    Acknowledgements
  Introduction
  1. Prediction
    Interpretation
    Inference
      Methodology
      Results and discussion
    Representativeness
  2. Reviewers of the technical reports
  3. Information hazards

Summary

This is a red teaming [? · GW] exercise on Holden Karnofsky's AI Timelines: Where the Arguments, and the "Experts," Stand [EA · GW] (henceforth designated “HK's AI Timelines”), completed in the context of the Red Team Challenge by Training for Good[1].

Our key conclusions are:

Our key recommendations are:

We welcome comments on our key conclusions and recommendations, as well as on reasoning transparency, strength of arguments, and red-teaming efforts.

Author contributions

The contributions by author are as follows:

Acknowledgements

Thanks to:

Introduction

We have analysed Holden Karnofsky's blog post AI Timelines: Where the Arguments, and the "Experts," Stand [EA · GW] with the goal of constructively criticising Holden's claims and the way they were communicated. In particular, we investigated:

The key motivations for red-teaming this particular article are:

1. Prediction

Holden Karnofsky estimates that:

There is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100).

Karnofsky bases his forecast on a number of technical reports, and we analysed it by answering the following:

The following sections deal with each of these questions. However, for the interpretation and inference, only 3 of the 9 in-depth pieces presented in the “one-table summary” of “HK's AI Timelines” are studied:

These seem to be the only in-depth pieces that provide quantitative forecasts for the year by which TAI will be seen, which facilitates comparisons. Nevertheless, they do not cover all the evidence on which Holden Karnofsky's forecasts were based.

Interpretation

Are the technical reports being accurately interpreted?

We interpreted the numerical predictions made by the technical reports to be essentially in agreement with those made in “HK's AI Timelines”. 

Our interpretation of the forecasts for the probability of TAI given in the aforementioned reports (see the tab “AI Timelines predictions” of this Sheets), together with the one presented in the “one-table summary” of “HK's AI Timelines”, is provided in the table below.

| Report | Interpretation of “HK's AI Timelines” | Our interpretation |
| --- | --- | --- |
| AI experts[5] | ~20 % by 2036. ~50 % by 2060. ~70 % by 2100. | 25 % by 2036. 49 % by 2060. 69 % by 2100. |
| Bio anchors[6] | > 10 % by 2036. ~50 % by 2055. ~80 % by 2100. | 18 % by 2036. 50 % by 2050. 80 % by 2100. |
| SIP | 8 % by 2036. 13 % by 2060. 20 % by 2100. | 8 % by 2036. 18 % by 2100. |

For all the forecasts, our interpretation is in agreement with that of “HK's AI Timelines” (when rounded to one significant digit).

However, it is worth noting the extent to which the “most aggressive” and “most conservative” estimates of “Bio anchors” differ from the respective “best guesses”[7] (see Part 4 of the report). This is illustrated in the table below (footnotes 8-11 are all citations from Ajeya Cotra).

| Probability of TAI by the year… | Conservative estimate | Best guess | Aggressive estimate |
| --- | --- | --- | --- |
| 2036[8] | 2 % | 18 % | 45 % |
| 2100[9] | 60 % | 80 % | 90 % |
| Median forecast | 2090[10] | 2050 | 2040[11] |

Indeed, the uncertainty of “Bio anchors” is acknowledged by Holden Karnofsky here [EA · GW].

There is also the question of the comparability of the differing definitions of transformative AI in the different technical reports, and if Holden's interpretation of these justifies his overall estimate. We mostly agree with Holden's claim in one of the footnotes of “HK's AI Timelines” that:

In general, all of these [the reports’ predicted] probabilities refer to something at least as capable as PASTA, so they directionally should be underestimates of the probability of PASTA (though I don't think this is a major issue).[12]

Regarding the first part of the above quote, the aforementioned probabilities refer to:

A more concrete definition of TAI in “HK's AI Timelines” would have been useful to understand the extent to which its predictions are comparable with those of other sources. 

Moreover, in the second half of the quotation above, Holden claims that he does not think it a “major issue” that the predicted probabilities should be “underestimates of the probability of PASTA”. We think a justification for this would be valuable, especially if the timelines for PASTA are materially shorter than those for TAI as defined in the technical reports (which could potentially be a major issue).

Inference

Is the forecast consistent with the interpretations of the technical reports?

We found Holden Karnofsky’s estimate to be consistent with our interpretation of the technical reports, even when accounting for the uncertainty of the forecasts of the individual reports.

Methodology

The inference depends not only on the point estimates of the technical reports (see Interpretation [EA(p) · GW(p)]), but also on their uncertainty. With this in mind, we fitted probability distributions for the year by which TAI will be seen to our interpretation of the technical reports' forecasts[14] (rows 2-4 and 7-9 in the table below). Moreover, “mean” and “aggregated” distributions which take into account all three reports were calculated as follows:

The data points relative to the forecasts for 2036 and 2100 were used to estimate the parameters of such distributions. Estimates for the probability of TAI by these years are provided in all three reports and in “HK's AI Timelines”, which enables consistent comparisons across sources. The parameters of the derived distributions are presented in the tab “Derived distributions parameters”.
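The lognormal fits described above can be recovered in closed form from two CDF points. A minimal sketch, assuming the distributed quantity is the number of years after 2021 (the reference year is our assumption, not stated in the post, so intermediate values need not match the post's table exactly):

```python
import math
from statistics import NormalDist

def fit_lognormal(year1, p1, year2, p2, ref=2021):
    """Fit (mu, sigma) of ln(year - ref) ~ Normal(mu, sigma)
    from two CDF constraints P(TAI by year_i) = p_i."""
    x1, x2 = math.log(year1 - ref), math.log(year2 - ref)
    z1, z2 = NormalDist().inv_cdf(p1), NormalDist().inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)
    mu = x1 - sigma * z1
    return mu, sigma

def lognormal_cdf(year, mu, sigma, ref=2021):
    """P(TAI by `year`) under the fitted lognormal."""
    return NormalDist(mu, sigma).cdf(math.log(year - ref))

# "Bio anchors" best guesses: 18 % by 2036 and 80 % by 2100.
mu, sigma = fit_lognormal(2036, 0.18, 2100, 0.80)
```

By construction, the fitted CDF passes exactly through the two anchor points, and the 2060 value then follows from the fit rather than from any additional data.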

Results and discussion

The forecasts for the probability of TAI by 2036, 2060 and 2100 are presented in the table below. In addition, values for all the years from 2025 to 2100 are given in the tab “Derived distributions CDFs” for all the derived distributions.

| Distribution | Probability (%) by 2036 | Probability (%) by 2060 | Probability (%) by 2100 |
| --- | --- | --- | --- |
| 1. “HK’s AI timelines” | > 10 | 50 | 67 |
| 2. AI experts lognormal | 25 | 41 | 69 |
| 3. Bio anchors lognormal | 18 | 40 | 80 |
| 4. SIP lognormal | 8 | 11 | 18 |
| 5. Aggregated lognormal | 8 | 27 | 77 |
| 6. Mean lognormal | 19 | 39 | 74 |
| 7. AI experts loguniform | 25 | 42 | 69 |
| 8. Bio anchors loguniform | 18 | 41 | 80 |
| 9. SIP loguniform | 8 | 12 | 18 |
| 10. Aggregated loguniform | 18 | 41 | 80 |
| 11. Mean loguniform | 19 | 40 | 74 |
| Range | 8 - 25 | 11 - 42 | 18 - 80 |
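The loguniform rows (7-11) can likewise be fitted in closed form, since the loguniform CDF is linear in the logarithm of the variable. A minimal sketch, again assuming years are counted from 2021 (our assumption, so intermediate values need not match the table exactly):

```python
import math

def fit_loguniform(year1, p1, year2, p2, ref=2021):
    """Fit the support [a, b] of a loguniform distribution, whose CDF is
    (ln x - ln a) / (ln b - ln a), from two CDF constraints."""
    x1, x2 = math.log(year1 - ref), math.log(year2 - ref)
    width = (x2 - x1) / (p2 - p1)  # equals ln b - ln a
    log_a = x1 - p1 * width
    return math.exp(log_a), math.exp(log_a + width)

def loguniform_cdf(year, a, b, ref=2021):
    """P(TAI by `year`), clipped to [0, 1] outside the support."""
    x = year - ref
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (math.log(x) - math.log(a)) / (math.log(b) - math.log(a))

# Our interpretation of SIP: 8 % by 2036 and 18 % by 2100.
a, b = fit_loguniform(2036, 0.08, 2100, 0.18)
```

As with the lognormal fits, the curve passes exactly through the two anchor probabilities, and everything in between is interpolation in log-time.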

The forecasts of “HK's AI Timelines” are aligned with those of the derived distributions[16]. These predict that the probability of TAI is:

Nevertheless, we think “HK's AI Timelines” would benefit from an explanation of how Holden’s forecasts were derived from the sources of the “one-table summary”, for example by explicitly stating the weight given to each of those sources (quantitatively or qualitatively).

Representativeness

Are the technical reports representative of the best available evidence?

We think there may be further sources Holden Karnofsky could have considered for his base of evidence to be more representative, but the current selection seemingly strikes a good balance between being representative and being succinct.

6 of the 9 pieces linked in the “one-table summary” of “HK's AI Timelines” are analyses from Open Philanthropy. This is noted in “HK's AI Timelines”[17], and could reflect:

These are valid reasons, but it would arguably be beneficial to include/consider other sources. For example:

We did not, however, look into whether the conclusions of these publications would significantly update Holden's claims.

It would also be interesting to know whether:

That being said:

All in all, we essentially agree with the interpretations of the technical reports, and think Holden Karnofsky’s predictions could justifiably be inferred from their results. In addition, the sources which informed the predictions appear representative of the best available evidence. Consequently, the forecasts for TAI of “HK's AI Timelines” seem reasonable. 

2. Reviewers of the technical reports

We have not analysed the reviews of the technical reports from Open Philanthropy referred to by Holden Karnofsky. However, their reviewers are seemingly credible. Brief descriptions are presented below:

For transparency, it seems worth mentioning the reasons for Past AI Forecasts not having been reviewed.

3. Information hazards

In the context of how to act in the absence of a robust expert consensus, it is argued in “HK's AI Timelines” that the “most important century” hypothesis should be taken seriously until and unless a “field of AI forecasting” develops, based on what is known now. The following reasons are presented:

Even if the above points are true, AI forecasting [? · GW] could be an information hazard [? · GW]. As noted in Forecasting AI progress: a research agenda from Ross Gruetzemacher et al., “high probability forecasts of short timelines to human-level AI might reduce investment in safety as actors scramble to deploy it first to gain a decisive strategic advantage”[19] (see Superintelligence from Nick Bostrom). 

That being said, the forecasts of “HK's AI Timelines” seem more likely to lengthen than to shorten AI timelines[20]. On the one hand, it could be argued that they are shorter than those of most citizens. On the other hand:

  1. ^

    We have not analysed in detail other posts from Cold Takes (Holden’s blog) related to AI forecasting. However, I (Vasco) read The Most Important Century [? · GW] in its entirety.

  2. ^

    “By “transformative AI”, I [Holden Karnofsky] mean “AI powerful enough to bring us into a new, qualitatively different future”. I specifically focus on what I'm calling PASTA: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I've argued that advanced AI could be sufficient to make this the most important century, via the potential for a productivity explosion as well as risks from misaligned AI”.

  3. ^

    For “Bio anchors”, see Part 1, section “Definitions for key abstractions used in the model”.

  4. ^

    AI timelines is an instance of AI forecasting [? · GW] which includes predicting when human-level AI will emerge.

  5. ^

    Forecasts for “high level machine intelligence (all human tasks)” taken from analysing Fig. 1 (see tab “AI experts”).

  6. ^

    The TAI forecasts are provided in Part 4 of the Bio anchors report.

  7. ^

    Note that Holden mentions here [EA · GW] that: “Overall, my best guesses about transformative AI timelines are similar to those of Bio Anchors”.

  8. ^

    “I think a very broad range, from ~2% to ~45%, could potentially be defensible”.

  9. ^

    “Ultimately, I could see myself arriving at a view that assigns anywhere from ~60% to ~90% probability that TAI is developed this century; this view is even more tentative and subject to revision than my view about median TAI timelines. My best guess right now is about 80%”.

  10. ^

    “~2090 for my “most conservative plausible median”.

  11. ^

    “My “most aggressive plausible median” is ~2040”.

  12. ^

    PASTA qualifies as “transformative AI”, since it is an “AI powerful enough to bring us into a new, qualitatively different future”.

  13. ^

    Such growth rate is predicted to coincide with the emergence of AGI according to this Metaculus question. As of 25 June 2022, the community prediction for the time between the world real gross domestic product being 25% higher than every previous year and the development of artificial general intelligence (AGI) was one month, hence supporting Ajeya Cotra’s definition (although we are wary of inverse causation). 

  14. ^

    The approach followed to determine the parameters of the fitted distributions is explained here [EA · GW].

  15. ^

    Here, “mean” is written in italic whenever it refers to the mean of the logarithm. Likewise for other statistics.

  16. ^

    This method is primarily used as a sense check (i.e. “Is Karnofsky’s estimate reasonable?”), and is not intended to precisely quantify deviations.

  17. ^

    “For transparency, note that many of the technical reports are Open Philanthropy analyses, and I am co-CEO of Open Philanthropy”.

  18. ^

    This was subsequently added by Holden (in this footnote) to address a key recommendation of a previous version of this analysis: “mentioning the reviews of Past AI Forecasts, or the reasons for it not having been reviewed”.

  19. ^
  20. ^

    Both the prediction and realisation of TAI.

  21. ^

These refer to Ray Kurzweil, Eliezer Yudkowsky, and Bryan Caplan.

2 comments


comment by Peter Wildeford (Peter_Hurford) · 2022-06-25T17:44:53.854Z · EA(p) · GW(p)

Thanks for putting this together! I think more scrutiny on these ideas is incredibly important so I'm delighted to see you approach it.

So meta to red team a red team, but some things I want to comment on:

  • Your median estimate for the conservative and aggressive bioanchor reports in your table are accidentally flipped (2090 is the conservative median, not the aggressive one - and vice versa for 2040).

  • Looking literally at Cotra's sheet, the median year is 2053. Though in Cotra's report, you're right that she rounds this to 2050 and reports this as her official median year. So I think the only difference between your interpretation and Holden's interpretation is just different rounding.

  • I do agree more precise definitions would be helpful.

  • I don't think it makes sense to deviate from Cotra's best guess and create a mean out of aggregating between the conservative and aggressive estimates. We shouldn't assume these estimates are symmetric where the mean lies in the middle using some aggregation method, instead I think we should take Cotra's report literally where the mean of the distribution is where she says it is (it is her distribution to define how she wants), which would be the "best guess". In particular, her aggressive vs. conservative range does not represent any sort of formal confidence interval so we can't interpret it that way. I have some unpublished work where I re-run a version of Cotra's model where the variables are defined by formal confidence intervals - I think that would be the next step for this analysis.

  • The "Representativeness" section is very interesting and I'd love to see more timelines analyzed concretely and included in aggregations. For more reviews and analysis that include AI timelines, you should also look to "Reviews of “Is power-seeking AI an existential risk?”" [LW · GW]. I also liked this LessWrong thread where multiple people stated their timelines [LW · GW].

comment by Vasco Grilo (vascoamaralgrilo) · 2022-06-25T20:10:50.273Z · EA(p) · GW(p)

Thanks for commenting, Peter!

Your median estimate for the conservative and aggressive bioanchor reports in your table are accidentally flipped (2090 is the conservative median, not the aggressive one - and vice versa for 2040).

Corrected, thanks!

I don't think it makes sense to deviate from Cotra's best guess and create a mean out of aggregating between the conservative and aggressive estimates.

I agree. (Note the distribution we fitted to "Bio anchors" (row 4 of the 1st table of this [EA · GW] section) only relies on Cotra's "best guesses" for the probability of TAI by 2036 (18 %) and 2100 (80 %).)

The "Representativeness" section is very interesting and I'd love to see more timelines analyzed concretely and included in aggregations.

Thanks for the sources! Regarding the aggregation of forecasts, I thought this article to be quite interesting.