A quick and crude comparison of epidemiological expert forecasts versus Metaculus forecasts for COVID-19

post by Justin Otto · 2020-04-02T19:29:26.024Z · score: 9 (8 votes) · EA · GW · 3 comments

Katherine Milkman on Twitter notes how far off the epidemiological expert forecasts were in the linked sample:


They gave an average estimate of 20,000 cases. The actual outcome was 122,653 by the stated date in the U.S. That's off by a factor of 6.13.
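The "factor of 6.13" is the ratio of actual cases to the experts' average estimate. Below is a minimal Python sketch of that arithmetic. It assumes a symmetric multiplicative error, so over- and underestimates are treated alike. The helper name `factor_off` is hypothetical. The same function could be applied to a Metaculus median, if one were available for the same date.

```python
def factor_off(estimate: float, actual: float) -> float:
    """Multiplicative error: how many times too high or too low a point estimate was.

    Symmetric in direction, so a 2x overestimate and a 2x underestimate both give 2.0.
    """
    if estimate <= 0 or actual <= 0:
        raise ValueError("both values must be positive")
    return max(estimate / actual, actual / estimate)


# Figures from the post: 20,000 expected cases vs. 122,653 actual U.S. cases.
expert_factor = factor_off(20_000, 122_653)
print(f"{expert_factor:.2f}")  # prints 6.13
```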

I was curious how this compares to the Metaculus community forecast (note: not the machine-learning-fed one, just the simple median prediction). Unfortunately, the interface doesn't show the full distribution at a given date; it only shows what the median was at the time. If the expert central tendency was off by a factor of 6.13, how far off was Metaculus?

I looked into it in this document:


Sadly, a direct comparison is not really feasible, since we weren't predicting the same questions. But suppose all important predictions were entered into platforms such as Good Judgment Open or Metaculus. Then comparisons between groups would be trivial and continuous. This isn't even "experts versus non-experts". The relevant comparison is at the platform level: "untrackable and unworkable one-off PDFs of somebody's projections" versus proper scoring and aggregation over time. Since Metaculus accounts can be entirely anonymous, why wouldn't we want every expert to enter their forecasts into a track record? That would make it possible to find out whether a given expert is a dart-throwing chimp. You should assume half of them are.
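To make "proper scoring over time" concrete, here is a minimal Python sketch. It uses the Brier score, one common proper scoring rule for binary questions, averaged over a forecaster's resolved questions. This is an illustration only. It is not the scoring rule any particular forecasting platform uses. The function names and the example track record are hypothetical.

```python
from statistics import mean


def brier_score(probability: float, outcome: bool) -> float:
    """Squared error between a probability forecast and a 0/1 outcome (lower is better)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return (probability - float(outcome)) ** 2


def track_record(forecasts: list[tuple[float, bool]]) -> float:
    """Mean Brier score over a forecaster's resolved binary questions."""
    return mean(brier_score(p, o) for p, o in forecasts)


# A "dart-throwing chimp" that always says 50% scores 0.25 whatever the outcomes are.
chimp = [(0.5, True), (0.5, False), (0.5, True)]
print(track_record(chimp))  # prints 0.25
```

A forecaster whose mean score stays well below 0.25 across many questions is doing better than the chimp. That is the kind of comparison a public track record would make routine.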


Comments sorted by top scores.

comment by Benjamin_Todd · 2020-04-02T23:05:32.175Z · score: 7 (4 votes) · EA(p) · GW(p)

There have been some claims that the 538 article put the wrong date on the experts' forecasts, and we haven't been able to confirm whether that's true by contacting them, so unfortunately I wouldn't rely on the 538 article by itself.

comment by Justin Otto · 2020-04-03T00:04:52.907Z · score: 6 (4 votes) · EA(p) · GW(p)

If I'm reading this tweet thread correctly, Anna Wiederkehr from 538 seems to say the graphic was correct and the error really was that large.


538 implies the same in another tweet:


Hey @katy_milkman you seemed to have missed our follow up (and we'll do another this week) that shows a different picture in case estimates from the surveyed experts. It takes time to understand what's happening.

comment by Linch · 2020-05-10T05:10:23.894Z · score: 2 (1 votes) · EA(p) · GW(p)

Metaculus predictions are now featured in those surveys (yay!), so I was able to make a more direct, head-to-head comparison for the first survey that included them.

tl;dr: Experts broadly outperformed the Metaculus aggregate predictions; however, the differences were not exceptionally large.