A quick and crude comparison of epidemiological expert forecasts versus Metaculus forecasts for COVID-19

post by Justin Otto · 2020-04-02T19:29:26.024Z · score: 9 (8 votes) · EA · GW · 7 comments

Katherine Milkman on Twitter notes how far off the epidemiological expert forecasts were in the linked sample:


They gave an average estimate of 20,000 cases. The actual outcome was 122,653 by the stated date in the U.S. That's off by a factor of 6.13.
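The quoted factor is just the ratio of the outcome to the estimate; a one-line check (using only the figures from the tweet):

```python
# Verify the error factor quoted above, using the tweet's figures.
expert_estimate = 20_000   # experts' average estimate of U.S. cases
actual_cases = 122_653     # reported outcome by the stated date

factor = actual_cases / expert_estimate
print(f"Off by a factor of {factor:.2f}")  # → Off by a factor of 6.13
```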

I was curious how this compares to the Metaculus community forecast (note: not the machine-learning-assisted prediction, just the simple community median). Unfortunately the interface doesn't show the full distribution at a given date; it only reports what the median was at the time. If the experts' central tendency was off by a factor of 6.13, how far off was Metaculus?

I looked into it in this document:


Sadly a direct comparison is not really feasible, since the two groups weren't predicting the same questions. But suppose all predictions of importance were entered into platforms such as Good Judgment Open or Metaculus. Then comparisons between groups would be trivial and continuous. This isn't even "experts versus non-experts". The relevant comparison is at the platform level: "untrackable, unworkable one-off PDFs of somebody's projections" versus proper scoring and aggregation over time. Since Metaculus accounts can be entirely anonymous, why wouldn't we want every expert to enter their forecasts into a track record? That would make it possible to find out whether a given person is a dart-throwing chimp. You should assume half of them are.
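The "proper scoring" idea can be sketched with a Brier score, the simplest proper scoring rule for binary questions. This is a minimal illustration, not Metaculus's actual scoring rule, and the forecasts and outcomes below are invented:

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilistic forecasts and binary
    outcomes (0 = didn't happen, 1 = happened). Lower is better;
    0.25 is what uniform 50/50 guessing earns."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up track records for two forecasters on the same four questions.
outcomes = [1, 0, 1, 1]
expert = [0.9, 0.4, 0.6, 0.7]  # hypothetical expert forecasts
chimp = [0.5, 0.5, 0.5, 0.5]   # the "dart-throwing chimp" baseline

print(brier_score(expert, outcomes))  # 0.105 — better than chance
print(brier_score(chimp, outcomes))   # 0.25
```

With every forecast logged this way, separating genuine skill from dart-throwing becomes a matter of comparing accumulated scores rather than arguing over one-off projections.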


Comments sorted by top scores.

comment by Benjamin_Todd · 2020-04-02T23:05:32.175Z · score: 7 (4 votes) · EA(p) · GW(p)

There have been some claims that the 538 article put the wrong date on the experts' forecasts, and we haven't been able to confirm whether that's true by contacting them, so unfortunately I wouldn't rely on the 538 article by itself.

comment by Justin Otto · 2020-04-03T00:04:52.907Z · score: 6 (4 votes) · EA(p) · GW(p)

If I'm reading this Tweet thread correctly, Anna Wiederkehr from 538 seems to say that the graphic was correct and the error really is that large.


This is further implied in another tweet:


Hey @katy_milkman
you seemed to have missed our follow up (and we'll do another this week) that shows a different picture in case estimates from the surveyed experts. It takes time to understand what's happening.
comment by Khorton · 2020-07-02T19:48:47.834Z · score: 5 (3 votes) · EA(p) · GW(p)

It's worth noting that epidemiologists can model disease given particular assumptions or policy choices, but they do not make a career of predicting policies, which is a lot of what this question is about.

comment by Linch · 2020-07-13T10:42:37.065Z · score: 18 (5 votes) · EA(p) · GW(p)

I messaged Khorton on Twitter; paraphrasing what I said there:

I think this description is incomplete. By roughly late April/May, in direct comparisons, generalist predictors on Metaculus and GJ Open generally outpredicted epidemiologists' surveys for forecasts one week out. This is relevant because there was no way for changes due to new government interventions or future behavioral shifts to show up in case statistics in less than 7 days: it takes several days for infected people to develop symptoms, a few days for them to get tested, and another X days (a number that decreased as the pandemic progressed) for tests to resolve.
So a question like "how many cases/deaths in 6 days" can be almost entirely decomposed into your distributions on:
- how many people are infected *now*,
- how testing will change over the week, and
- the probability of data updates.
The latter two do incorporate some elements of politics and behavioral change. However, "amateur forecasters are better at predicting political trends" can't be the full story, because in many weeks data updates and testing didn't change *that much*, so amateur predictors essentially had a better internal model of how many people were currently infected.
There are a lot of caveats, of course.
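The decomposition above can be sketched as a toy Monte Carlo model. All numbers here are invented for illustration; this is not the model any forecaster actually used:

```python
import random

random.seed(0)

def simulate_cases_in_6_days(n=10_000):
    """Toy decomposition of 'confirmed cases in 6 days' into:
    (1) how many people are infected now,
    (2) how testing changes over the week,
    (3) the chance of a data/reporting revision."""
    samples = []
    for _ in range(n):
        infected_now = random.lognormvariate(12, 0.5)      # uncertain current infections
        detection_rate = random.uniform(0.10, 0.20)        # fraction caught by testing
        testing_change = random.uniform(0.9, 1.3)          # week-over-week testing growth
        revision = 1.15 if random.random() < 0.05 else 1.0 # occasional data update
        samples.append(infected_now * detection_rate * testing_change * revision)
    samples.sort()
    return samples[n // 2]  # median forecast

print(f"Median forecast: {simulate_cases_in_6_days():,.0f}")
```

The point of the sketch is that the dominant source of uncertainty is the first factor, current infections, which is exactly where the amateur predictors seem to have had the better internal model.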
comment by Linch · 2020-05-10T05:10:23.894Z · score: 3 (2 votes) · EA(p) · GW(p)

Metaculus predictions are now featured in those surveys (yay!), so I was able to make a more direct comparison for the first survey in which those predictions can be compared head-to-head.

tl;dr: Experts have broadly outperformed the Metaculus aggregate predictions; however, the differences were not exceptionally large.

comment by Linch · 2020-07-02T11:32:06.948Z · score: 8 (5 votes) · EA(p) · GW(p)

UPDATE: With more data, Metaculus users have pulled ahead again.

comment by Justin Otto · 2020-07-02T15:30:09.092Z · score: 1 (1 votes) · EA(p) · GW(p)

Thank you very much! Apologies for not replying to your earlier comment. I had predicted that the Metaculus community prediction would outperform the surveys, and it is gratifying to see.