Annual AGI Benchmarking Event


Metaculus is strongly considering organizing an annual AGI benchmarking event. Once a year, we’d run a benchmark or suite of benchmarks against the most generally intelligent AI systems available to us at the time, seeking to assess their generality and the overall shape of their capabilities. We would publicize the event widely among the AI research, policy, and forecasting communities.


We think this might be a good idea for several reasons:


We're currently working on a plan, and are soliciting ideas and feedback from the community here. To guide the discussion, here are some properties we think the ideal benchmark should have. It would:

Once we’ve collected the strongest ideas and developed them into a cohesive whole, we will solicit feedback from the AI research community before publishing the final plan. Thanks for your contributions to the discussion – we look forward to reading and engaging with your ideas!

Here are a few resources to get you thinking.


An idea based on iteratively crowdsourcing adversarial questions

A discussion on AGI benchmarking


On the Measure of Intelligence

What we might look for in an AGI benchmark

General intelligence disentangled via a generality metric for natural and artificial intelligence


comment by Peter Wildeford (Peter_Hurford) · 2022-08-30T03:29:22.912Z

Thanks for soliciting public feedback on this. Unfortunately, I'm worried that publicizing this could be net negative, though I'm not very confident in this. My worry is that humans are good at making numbers go up and will be driven by highly publicized benchmarks to chase higher scores, so this event would make capabilities advance faster than they otherwise would, which would be bad.

I certainly realize it could be good to be able to resolve Metaculus forecasts more easily, and that it could be helpful to get more insight into capabilities that might otherwise be hidden from the public. Still, my weakly held view is that publicizing this could be net negative. At least three other people working at or associated with Rethink Priorities feel the same (also with weak confidence) but preferred to keep their views anonymous for now.

comment by Lawrence Phillips · 2022-08-31T15:50:32.789Z

Thank you for the feedback. This is an important and valid concern. Similar concerns were raised on the discussion thread over at Metaculus, and we've replied with some thoughts there. It's worth mentioning that I don't think we should move forward with anything until we've carefully considered the consequences – probably using forecasting to help with this – and gotten feedback from several disinterested parties.

I've thought a little more, at a very high level, about how an event like this might be designed in order to be beneficial overall, and written the idea up here.