Project: A web platform for crowdsourcing impact estimates of interventions.

post by Max Clarke (maxeonyx) · 2022-04-22T06:54:31.754Z · EA · GW · 18 comments

Contents

  Overview
    Estimates on important quantities
    Formal theory-of-change models
    In detail, here's what that the platform would have:
  Fake money or real money?
  What's required for an MVP?
  Ideas for why might this not work?
  Extra content post-edit
  Final words
None
18 comments

TL;DR: Create a web platform for computing live-updated impact estimates for different interventions, based on crowd-sourced estimates of assumptions and on explicit (simplified) theory-of-change causal models. Encourage founders, funders, skeptics and ordinary EAs to participate in these "markets".

Please tell me why you think this might not work!

Please tell me your ideas to make this even better!

Overview

Estimates on important quantities

Through wisdom-of-the-crowds (and ideally, actual prediction markets), we can get better estimates of numbers important for the EA movement. By improving the estimates of our assumptions, we can improve the effectiveness of the EA movement as a whole.

Here is an image of a prediction market interface, with the title "Estimated impact of Milestone 1. Reduction in number of chickens in battery cages in 2026 compared to counterfactual." It shows a user placing a gaussian distribution over top of the aggregate distribution of other's predictions.

Formal theory-of-change models

By formalizing (and sharing) theory-of-change models for interventions, we make them easier to critique and explain. 

If we use an actual (simple) programming language for these models, we can compute impact estimates in a standard way.

 

If we get the EA community to use a lot of these (and maybe weight people's predictions somehow?), then we get other benefits as well, such as the ability 

In addition to the direct benefits of better estimates, encouraging people to put numbers on their conversations and ideas encourages epistemic virtue in the EA community.

Also, having a database of impact estimates of interventions, lets us easily compare across cause areas.

In detail, here's what that the platform would have:

  1. The ability to create "estimates" on the site. All estimates allow bets to be posted as a critique/adjustment mechanism (market?). (Including the ability to input a probability distribution). Examples of kinds of numbers we would see are:
    1. "current number of chickens in battery farms"
    2. "expected QALYs of a future with a correctly aligned AI"
    3. "chance of AGI risk from
    4. "% reduction in 'AGI risk scenario B' given success of this intervention"
  2. A visual scripting  language or simple programming environment for creating simple causal models that describe a theory of change. While creating the model, the user might also new estimates (like above), seeded with the creator's prediction.
    1. e.g. the creator would estimate "person-hours required to reach milestone 1 of this intervention", which would be posted on the site as a thing for other users to bet on.
  3. Allow comments to be posted attached to people estimates, to give feedback to the creators.
    1. e.g. "I'm placing this estimate for these reasons, and I'll change it after we discuss it. I think that your project requires many more person-hours than you have estimated to reach even your first milestone."
  4. Allow fixing the values of certain numbers and re-generating a visualization the resulting estimates of different interventions.
    1. e.g. "Here's a link to your intervention if we assume my estimate of the person-hours required to reach milestone 1. As we can see that kind of kills this as an intervention compared to just putting the person hours into <other scalable intervention here>"
  5. Have some way of sharing your reputation score, as determined by how the various aggregated guesses moved over time towards or away from your particular guesses.
    1. Sharing this score probably shouldn't be the default because it would be too tied up in people's self worth, but having a high score would be a good signal.

Fake money or real money?

My initial thought on this is to just let people make guesses on individual markets, but the lack of a currency or actual incentive might make some people skeptical that the aggregated results are meaningful.

However, there's a recent proposal that says it's legal to do prediction markets with real money, conditional on that money being donated to charity: "Predicting for Good: Charity Prediction Markets [EA · GW]" (akrolsmir [EA · GW], harsimony [EA · GW])

Using real money has some benefits, ie. that it definitely matches what's happening with the actual spending of the EA movement. People doing their donations via the site allows others to explicitly hedge against your donations if they think that's better in terms of impact.

I'm confused about how to make estimates using real money on random numbers like "% reduction in 'AGI risk scenario B' given success of this intervention"

What's required for an MVP?

The core valuable project is three things:

  1. A web interface for creating and estimating on numbers/assumptions.
    1. For some kinds of estimates, the interface needs to support 2D distributions.
    2. E.g. Probability of smarter-than-human AGI each year has 3 axes - chance, probability weight, and time. The distribution is a 2D surface in 3D.
  2. A intuitive (visual?) programming tool for creating causal models (probabilistic programs) using those numbers.
  3. A back-end for keeping the results up-to-date. This involves aggregating the guesses, and re-deriving the resulting facts by running the causal programs through a probabilistic programming engine (e.g. something that does markov chain monte carlo)
    1. Care needs to be taken to manage "time" in this system in order for results to be as live as possible but also correct.

In my experience of web development, this would probably take about one person-year to get a prototype to the point where it's both complete and pleasant enough for others to use. (Wouldn't it be nice if you could disagree on that number right here and now?!  )

Extensions to this project:

Ideas for why might this not work?

  1. Models for why interventions might or might not work could be too complicated to reasonably put into an explicit program.
    1. For example, they might require putting many hours in, or require a high programming skill.
    2. Or, they might require sampling from complex probability distributions that makes it too hard to use a probabilistic programming language for this purpose.
  2. Even if the aggregation works, and the site is intuitive, people might not use it.
    1. For example, people might not want to be critiqued in this way.
  3. It might take too much work to do.

Extra content post-edit

Here's a list of links and people I have found on this topic, sneaked in before I hopefully get to 25 upvotes and get onto the Nonlinear library feed.

The main concern from all of these efforts is that it's hard to get people to use this platform. I think that yes,it needs to be really good to get any adoption.

Final words

I've just come up with this idea. Please comment below if it's bad, or especially if someone's already doing it and I can just join them.

18 comments

Comments sorted by top scores.

comment by Max Clarke (maxeonyx) · 2022-04-22T22:44:42.495Z · EA(p) · GW(p)

Here's a list of links and people I have found on this topic:

Replies from: paal-fredrik-skjorten-kvarberg, akrolsmir
comment by Paal Fredrik Skjørten Kvarberg (paal-fredrik-skjorten-kvarberg) · 2022-04-23T07:45:37.545Z · EA(p) · GW(p)

Metaculus is also currently working on a similar idea (causal graphs). Here are some more people who are thinking or working on related ideas, (who might also appreciate your post): Adam Binks [EA · GW], Harrison Durland [EA · GW], David Manheim and Arieh Englander (see their MTAIR project [LW · GW]).

comment by Austin (akrolsmir) · 2022-04-24T00:17:20.530Z · EA(p) · GW(p)

Yeah I think Metaculus would be the best people to go talk about this. I know Gaia (Metaculus's CEO) is super excited about causal graphs; happy to intro you two if you'd like!

Also: Max, what's your background? Do you have the capacity to do direct work on this (ie build this out on your own?) If so, what are your bottlenecks (eg a year of funding for this project?)

Replies from: maxeonyx, akrolsmir
comment by Max Clarke (maxeonyx) · 2022-04-26T01:27:29.081Z · EA(p) · GW(p)

My background is as a software developer, with professional web-dev experience. I'm currently doing a research master's on ML (transformers) and last year did a project surveying the field of probabilistic programming languages.

Because of my Master's, I don't have capacity to work on this right at the moment, but come September this year it's absolutely a candidate for things I would work on. I do have the skills and will have capacity to work on this on my own if I think that it's my best option for impact. From September onward, I have two bottlenecks: 1) Funding 2) Finding the best use of my time among many options.

comment by Austin (akrolsmir) · 2022-04-24T00:18:47.416Z · EA(p) · GW(p)

FWIW we'd love something like this in Manifold too, but that's probably a bit farther out; Metaculus is much better developed in terms of complex in-depth estimation/forecasting, while Manifold is trying to focus on being as simple as possible.

comment by Ozzie Gooen (oagr) · 2022-04-22T22:32:36.411Z · EA(p) · GW(p)

Happy to see more discussion on these topics.

Much of this is a part of what both some of the EA forecasting community, and what we at [QURI](https://quantifieduncertainty.org/), are working on.

I think the full thing is much more work than you think it is. I suggest trying to take one subpart of this problem and doing it very well, instead of taking the entire thing on at once. 

Replies from: maxeonyx, maxeonyx
comment by Max Clarke (maxeonyx) · 2022-04-22T23:08:23.882Z · EA(p) · GW(p)

I've been thinking about which sub-parts to tackle, but I think that the project just isn't very valuable until it has all three of:

  • A Prediction / estimation aggregation tool
  • Up-to-date causal models (using a simplified probabilistic programming language)
  • Very good UX, needed for adoption.

It's a lot of work, yes, but that doesn't mean it can't happen. I'm not sure there's a better way to split it up and still have it be valuable. I think the MVP for this project is a pretty high bar.

Ways to split it up:

  • Do the probabilistic programming language first. This isn't really valuable, it's  a research project that no one will use.
  • Do the prediction aggregation part first. This is metaculus.
  • Do the knowledge graph part first. This is maybe a good start - it's a wiki with better UX? I'm sure someone is scoping this out / doing it.

These things empower each other.

It's hard, but nevertheless I'd estimate definitely no more than 3 person-years of effort for the following things:

  • A snappy, good-looking prediction/estimation (web) interface.
  • A causal model editor with a graph view.
  • A backend that can update the distributions with monte-carlo simulations.
  • Rich-text comments and posts attached to models, bets and "markets" (still need a better name than "markets")
  • I-frames for people to embed the UI elsewhere.

What do you estimate?

comment by Max Clarke (maxeonyx) · 2022-04-22T22:48:34.012Z · EA(p) · GW(p)

I'd love to make an aggregate estimate for how much work this project would take

comment by IanDavidMoss · 2022-04-22T12:21:28.145Z · EA(p) · GW(p)

I think this idea is really cool (albeit hard to pull off successfully)! You're definitely not the first person to think of it, but I don't know of any comparable efforts that have turned into actual products yet. I think the biggest challenge will be maintaining the right ratio of models to estimators, as you could very easily have the former outrun the latter without some kind of subsidy for people's time. There's already a challenge around recruiting and retaining talented forecasters, and this sort of estimation may be in some ways more cognitively demanding / less rewarding for the participants. So you might want to have a setup where the impact models are tightly curated and there are incentives to attract estimators, at least at the beginning.

If you end up moving forward with a prototype, I'd be interested in providing input on the product design as an alpha user.

Replies from: IanDavidMoss, maxeonyx
comment by IanDavidMoss · 2022-04-22T12:29:19.319Z · EA(p) · GW(p)

I don't know of any comparable efforts that have turned into actual products yet.

Actually, come to think of it, the S-Process used for the Survival and Flourishing Fund is an implementation of one version of this idea.

Replies from: maxeonyx
comment by Max Clarke (maxeonyx) · 2022-04-22T19:57:29.331Z · EA(p) · GW(p)

Thanks, that's good feedback. I will check out the linked video. If you know anyone to get in touch with, I'd be keen to talk to them.

My guess is that this idea has been independently thought about many times, and if it's not bad for some reason, already funded.

comment by Max Clarke (maxeonyx) · 2022-04-22T20:58:44.036Z · EA(p) · GW(p)

I've found a similar project by Paal Kvarberg, described in this document here: https://docs.google.com/document/d/1An-NGrQUPSJ4v8HdsZO-BcAq5NpyrEqv32rdk-N4guo/edit#

I think he didn't pursue it - the document hasn't been updated for a year.

Replies from: paal-fredrik-skjorten-kvarberg
comment by Paal Fredrik Skjørten Kvarberg (paal-fredrik-skjorten-kvarberg) · 2022-04-23T07:26:29.433Z · EA(p) · GW(p)

Seems like I forgot to change "last updated 04.january 2021" to "last updated 04. january 2022" when I made changes in january haha. 

I am still working on this. I agree with Ozzie's comment below that doing a small part of this well is the best way to make progress. We are currently looking at the UX part of things. As I describe under this heading in the doc, I don't think it is feasible to expect many non-expert forecasters to enter a platform to give their credences on claims. And the expert forecasters are, as Ian mentions below, in short supply. Therefore, we are trying to make it easier to give credences on issues while reading about them the same place you read about them. I tested this idea out in a small experiment this fall (with google docs), and it does seem like motivated people who would not enter prediction platforms to forecast issues might give their takes if elicited this way. Right now we are investigating this idea further through an mvp of a browser extension that lets users give credences on claims found in texts on the web. We will experiment some more with this during the fall. A more tractable version of the long doc is likely to appear at the forum at some point. 

I'm not wedded to the concrete ideas presented in the doc, I just happen to think they are good ways to move closer to the grand vision. I'd be happy to help any project moving in that direction:)

comment by Emrik · 2022-04-22T21:12:55.179Z · EA(p) · GW(p)

I'm in favour of the project, but here's a consideration against: Making people in the community more confident about what the community thinks about a subject, can be potentially harmfwl.

Testimonial evidence is the stuff you get purely because you trust another reasoner (Aumann-agreement fashion), and technical evidence is everything else (observation, math, argument).

Making people more aware of testimonial evidence will also make them more likely to update on it, if they're good Bayesians. But this also reduces the relative influence that technical evidence has on their beliefs. So although you are potentially increasing the accuracy of each member's beliefs, you are also weakening the link community opinion has to technical evidence, and that leaves us more prone to information cascades and slower to update on new discoveries/arguments.

But this is mainly a problem for uniform communities where everyone assigns the same amount of trust to everyone else. If, on the other hand, we have "thought leaders" (highly trusted researchers who severely distrust others' opinions and stubbornly refuse to update on anything other than technical evidence), then their technical-evidence grounded beliefs can filter through to the rest of the community, and we get the best of both worlds.

Replies from: maxeonyx
comment by Max Clarke (maxeonyx) · 2022-04-22T22:13:24.226Z · EA(p) · GW(p)

One of the approaches here, is to A) require people sign up, and B) don't show people aggregated predictions until they have posted their own. 

Replies from: Brendon_Wong
comment by Brendon_Wong · 2022-04-24T20:38:27.990Z · EA(p) · GW(p)

I think that’s a good idea to reduce groupthink! Also, I think it can be helpful to uncover if specific individuals and sub-groups think a proposal is promising based on their estimates, since rarely will an entire group view something similarly. This could bring individuals together to further discuss and potentially support/execute the idea.

comment by Paal Fredrik Skjørten Kvarberg (paal-fredrik-skjorten-kvarberg) · 2022-04-23T08:01:15.839Z · EA(p) · GW(p)

Thanks for writing up this idea in such a succinct and forceful way. I think the idea is good, and would like to help any way I can. However, I would encourage thinking a lot about the first part "If we get the EA community to use a lot of these", which I think might be the hardest part. 

I think that there are many ways to do something like this, and that it's worth thinking very carefully about details before starting to build. The idea is old, and there is a big graveyard of projects aiming for the same goal. That being said, I think a project of this sort has amazing upsides. There are many smart people working on this idea, or very similar ideas right now, and I am confident that something like this is going to happen at some point. 

comment by Brendon_Wong · 2022-04-24T20:28:02.044Z · EA(p) · GW(p)

I think this is an excellent idea and one that I’ve wanted to exist for quite a few years now! My interest in this area stems from wanting to surface compelling strategies and projects that don’t receive sufficient attention because they’re not currently “trending” in the movement. By explicitly laying out theories of change and crowdsourcing estimates for each part of those theories of change, this makes it much easier for people to to understand proposals, identify how various people and groups in the community think about a proposal, and compare the expected impact of various strategies and projects against each other.

Right now, people submit ideas and projects on the EA Forum, but that doesn’t clearly translate to action. But if it’s pretty clear that specific individuals, sub-groups in the community, or the community at large think a theory of change is promising, I think this has the potential to greatly increase awareness and the likelihood of execution of promising proposals.