Guesstimate: An app for making decisions with confidence (intervals)
Post by Ozzie Gooen (oagr) · 2015-12-30T17:30:55.414Z · score: 42 (44 votes) · 18 comments
I’m happy to announce the public beta of Guesstimate, a web app for creating models and Fermi estimates with probability distributions. It’s free, open source, and relatively easy to use.
Effective Altruism is largely about optimizing expected value. Expected value estimates often have high uncertainty for multiple inputs, but the uncertainty of the outputs is rarely calculated.
Doing math with probability distributions is possible using Monte Carlo simulations, but few people do this. One reason is that the tools have been relatively inaccessible. The options consist mostly of expensive Excel plugins and statistical programming packages, all of which have significant learning curves and mediocre sharing abilities.
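To make the idea concrete, here is a minimal Monte Carlo sketch in Python. It is not Guesstimate's own code (which is JavaScript), and the two inputs and their distributions are made-up placeholders. Each uncertain input is sampled many times, the samples are combined, and the resulting list of outputs is itself a distribution whose 5th and 95th percentiles can be read off.

```python
import random
import statistics


def percentile(samples, q):
    """Return the q-th percentile (0-100) of samples by indexing into the sorted list."""
    ordered = sorted(samples)
    index = round(q / 100 * (len(ordered) - 1))
    return ordered[index]


def monte_carlo_product(draw_a, draw_b, n=5000, seed=0):
    """Estimate the distribution of a * b by sampling both uncertain inputs n times."""
    rng = random.Random(seed)
    return [draw_a(rng) * draw_b(rng) for _ in range(n)]


# Hypothetical inputs (arbitrary units): people reached, and value per person.
samples = monte_carlo_product(
    lambda rng: rng.gauss(1000, 200),        # normal input
    lambda rng: rng.lognormvariate(0, 0.5),  # right-skewed input
)
print("mean:", round(statistics.mean(samples)))
print("90% interval:", round(percentile(samples, 5)), "to", round(percentile(samples, 95)))
```

The output interval, not just the mean, is what a single-number expected value estimate leaves out.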
I’ve been experimenting with ideas and applications in this domain for the last few years. About 8 months ago I realized that JavaScript technologies (React in particular) were finally powerful enough to build a proper web-based Monte Carlo tool. About 4 months ago I quit my job, and I have been working on it ever since.
After a lot of late nights, I now believe Guesstimate works quite well. I’ve used it to figure out when to leave for meetings, to better understand the risks of common activities, to make all kinds of decisions, and to replicate some of the standard EA models (like 80,000 Hours’ estimate of the influence of being a UK politician).
Guesstimate is similar to Excel in that it uses a spreadsheet format. However, while Excel is general purpose and good at analyzing existing data, Guesstimate is specifically for making models of uncertain estimates. This means a few things:

Every cell can be a probability distribution instead of a number. Ranges are denoted as [5th percentile, 95th percentile]. [1]

Every cell represents a metric, with both a name and value. Descriptions can be added to each metric separately. [2]

Models are meant to be shared online publicly. [3]
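As an illustration of the first two points, the sketch below represents a named metric cell whose value is a [5th percentile, 95th percentile] range. It assumes the range is mapped to a normal distribution with those two percentiles; the post doesn't say which distribution Guesstimate fits, and `Metric` and `normal_from_range` are hypothetical names, not part of Guesstimate.

```python
from dataclasses import dataclass
from statistics import NormalDist

# z-score of the 95th percentile of a standard normal (about 1.645)
Z95 = NormalDist().inv_cdf(0.95)


def normal_from_range(p5, p95):
    """Fit a normal whose 5th and 95th percentiles equal p5 and p95 (an assumed convention)."""
    if p95 <= p5:
        raise ValueError("95th percentile must exceed 5th percentile")
    mean = (p5 + p95) / 2
    sd = (p95 - p5) / (2 * Z95)
    return NormalDist(mean, sd)


@dataclass
class Metric:
    """Hypothetical spreadsheet cell: a named metric with a range and an optional description."""
    name: str
    p5: float
    p95: float
    description: str = ""

    def sample(self, n=5000, seed=None):
        """Draw n Monte Carlo samples from the fitted normal."""
        return normal_from_range(self.p5, self.p95).samples(n, seed=seed)


commute = Metric("Commute time (min)", 20, 40, "Door to door, weekday mornings")
dist = normal_from_range(commute.p5, commute.p95)
print(round(dist.mean, 2), round(dist.stdev, 2))  # prints: 30.0 6.08
```

A normal is the simplest choice because it is symmetric around the midpoint of the range; a skewed quantity such as a duration might be better served by a lognormal fitted to the same two percentiles.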
I don’t know what Guesstimate is ultimately going to become. I have a ton of ideas about where I would like it to go, but I’m going to first follow what people ask for. I encourage you to try estimating interesting and useful things. I'm excited to see what people will come up with.
Footnotes
[1] Right now it simulates around 5000 samples per calculation. It does this even for simple calculations, such as adding two normal distributions. Over time we could do some of these analytically, but for now this seemed like the simpler option.
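The trade-off in footnote [1] can be illustrated with Python's `statistics.NormalDist`: the sum of two independent normals has an exact closed form (means add, variances add), while the sampling approach approximates it with a few thousand draws. The distributions below are arbitrary and this is only an illustration, not how Guesstimate is implemented.

```python
import statistics
from statistics import NormalDist

N = 5000  # roughly the per-calculation sample count mentioned in footnote [1]

a = NormalDist(10, 2)
b = NormalDist(5, 1)

# Sampled approach: draw from each input and add the draws element-wise.
sampled = [x + y for x, y in zip(a.samples(N, seed=1), b.samples(N, seed=2))]

# Analytic approach: NormalDist supports adding two independent normals directly,
# giving mean 10 + 5 and standard deviation sqrt(2**2 + 1**2).
exact = a + b

print("sampled:", round(statistics.mean(sampled), 2), round(statistics.stdev(sampled), 3))
print("exact:  ", exact.mean, round(exact.stdev, 3))
```

The sampled version works for any combination of distributions and operations, which is why it is the simpler general-purpose option; the analytic shortcut only exists for special cases like this one.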
[2] In the future, obviously comments, versioning, and unique URLs could be part of this.
[3] This means both that they are easy to share and that in the future we could add interactivity between public models. For instance, if one person estimates the ‘expected earnings of investing in a high-risk ETF’, someone else could use that in their model of the opportunity cost of exercising stock options. It also means that the focus is more on the reader than the writer.
18 comments
This is great. I'll take this opportunity to offer a feature request, and apologise that I lack the technical facility to contribute to making it happen.
Insofar as I can tell, the Monte Carlo simulation is generating uniform or normal distributions for the data. A wider family of distributions would be extremely helpful, or (possibly even better) the ability to bootstrap a distribution via Monte Carlo from a limited set of original data. I suspect our brains are not too bad at guesstimation for fairly well-behaved distributions, and the app will have even greater value if it allows us to navigate distributions that, thanks to significant skew or kurtosis, behave very counterintuitively when combined.
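A minimal sketch of the bootstrap idea in this request, in Python. It is not a Guesstimate feature described in the thread, and the observations are invented. Resampling the observed values with replacement gives an empirical input distribution that keeps whatever skew the data have, rather than forcing a normal or uniform shape; resampling whole data sets and averaging each gives a distribution for the mean.

```python
import random


def bootstrap_samples(data, n=5000, seed=0):
    """Draw n values with replacement from the observed data (an empirical input distribution)."""
    rng = random.Random(seed)
    return rng.choices(data, k=n)


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Bootstrap distribution of the mean: resample the data with replacement, average each resample."""
    rng = random.Random(seed)
    size = len(data)
    return [sum(rng.choices(data, k=size)) / size for _ in range(n_resamples)]


# Hypothetical skewed observations, e.g. cost per beneficiary in eight past projects.
observed = [12, 15, 14, 18, 16, 95, 13, 17]
inputs = bootstrap_samples(observed)  # could feed a Monte Carlo model as an input
means = sorted(bootstrap_means(observed))
last = len(means) - 1
print("90% bootstrap interval for the mean:", means[int(0.05 * last)], "to", means[int(0.95 * last)])
```

With only eight points, the single outlier (95) dominates the upper end of the interval, which is exactly the kind of counterintuitive behaviour a normal approximation would hide.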
Agreed that more distributions are needed. It's the #1 feature I'll work on, after ensuring better stability and so on.
I agree. Glancing at the politics influence estimate, the final confidence interval seems much narrower than my intuitive judgement. One reason for that could be that the distributions should have wider tails. Or it might be mainly because there's more uncertainty in the inputs than is being modeled here (if I read it correctly, the 'fraction of spending determined by policy' and 'fraction of influence' figures are taken as numbers rather than distributions).
I would agree they seem quite narrow.
It's possible that it's not just a matter of distribution types, but also that the model itself didn't have enough uncertainty when written. If true, that would mean the tool has already shed light on something that wasn't obvious in article form.
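The point raised in this exchange, that treating inputs as fixed numbers understates the uncertainty of the output, can be shown with a toy Python model. The input names echo the ones mentioned above, but every number is invented and this is not the 80,000 Hours model.

```python
import random


def interval_width(samples, lo=0.05, hi=0.95):
    """Width of the central interval between the lo and hi quantiles of the samples."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    return ordered[int(hi * last)] - ordered[int(lo * last)]


rng = random.Random(42)
N = 5000

# Hypothetical model: impact = budget * fraction_by_policy * fraction_of_influence.
budget = [rng.lognormvariate(0, 0.3) for _ in range(N)]

# Version 1: both fractions are fixed point estimates (the midpoints of the ranges below).
point = [b * 0.2 * 0.0125 for b in budget]

# Version 2: the same fractions, but entered as ranges instead of single numbers.
uncertain = [b * rng.uniform(0.1, 0.3) * rng.uniform(0.005, 0.02) for b in budget]

print("90% interval width, point inputs:    ", round(interval_width(point), 5))
print("90% interval width, uncertain inputs:", round(interval_width(uncertain), 5))
```

Both versions have similar central values, but only the second reflects the uncertainty in all three inputs, so its interval is noticeably wider.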
Awesome work!
I've made a model to estimate the expected reduction in extinction risk of a marginal AI safety researcher based on the GPP model. I invite you to play with the numbers and note your estimate!
I'll find this useful for planning and evaluation. (And that's not a conclusion I came to instantly; it took a couple of months of looking at the alpha versions that Ozzie's been sharing widely on the EA projects Slack. So I'd recommend taking some time to think about uses you could put Guesstimate to, and keeping it in the back of your head!)
Good point. I've found that many people aren't initially sure of what they would want to estimate. I'm going to write a post (or a few) on this, but for now here are some ideas:
How effective do you expect each EA intervention to be? What about strange new ideas? GiveWell does pretty comprehensive estimates, but smaller ones would make sense for more speculative ideas.
Considering a large project? Estimate the durations of the subcomponents, then look at the worst (and best) 10% of outcomes for the total time.
Making a decision between 2-4 things, where much of it is numeric (like money)? If you categorize those in Guesstimate, you can see the likelihood that each one is the best. It can also become obvious when the value of additional information is zero, i.e. when it makes sense to stop modeling and thinking about it and simply make the decision.
Have a few different ways to model something? Try multiple models out and see which produces the most certainty. This is a really good indicator of which is the best (if you do this, I would recommend including a node for model uncertainty, which is the uncertainty of the rest of the model).
For animal causes, what's your estimate of the amount of pain different animals suffer? Try multiplying this to find the expected impact of consuming different animal products. Then take that estimate and use it to find the benefit of interventions to get others not to consume those products.
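Two of the ideas above, summing uncertain subtask durations for a large project and estimating how often each option comes out best in a decision, can be sketched with plain Monte Carlo in Python. All distributions, option names, and the `simulate` helper are hypothetical; this shows the technique, not how Guesstimate computes it.

```python
import random


def simulate(draws, n=5000, seed=0):
    """Return n rows, each holding one sampled value per draw function."""
    rng = random.Random(seed)
    return [[draw(rng) for draw in draws] for _ in range(n)]


# Project planning: total time is the sum of uncertain subtask durations (days).
# Lognormals are right-skewed, so overruns can be long.
subtasks = [
    lambda rng: rng.lognormvariate(1.6, 0.4),  # design, median about 5 days
    lambda rng: rng.lognormvariate(2.3, 0.5),  # build, median about 10 days
    lambda rng: rng.lognormvariate(1.1, 0.6),  # testing, median about 3 days
]
totals = sorted(sum(row) for row in simulate(subtasks))
best_10 = totals[int(0.10 * (len(totals) - 1))]   # 10% of runs finish faster than this
worst_10 = totals[int(0.90 * (len(totals) - 1))]  # 10% of runs take longer than this
print(f"best 10%: under {best_10:.1f} days, worst 10%: over {worst_10:.1f} days")

# Choosing between options: count how often each option has the highest sampled value.
options = {
    "Option A": lambda rng: rng.gauss(100, 30),
    "Option B": lambda rng: rng.gauss(110, 50),
    "Option C": lambda rng: rng.gauss(90, 10),
}
names = list(options)
wins = dict.fromkeys(names, 0)
for row in simulate(list(options.values())):
    wins[names[row.index(max(row))]] += 1
total = sum(wins.values())
for name in names:
    print(f"{name} is best in {wins[name] / total:.0%} of simulations")
```

Option B has the highest mean but also the widest spread, so it is best in many, not all, simulations; the win frequencies make that uncertainty visible in a way a comparison of means would not.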
Have you thought about contacting anyone at the Good Judgment Project? This seems like it would be a pretty useful tool for people in forecasting tournaments.
Ohhh this looks cool. I'm going to check it out, I can already think of some interesting ways to apply it to investment. Thanks.
Does Guesstimate provide ways for the user to calibrate himself over time, given that he may be overconfident when he starts using it?
That's the plan, but first there's going to be a lot of other statistical work.
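One simple way such calibration could be measured, sketched in Python under assumptions not stated in the thread (this is not an existing Guesstimate feature, and `interval_hit_rate` is a hypothetical name): record each [5th percentile, 95th percentile] range together with the value that actually occurred, then check how often the outcome fell inside. Well-calibrated 90% ranges should contain the outcome about 90% of the time; a much lower hit rate suggests overconfidence.

```python
def interval_hit_rate(records):
    """Fraction of (low, high, actual) records whose actual value fell inside [low, high]."""
    if not records:
        raise ValueError("need at least one resolved estimate")
    hits = sum(1 for low, high, actual in records if low <= actual <= high)
    return hits / len(records)


# Hypothetical log of 90% ranges and the outcomes observed later.
history = [
    (20, 40, 35),     # commute minutes: inside
    (5, 10, 14),      # task hours: outside
    (100, 300, 250),  # inside
    (1, 3, 2),        # inside
    (50, 80, 95),     # outside
]
rate = interval_hit_rate(history)
print(f"{rate:.0%} of 90% intervals contained the outcome")  # prints: 60% of 90% intervals contained the outcome
```

Five records are far too few to judge calibration; the hit rate only becomes informative after many resolved estimates.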
+1 this is awesome
Do you have a 'meta' website describing the project in more detail, e.g. the technologies used and so on?
Public repo is here: https://github.com/getguesstimate/guesstimateapp
Tech description: https://github.com/getguesstimate/guesstimateapp/issues/47#issuecomment169109311
Hacker news discussion here: https://news.ycombinator.com/item?id=10816563
If you have any specific questions after that, please post.
No, it sounds as if https://github.com/getguesstimate/guesstimateapp (specifically the README.md shown at the bottom of it) will be the place you keep updated with information about the project and links to this. Thank you.
After posting this here, I decided to post a similar piece on Medium, written for a different audience. Interestingly, that one did a bit better.
Link us?
The link is at the bottom of the post, though I guess it's not very noticeable: https://medium.com/guesstimateblog/introducingguesstimateaspreadsheetforthingsthatarentcertain2fa54aa9340#.d1f48oul3
Since I wrote that, it has received another few hundred recommendations. The launch went far better than I expected (and may still be going?); I'll write a summary later.