Yes, for me, updating upwards on total expected success after observing a lower percentage success rate seems intuitively fairly weird. I'm not saying it's wrong; it's that I have to stop and think about it / use my System 2.
In particular, you have to have a prior distribution such that more valuable opportunities have a lower success rate. But then you also have to have a bag of opportunities such that the worse they do, the more excited you get.
Now, I think this happens if you have a bag with "golden tickets", "sure things", and "duds". Then not doing well would make you more excited if "sure things" were much less valuable than the weighted average of "duds" and "golden tickets".
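As a minimal sketch of that update (all numbers invented for illustration): a failure shifts probability mass away from "sure things" and toward "duds" and "golden tickets", and if "sure things" are worth much less than the weighted average of the other two, the expected value rises.

```python
# A toy bag of opportunities; all numbers are invented for illustration.
# Each type: (prior probability, expected value, chance of a legible early success).
types = {
    "golden ticket": (0.05, 1000.0, 0.05),
    "sure thing":    (0.45,    5.0, 0.95),
    "dud":           (0.50,    0.0, 0.20),
}

def expected_value(weights):
    """Expected value of one draw from the bag under a distribution over types."""
    return sum(w * value for w, (_, value, _) in zip(weights, types.values()))

prior = [p for p, _, _ in types.values()]
print(f"prior EV: {expected_value(prior):.1f}")  # 52.2

# Observe an early failure and update each type's weight by Bayes' rule.
unnormalized = [p * (1 - p_succ) for p, _, p_succ in types.values()]
norm = sum(unnormalized)
posterior = [w / norm for w in unnormalized]
print(f"posterior EV after a failure: {expected_value(posterior):.1f}")  # 101.3
```

With these made-up numbers, the expected value roughly doubles after a failure, because the failure mostly rules out the low-value "sure things".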
But to get that, I think you'd have to have "golden tickets" be a binary thing. In practice, though, take something like GovAI. Its theory of impact seems robust enough that I would expect to see a long tail of impact or impact proxies, rather than a binary success/failure or lottery-ticket-shaped impact. Say I expect their impact distribution to be a power law: in that case, I would not get more excited if I saw them fail again and again. Conversely, if I do see them getting some successes, I would update upwards on the mean and the standard deviation of the power-law distribution from which their impact is drawn.
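A rough sketch of that last update, assuming impact above some threshold is Pareto-distributed (the observations are hypothetical): seeing a success pulls the maximum-likelihood tail exponent down, which raises the implied mean and fattens the tail.

```python
import math

def pareto_mle_alpha(samples, x_min=1.0):
    """Maximum-likelihood estimate of the Pareto tail exponent for samples >= x_min."""
    return len(samples) / sum(math.log(x / x_min) for x in samples)

def pareto_mean(alpha, x_min=1.0):
    """Mean of a Pareto(alpha, x_min) distribution; finite only for alpha > 1."""
    return alpha * x_min / (alpha - 1) if alpha > 1 else float("inf")

# Hypothetical impact-proxy observations, in arbitrary units >= x_min.
modest = [1.2, 1.5, 1.1, 2.0, 1.3]    # mostly small outcomes
stronger = [1.2, 1.5, 1.1, 2.0, 9.0]  # a similar record, but with one clear success

for label, xs in [("modest", modest), ("stronger", stronger)]:
    alpha = pareto_mle_alpha(xs)
    print(f"{label}: alpha = {alpha:.2f}, implied mean = {pareto_mean(alpha):.2f}")
# modest:   alpha = 3.05, implied mean = 1.49
# stronger: alpha = 1.40, implied mean = 3.50
```

aarongertler on What should the norms around privacy and evaluation in the EA community be?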
I was too vague in my response here: By "the responsible conclusion", I mean something like "what seems like a good norm for discussing an individual project" rather than "what you should conclude in your own mind".
I agree on silent success vs. silent failure and would update in the same way you would upon seeing silence from a project where I expected a legible output.
If the book isn't published in my example, it seems more likely that some mundane thing went poorly (e.g. book wasn't good enough to publish) than that the author got cancer or found a higher-impact opportunity. But if I were reporting an evaluation, I would still write something more like "I couldn't find information on this, and I'm not sure what happened" than "I couldn't find information on this, and the grant probably failed".
(Of course, I'm more likely to assume and write about genuine failure based on certain factors: a bigger grant, a bigger team, a higher expectation of a legible result, etc. If EA Funds makes a $1m grant to CFAR to share their work with the world, and CFAR's website has vanished three years later, I wouldn't be shy about evaluating that grant.)
I'm more comfortable drawing judgments about an overall grant round. If there are ten grants, and seven of them are "no info, not sure what happened", that seems like strong evidence that most of the grants didn't work out, even if I'm not past the threshold of calling any individual grant a failure. I could see writing something like: "I couldn't find information on seven of the ten grants where I expected to see results; while I'm not sure what happened in any given case, this represents much less public output than I expected, and I've updated negatively about the expected impact of the fund's average grant as a result."
(Not that I'm saying an average grant necessarily should have a legible positive impact; hits-based giving is a thing. But all else being equal, more silence is a bad sign.)
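As a toy illustration of the last two paragraphs (all probabilities invented): if failures are more likely than successes to leave no public trace, each silent grant carries a negative update, and seven of them add up.

```python
# Toy numbers, purely illustrative: how strongly should silence count against a grant?
p_success = 0.5                # prior per-grant success rate
p_silent_given_success = 0.3   # successes usually get written up
p_silent_given_failure = 0.9   # failures usually stay quiet

# Bayes' rule for a single silent grant.
p_silent = (p_success * p_silent_given_success
            + (1 - p_success) * p_silent_given_failure)
p_success_given_silent = p_success * p_silent_given_success / p_silent
print(f"P(success | silent) = {p_success_given_silent:.2f}")  # 0.25

# Expected successes among the seven silent grants, without "calling" any single one.
print(f"expected successes among 7 silent grants = {7 * p_success_given_silent:.2f}")  # 1.75
```

nunosempere on What should the norms around privacy and evaluation in the EA community be?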
I find the simplicity of this appealing.
nunosempere on What should the norms around privacy and evaluation in the EA community be?
If the extent of your evaluation is a quick search for public info, and you don't find much, I think the responsible conclusion is "it's unclear what happened" rather than "something went wrong". I think this holds even for projects that obviously should have public outputs if they've gone well.
So to push back against this: suppose you have four initial probabilities (legibly good, silently good, legibly bad, silently bad). Then you also have a ratio (legibly good + silently good) : (legibly bad + silently bad).
Now, if you learn that the project was not legibly good or legibly bad, you update to (silently good, silently bad). The thing is, I expect the ratio silently good : silently bad to be different from the original (legibly good + silently good) : (legibly bad + silently bad), because I expect that most projects, when they fail, do so silently, whereas a large portion of successes have a post written about them.
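Here is a toy version of that update (the probabilities are invented, chosen so that failures are usually silent while successes usually get written up):

```python
# A toy version of the four-outcome update; all probabilities are invented.
prior = {
    "legibly good":  0.20,
    "silently good": 0.10,
    "legibly bad":   0.05,
    "silently bad":  0.65,
}

good = prior["legibly good"] + prior["silently good"]
bad = prior["legibly bad"] + prior["silently bad"]
print(f"prior good:bad ratio = {good / bad:.2f}")  # 0.43

# Learn that the project was neither legibly good nor legibly bad:
# condition on silence and renormalize.
silent = prior["silently good"] + prior["silently bad"]
post_good = prior["silently good"] / silent
post_bad = prior["silently bad"] / silent
print(f"posterior good:bad ratio given silence = {post_good / post_bad:.2f}")  # 0.15
```

Under these assumptions, the good:bad ratio drops from about 0.43 to about 0.15 after conditioning on silence, so silence is evidence of failure rather than mere uncertainty.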
For an intuition pump, suppose that none of the projects from the LTF had any information to be found online about them. That would probably be an update downwards. But what's true of the aggregate also seems true, probabilistically, of the individual projects.
So overall, because I disagree that the "Bayesian" conclusion is uncertainty, I do see a tension between the thing to do to maintain social harmony and the thing to do if one wants to transmit a maximal amount of information. I think this is particularly the case "for projects that obviously should have public outputs if they've gone well".
But then you also have other things, like:
Thanks! I've started an email thread with you, me, and David.
aarongertler on What should the norms around privacy and evaluation in the EA community be?
This makes sense, but I don't think this is bad.
I also don't think it's bad. Did I imply that I thought it was bad for people to update in this way? (I might be misunderstanding what you meant.)
michaela on What should the norms around privacy and evaluation in the EA community be?
I think some of the problems you point to (though not all) could be easily fixed by simple tweaks to the initial email: say when you plan to post by if you don't get a response (include that in bold), and say something to indicate how much back-and-forth you're OK with / how much time you're able and willing to invest in that (basically, to set expectations).
I think you could also email anyone in the org whose address you can quickly find and whose role or position sounds somewhat appropriate, and ask them to forward it to someone else if that's better.
nunosempere on What should the norms around privacy and evaluation in the EA community be?
...but I still think that it's appropriate for people to reduce their trust in my conclusions if I'm getting "irrelevant details" wrong. If an author makes errors that I happen to notice, I'm going to raise my estimate of how many errors they've made that I didn't notice.
This makes sense, but I don't think this is bad. In particular, I'm unsure about my own error rate, and maybe I do want to let people estimate my unknown-error rate as a function of my "irrelevant details" error rate.
juliette-ferrer on AMA: Tobias Baumann, Center for Reducing Suffering
Hello Tobias, I am very interested in the article you wrote with David Althaus, because we are working with UOW in Australia to propose a grant on this topic. I'd love to discuss this more with both of you; is there a way to contact you more directly? Thanks a lot! Juliette