Request for comments: EA Projects evaluation platform

post by Jan_Kulveit · 2019-03-20T22:36:32.565Z · score: 32 (30 votes) · 54 comments

Contents

    Evaluation process
  Why this particular evaluation process
    Evaluators
      Good reasons to volunteer
      Bad reasons to volunteer
      Strong reason not to volunteer
    Projects
54 comments

Edit: It is likely there will be a second version of this proposal, modified based on the feedback and comments.

The effective altruism community has a great resource - its members, motivated to improve the world. Within the community, there are many ideas floating around, and entrepreneurially-minded people keen to execute on them. As the community grows, we get more effective altruists with different skills, yet in other ways it becomes harder to start projects. It’s hard to know who to trust, and hard to evaluate which project ideas are excellent, which are probably good, and which are too risky for their estimated return.

We should be concerned about this: the effective altruism brand has significant value, and bad projects can have repercussions both for the perception of the movement and for the whole community. On the other hand, if good projects are not started, we miss out on value, and miss opportunities to develop new leaders and managers. Moreover, inefficiencies in this space can cause resentment and confusion among people who really want to do good and have lots of talent to contribute.

There's also a danger that as a community we get stuck on the old core problems, because funders and researchers trust certain groups to do certain things but lack the capacity to vet new and riskier ideas and to figure out which new projects should form. Overall, effective altruism struggles to use its greatest resource - effective people. Also, while we talk about “cause X”, new causes may currently struggle to get serious attention at all.

One idea to address this problem, proposed independently at various times by me and several others, is to create a platform which provides scalable feedback on project ideas. If it works, it could become an efficient way to separate signal from noise and spread trust as our community grows. In the best case, such a platform could help alleviate some of the bottlenecks the EA community faces, harness more talent and energy than we are currently able to do, and make it easier for us to make investments in smaller, more uncertain projects with high potential upside.

As discussed in a previous post, “What to do with people”, I see creating new network structures and extending existing ones as one possible way to scale. Currently, effective altruists use different approaches to get feedback on project proposals depending on where they are situated in the network: there is no ready-made solution that works for all of them.

For effective altruists in the core of the network, the best method is often just to share a Google Doc with a few relevant people. Outside the core, the situation is quite different, and it may be difficult to get informative and honest feedback. For example, since applications outnumber available budget slots, most grant applications for new projects are rejected by design; practical and legal constraints mean that these rejections usually come without much feedback, which can make it difficult to improve the proposals. (See also “EA is vetting-constrained”.)

For all of these reasons, I want to start an EA projects evaluation platform. For people with a project idea, the platform will provide independent feedback on the idea and an estimate of the resources needed to start the project. In a separate process, the platform would also provide feedback on projects later in their life cycle, evaluating the fit between the team and the idea. For funders, it can provide an independent source of analysis.

What follows is a proposal for such a platform. I’m interested in feedback and suggestions for improvement: the plan is to launch a cheap experimental run of the evaluation process in approximately two weeks. I’m also looking for volunteer evaluators.

Evaluation process

Project ideas will get evaluated in a multi-step process:

1a. Screening for infohazards, proposals outside of the scope of effective altruism, or otherwise obviously unsuitable proposals (ca 15m / project)

1b. Peer review in a debate framework. Two referees will write evaluations: one focusing on the possible negatives, costs and problems of the proposal, and the other on its benefits. Both referees will also suggest what kind of resources a team attempting the project should have. (2-5h / analyst / project)

1c. Both the proposal and the reviews will be published anonymously on the EA Forum, gathering public feedback for about one week. This step will also allow back-and-forth communication with the project initiator.

1d. A panel will rate the proposal, using the information gathered in steps 1b and 1c and highlighting which parts of the analysis they consider particularly important. (90m / project)

1e. In case of disagreement among the panel, the question will get escalated and discussed with some of the more senior people in the field.

1f. The results will get published, probably both on the EA projects platform website, and on the forum.

In a possible second stage, if a team forms around a project idea, it will go through similar evaluation, focusing on the fit between the team and the idea, possibly with the additional step of a panel of forecasters predicting the success probability and expected impact of the project over several time horizons.

Currently, the plan is to run a limited test of the viability of the approach, on a batch of 10 project ideas, going through steps 1a-f.

Why this particular evaluation process

The most bottlenecked resource for evaluations, apart from structure, is likely the time of experts. This process is designed to utilize the time of experts in a more leveraged way, utilize the inputs from the broader community, and also to promote high-quality discussion on the EA forum. (Currently, problematic project proposals posted on the forum often attract downvotes, but rarely detailed feedback.)

Having two “opposing” reviews is an attempt to avoid the social costs of not being nice: with clear roles, everyone will understand that writing an analysis which tries to find flaws and problems is part of the job. It can also provoke higher-quality public discussion.

Separating steps 1b, 1c and 1d is motivated by the fact that mapping arguments is a different task from judging them.

Project ideas are on a spectrum where some are relatively robust to the choice of team, while the impact of other projects may mostly depend on the quality of the team, including the sign of the impact. By splitting the evaluation of ideas from the evaluation of (idea+team), it should be possible to communicate opinions like “this is a good idea, but you are likely not the best team to try it” with more nuance.

Overall, the design space of possible evaluation processes is large, and I believe it may simply be easier to run an experiment and iterate. Based on the results, it should be relatively easy to make some of steps 1a-f simpler, omit them altogether, or make them more rigorous. The stage 2 process can also be designed based on the stage 1 results.

Evaluators

I’m looking for 5-8 volunteer analysts, who will write the reviews for the second step (1b) of the process. The role is suitable for people with skills similar to those of a generalist research analyst at OpenPhil, such as:

Expected time commitment is about 15-20h for the first run of evaluations and, if the project continues, about 15-20h per month. The work will mostly happen online in a small team, communicating on Slack. There isn’t any remuneration, but I hope there will be events like a dinner during EA Global, or similar opportunities to meet.

Good reasons to volunteer

Bad reasons to volunteer

you feel some specific project by you or your friends was undeservedly rejected by existing grant-making organizations, and you want to help the project

Strong reason not to volunteer

If you want to join, please send your LinkedIn profile or CV and a short, paragraph-long description of your involvement with effective altruism to eaprojectsorg@gmail.com

Projects

In the first trial, I’d like to test the viability of the process on about 10 project ideas. You may want to propose a project idea either where you would be interested in running the project or in cases where you would want someone else to lead the project, with you helping e.g. via advice or funding. At present, it probably isn’t very useful to propose projects you don’t plan to support in some significant way.

It is important to understand that the evaluations absolutely do not come with any promise of funding. I would expect the evaluations may help project ideas which come out with positive feedback from the process, because funders or EAs earning to give or potential volunteers or co-founders may pick up the signal. Negative feedback may help with improving the projects, or having realistic expectations about necessary resources. There is also value in bad projects not happening, and negative feedback can help people to move on from dead-end projects to more valuable things.

Also, it should be clear that the project evaluations will not constitute any official “seal of approval” - this is a test run of a volunteer project and has not been formally endorsed by any particular organization.

I’d like to thank Max Daniel, Rose Hadshar, Ozzie Gooen, Max Dalton, Owen Cotton-Barratt, Oliver Habryka, Harri Besceli, Ryan Carey, Jah Ying Chung and others for helpful comments and discussions on the topic.

54 comments


comment by RyanCarey · 2019-03-21T12:11:55.128Z · score: 25 (12 votes)

I'm a big fan of the idea of having a new EA projects evaluation pipeline. Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be. From my perspective, there are only a smallish number of essential elements for this sort of plan. It needs a submissions form, a detailed RFP, some funders, and some evaluators. However, we don't yet have these (e.g. detail re desired projects, consultation with funders). Meanwhile, I'm confused about some of the other things that are emphasised: large initial scale, a process for recruiting volunteer-evaluators, and fairly rigid evaluation procedures. I think the fundamentals of the idea are strong enough that this still has a chance of working, but I'd much prefer to see the idea advanced in its strongest possible form. My previous comments on this draft are pretty similar to Oliver's, and here are some of the main ones:

This makes sense to me as an overall idea. I think this is the sort of project where if you do it badly, it might dissuade others from trying the same. So I think it is worth getting some feedback on this from other evaluators (BERI/Brendon Wong). It would also probably be useful to get feedback from 1-2 funders (maybe Matt Wage? Maybe someone from OpenPhil?), so that you can get some information about whether they think your evaluation process would be of interest to them, or what might make it so. It could also be useful to have unofficial advisors.

I predict the process could be refined significantly with ~3 projects.

You only need a couple of volunteers and you know perhaps half of the best candidates, so for the purpose of a pilot, did you consider just asking a couple of people you know to do it?

I think you should provide a ~800 word request for proposals. Then you can give a much more detailed description of who you want to apply. e.g. just longtermist projects? How does this differ from the scope of EA grants, BERI, OpenPhil, etc etc? Is it sufficient to apply with just an idea? Do you need a team? A proof of concept? etc etc etc.

This would be strengthened somewhat by already having obtained the evaluators, but this may not be important.
comment by John_Maxwell (John_Maxwell_IV) · 2019-03-22T05:13:26.959Z · score: 13 (6 votes)
Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be.

It's also important not to let the perfect be the enemy of the good. Seems to me like people are always proposing volunteer-led projects like this and most of them never get off the ground. Remember this is just a pilot.

I think this is the sort of project where if you do it badly, it might dissuade others from trying the same.

The empirical reality of the EA project landscape seems to be that EAs keep stumbling on the same project ideas over and over with little awareness of what has been proposed or attempted in the past. If this post goes like the typical project proposal post, nothing will come of it, it will soon be forgotten, and 6 months later someone will independently come up with a similar idea and write a similar post (which will meet a similar fate).

comment by John_Maxwell (John_Maxwell_IV) · 2019-03-27T06:11:22.559Z · score: 9 (3 votes)

As a concrete example of this "same project ideas over and over with little awareness of what has been proposed or attempted in the past" thing, https://lets-fund.org is a fairly recent push in the "fund fledgling EA projects" area which seems to have a decent amount of momentum behind it relative to the typical volunteer-led EA project. What are the important differences between Let's Fund and what Jan is working on? I'm not sure. But Let's Fund hasn't hit the $75k target for their first project, even though it's been ~5 months since their launch.

The EA Hotel is another recent push in the "fund fledgling EA projects" area which is struggling to fundraise. Again, loads of momentum relative to the typical grassroots EA project--they've bought a property and it's full of EAs. What are the relative advantages & disadvantages of the EA Hotel, Let's Fund, and Jan's thing? How about compared with EA Funds? Again, I'm not sure. But I do wonder if we'd be better off with "more wood behind fewer arrows", so to speak.

comment by Jan_Kulveit · 2019-03-21T13:24:31.627Z · score: -2 (9 votes)

On a meta-level

I'm happy to update the proposal to reflect some of these sentiments. To be open, I find some of them quite strange: e.g., it seems that coalescing the steps into one paragraph and assuming all the results (reviews, discussion, "authoritative" summary of the discussion) will just happen may make it look more flexible. OK, why not.

Also, it seems you and Oli are worried that I want to recruit people who are currently not doing high-impact direct work ... instead of just asking a couple of people around me, which would often mean people already doing impactful volunteer work.

The meta-point is: I'm not sure if you or Oli realize how big a part of solving the "new EA projects evaluation pipeline" problem is consensus-building. Actually, I think the landscape of possible ways to do evaluations is such that it is very hard to reach consensus on what the "strongest form" is. I'm quite happy to create a bunch of proposals, e.g.:

  • removing the final expert evaluation
  • removing the initial reviews
  • removing the public forum discussions
  • writing an unrealistic assumption that the initial reviews will take 15m instead of hours,
  • suggesting that the volunteers will be my busy friends (whose voluntary work does not count?)
  • emphasising public feedback more, or less
  • giving stronger or weaker voice to existing funders.

I have a stronger preference for the platform happening than for any single option in any of these choices. But what is the next step? After thinking about the landscape for some time, I'm quite skeptical that any particular combination of options would avoid some large drawback.

On the object level:

Re: funder involvement

Cross-posting from another thread

Another possible point of discussion is whether the evaluation system would work better if it was tied to some source of funding. My general intuition is this would create more complex incentives, but generally I don't know and I'm looking for comments.

I think it is much harder to give open feedback if it is closely tied to funding. Feedback from funders can easily have too much influence on people, and should be very careful and nuanced, as it comes from a position of power. I would expect that adding financial incentives can easily be detrimental to the process. (For a self-referential example, just look at this discussion: do you think the fact that Oli dislikes my proposal and suggests LTF could back something different with $20k will not create at least some unconscious incentives?)

We had some discussion with Brendon, and I think his opinion can be rounded to "there are almost no bad projects, so to worry about them is premature". I disagree with that. Also, given that Brendon's angel group has been working, evaluating and funding projects since October, I would be curious what projects were funded, what the total amount of funding allocated was, and how many applications they got.

Based on what I know, I'm unconvinced that Brendon or BERI should have outsized influence on how evaluations should be done; part of the point of the platform would be to serve the broader community.

comment by Brendon_Wong · 2019-03-21T20:08:04.055Z · score: 14 (9 votes)
We had some discussion with Brendon, and I think his opinion can be rounded to "there are almost no bad projects, so to worry about them is premature". I disagree with that.

I do not think your interpretation of my opinion on bad projects in EA is aligned with what I actually believe. In fact, I actually stated my opinion in writing in a response to you two days ago which seems to deviate highly from your interpretation of my opinion.

I never said that there are "almost no bad projects." I specifically said I don't think that "many immediately obvious negative EV projects exist." My main point was that my observations of EA projects in the entire EA space over the last five years do not line up with a lot of clearly harmful projects floating around. This does not preclude the possibility of large numbers of non-obviously bad projects existing, or small numbers of obviously bad projects existing.

I also never stated anything remotely similar to "to worry about [bad projects] is premature." In fact, my comment said that the EA Angel Group helps prevent the "risk of one funder making a mistake and not seeking additional evaluations from others before funding something" because there is "an initial staff review of projects followed by funders sharing their evaluations of projects with each other to eliminate the possibility of one funder funding something while not being aware of the opinion of other funders."

I believe that being attentive to the risks of projects is important, and I also stated in my comment that risk awareness could be of even higher importance when it comes to projects that seek to impact x-risks/the long-term future, which I believe is your perspective as well.

Also, given that Brendon's angel group has been working, evaluating and funding projects since October, I would be curious what projects were funded, what the total amount of funding allocated was, and how many applications they got.

Milan asked this question and I answered it.

Based on what I know, I'm unconvinced that Brendon or BERI should have outsized influence on how evaluations should be done; part of the point of the platform would be to serve the broader community.

I'm not entirely sure what your reasons are for having this opinion, or what you even mean. I am also not exactly sure what you define as an "evaluation." I am interpreting evaluations to mean all of the assessments of projects happening in the EA community from funders or somewhat structured groups designed to do evaluations.

I can't speak for BERI, but I currently have no influence on how evaluations should be done, and I also currently have no interest in influencing how evaluations should be done. My view on evaluations seems to align with Oliver Habryka's view that "in practice I think people will have models that will output a net-positive impact or a net-negative impact, depending on certain facts that they have uncertainty about, and understanding those cruxes and uncertainties is the key thing in understanding whether a project will be worth working on." I too believe this is how things work in practice: evaluation processes seem to involve one or more people, ideally with diverse views and backgrounds, evaluating a project, sometimes with a more formalized evaluation framework taking certain factors into account. Then, a decision is made, and the process repeats at various funding entities. Perhaps this could be optimized by having argument maps or a process that involves more clearly laying out assumptions and assigning mathematical weights to them, but I currently have no plans to try to go to EA funders and suggest they all follow the same evaluation protocol. Highly successful for-profit VCs employ a variety of evaluation models and have not converged on a single evaluation method. This suggests that perhaps evaluators in EA should use different evaluation protocols, since different protocols might be more or less effective with certain cause areas, circumstances, types of projects, etc.

comment by John_Maxwell (John_Maxwell_IV) · 2019-03-22T05:56:43.279Z · score: 5 (4 votes)
I actually stated my opinion in writing in a response to you two days ago which seems to deviate highly from your interpretation of my opinion.

I think I've seen forum discussions where language has been an unacknowledged barrier to understanding in the past, so it might be worth flagging that Jan is from the Czech Republic and likely does not speak English as his mother tongue.

comment by Brendon_Wong · 2019-03-22T06:06:58.377Z · score: 8 (5 votes)

Thanks for pointing that out! Jan and I have also talked outside the EA Forum about our opinions on risk in the EA project space. I’ve been more optimistic about the prevalence of negative EV projects, so I thought there was a chance that greater optimism was being misinterpreted as a lack of concern about negative EV projects, which isn’t my position.

comment by Jan_Kulveit · 2019-03-22T10:02:09.061Z · score: 0 (3 votes)

My impression was based mostly on our conversations several months ago; quoting the notes from that time:

A lot of the discussion and debate derives from differing assumptions held by the participants regarding the potential for bad/risky projects: Benjamin/Brendon generally point out the lack of data/signal in this area and believe launching an open project platform could provide data to reduce uncertainty, whereas Jan is more conservative and prioritizes creating a rigorous curation and evaluation system for new projects.

I think it is fair to say you expected very low risk from creating an open platform where people would just post projects and seek volunteers and funding, while I expected that with minimal curation this would create significant risk (even if the risk comes from a small fraction of projects). Sorry if I rounded off suggestions like "let's make an open platform without careful evaluation and see" and "based on the project ideas lists which existed several years ago the amount of harmful projects seems low" to "worrying about them is premature".

Your recent comment seems more careful, and points out that large negative outcomes are more of a problem with x-risk/long-term-oriented projects.

In our old discussions I also expressed some doubt about your or altruism.vc's ability to evaluate x-risk and similar projects, and your recent post states that projects that impact x-risks by doing something like AI safety research have not yet applied to the EA Angel Group.

I guess part of the disagreement comes from the fact that I focus on x-risk and the long-term future, and I'm both more interested in improving the project landscape in these areas and more worried about negative outcomes.

If open platforms or similar evaluation processes also accept x-risk mitigation and similar proposals, then in my opinion the bar for how good and expert-driven the evaluations need to be is unfortunately higher, and signals like "this is a competent team", which VCs would mainly look at, are not enough.

Because I would expect the long-term impact to come mainly from long-term, meta-, exploratory or very ambitious projects, I think you can be basically right about the low obvious risk of all the projects historically posted on the hackpad or proposed to altruism.vc, and still miss the largest term in the EV.

Milan asked this question and I answered it.

Thanks - both of those happened after I posted my comment, and I still do not see the numbers which would help me estimate the ratio of projects which applied to those which got funded. I take it as a mildly negative signal that someone had to ask, and that this info was not included in the post, which solicits project proposals and volunteer work.

In my model it seems possible you have something like a chicken-and-egg problem: not getting many great proposals, and the group of unnamed angels not funding many proposals coming via that pipeline.

If this is the case and the actual number of successfully funded projects is low, I think it is necessary to state this clearly before inviting people to work on proposals. My vague impression was we may disagree on this, which seems to indicate some quite deep disagreement about how funders should treat projects.

I'm not entirely sure what your reasons are for having this opinion, or what you even mean

The whole context was that Ryan suggested I should have sought some feedback from you. I actually did that, and your co-founder noted on 11 March that he would try to write the feedback on this today or tomorrow, which did not happen. I don't think this is a large problem, as we had already discussed the topic extensively.

When writing it, I was somewhat upset about the mode of conversation where critics don't ask whether I tried to coordinate with someone, but just assume I did not. I apologize for the bad way it was written.

Overall, my summary is that we probably still disagree on many assumptions; we did invest some effort trying to overcome them, and it seems difficult for us to reach consensus, but this should not stop us from trying to move forward.

comment by Brendon_Wong · 2019-03-22T19:31:42.841Z · score: 3 (3 votes)
I think it is fair to say you expected very low risk from creating an open platform where people would just post projects and seek volunteers and funding, while I expected that with minimal curation this would create significant risk (even if the risk comes from a small fraction of projects). Sorry if I rounded off suggestions like "let's make an open platform without careful evaluation and see" and "based on the project ideas lists which existed several years ago the amount of harmful projects seems low" to "worrying about them is premature".

The community has already had many instances of openly writing about ideas, seeking funding on the EA Forum, Patreon, and elsewhere, and posting projects in places like the .impact hackpad and the currently active EA Work Club. Since posting about projects and making them known to community members seems to be a norm, I am curious about your assessment of the risk and what, if anything, can be done about it.

Do you propose that all EA project leaders seek approval from a central evaluation committee or something before talking with others about and publicizing the existence of their project? This would highly concern me because I think it's very challenging to predict the outcomes of a project, which is evidenced by the fact that people have wildly different opinions on how good of an idea or how good of a startup something is. Such a system could be very negative EV: it could greatly reduce the number of projects being pursued by providing initial negative feedback that doesn't reflect how the project would have turned out, or it could decrease the success of projects because other people are afraid to support a project that did not get backing from an evaluation system. I expect significant inaccuracy from my own project evaluation system as well as from the project evaluation systems of other people and evaluation groups.

Thanks - both of those happened after I posted my comment, and I still do not see the numbers which would help me estimate the ratio of projects which applied to those which got funded. I take it as a mildly negative signal that someone had to ask, and that this info was not included in the post, which solicits project proposals and volunteer work.
In my model it seems possible you have something like a chicken-and-egg problem: not getting many great proposals, and the group of unnamed angels not funding many proposals coming via that pipeline.
If this is the case and the actual number of successfully funded projects is low, I think it is necessary to state this clearly before inviting people to work on proposals. My vague impression was we may disagree on this, which seems to indicate some quite deep disagreement about how funders should treat projects.

I wrote about the chicken and the egg problem here. As noted in my comments on the announcement post, the angels have significant amounts of funding available. Other funders do not disclose some of these statistics, and while we may do so in the future, I do not think it is necessary before soliciting proposals. The time cost of applying is pretty low, particularly if people are recycling content they have already written. I think we are the first grantmaking group to give all applicants feedback on their application, which I think is valuable even if people do not get funded.

The whole context was that Ryan suggested I should have sought some feedback from you. I actually did that, and your co-founder noted on 11 March that he would try to write the feedback on this today or tomorrow, which did not happen. I don't think this is a large problem, as we had already discussed the topic extensively.

Ben commented on your Google Document that was seeking feedback. I wouldn't say we've discussed the topic "extensively" in the brief call that we had. The devil is in the details, as they say.

comment by RyanCarey · 2019-03-21T19:00:03.741Z · score: 12 (5 votes)

This is an uncharitable reading of my comment in many ways.

First, you suggest that I am worried that you want to recruit people not currently doing direct work. All things being equal, of course I would prefer to recruit people with fewer alternatives. But all things are not equal. If you use people you know for the initial assessments, you will much more quickly be able to iron out bugs in the process. In the testing stages, it's best to have high-quality workers that can perceive and rectify problems, so this is a good use of time for smart, trusted friends, especially since it can help you postpone the recruitment step.

Second, you suggest that I am in the dark about the importance of consensus-building. But this assumes that I believe the only use for consultation is to reach agreement. Rather, by talking to the groups working in related spaces like BERI, Brendon, EA grants, EA funds, and donors, you will of course learn some things, and your beliefs will probably get closer. On aggregate, your process will improve. But also you will build a relationship that will help you to share proposals (and in my opinion funders).

Third, you raise the issue of connecting funding with evaluation. Of course, the distortionary effect is significant. I happen to think the effect from creating an incentive for applicants to apply is larger and more important, and funders should be highly engaged. But there are also many ways that you could have funders be moderately engaged. You could check what would be a useful report for them, that would help them to decide to fund something. You could check what projects they are more likely to fund.

The more strategic issue is as follows. Consensus is hard to reach. But a funding platform is a good that scales with the size of the network of applicants (and imo funders). It is somewhat of a natural monopoly (although we want there to be at least a few funders). You eventually want widespread community support of some form. I think that, as you suggest, this means we need some compromise, but I think it also weighs in favour of more consultation, and in favour of a more experimental approach, in which projects are started in a simple form.

comment by Jan_Kulveit · 2019-03-21T19:42:52.863Z · score: 11 (5 votes) · EA(p) · GW(p)

It is possible my reading of your post somewhat blended with some other parts of the discussion, which are in my opinion quite uncharitable readings of the proposal. Sorry for that.

Actually, from the list, I talked about it and shared the draft with people working on EA Grants, EA Funds, and Brendon, and historically I had some interactions with BERI. What I learned is that people have different priors over the existence of bad projects, the ratio of good projects, and the number of projects which should or should not get funded. Also, the opinions of some of the funders are at odds with the opinions of some people I trust more than the funders.

I don't know, but it seems to me you are either a bit underestimating the amount of consultation which went into this, or overestimating how much agreement there is between the stakeholders. Also, I'm trying to factor in the interests of the project founders, and overall I'm more concerned with whether the impact in the world would be good, and what's good for the whole system.

Despite repeated claims that the proposal is very heavy, complex, rigid, etc., I think the proposed project would in fact be quite cheap, lean, and flexible (and would work). I'm also quite flexible about modifying it in any direction which seems consensual.

comment by Denise_Melchin · 2019-03-21T15:10:11.348Z · score: 10 (5 votes) · EA(p) · GW(p)
I think it is much harder to give open feedback if it is closely tied to funding. Feedback from funders can easily have too much influence on people, and should be very careful and nuanced, as it comes from a position of power. I would expect adding financial incentives can easily be detrimental to the process. (For a self-referential example, just look at this discussion: do you think the fact that Oli dislikes my proposal and suggests LTF can back something different with $20k will not create at least some unconscious incentives?)

I'm a bit confused here. I think I disagree with you, but maybe I am not understanding you correctly.

I consider having people giving feedback to have 'skin in the game' to be important for the accuracy of the feedback. Most people don't enjoy discouraging others they have social ties with. Often reviewers without sufficient skin in the game might be tempted to not be as openly negative about proposals as they should be.

Funders, by contrast, can give you a strong signal, a signal which is unfortunately somewhat binary and lacks nuance. But someone being willing to fund something or not is a much stronger signal of the value of a proposal than comments from friends on a Google Doc. This is especially true if people proposing ideas don't take into account how hard it is to discourage people and don't interpret feedback in that light.

comment by John_Maxwell (John_Maxwell_IV) · 2019-03-22T05:22:28.741Z · score: 4 (3 votes) · EA(p) · GW(p)
I consider having people giving feedback to have 'skin in the game' to be important for the accuracy of the feedback. Most people don't enjoy discouraging others they have social ties with. Often reviewers without sufficient skin in the game might be tempted to not be as openly negative about proposals as they should be.

Maybe anonymity would be helpful here, the same way scientists do anonymous peer review?

comment by Jan_Kulveit · 2019-03-21T17:47:42.577Z · score: 3 (2 votes) · EA(p) · GW(p)

I'm not sure if we agree or disagree; possibly we partially agree and partially disagree. In the case of negative feedback, I think that as a funder you are at greater risk of people over-updating in the direction of "I should stop trying".

I agree friends and social neighbourhood may be too positive (that's why the proposed initial reviews are anonymous, and one of the reviewers is supposed to be negative).

When funders give general opinions on what should or should not get started, or on what you do or do not value, again, I think you are at greater risk of having too much influence on the community. I do not believe the knowledge of the funders is strictly better than the knowledge of grant applicants.

comment by Denise_Melchin · 2019-03-21T18:22:44.095Z · score: 12 (4 votes) · EA(p) · GW(p)

(I still feel like I don’t really understand where you’re coming from.)

I am concerned that your model of how idea proposals get evaluated (and then plausibly funded) is a bit off. From the original post:

hard to evaluate which project ideas are excellent, which are probably good, and which are too risky for their estimated return.

You are missing one major category here: projects which are simply bad because they have approximately zero impact, but aren't particularly risky. I think this category is the largest of the four.

Which projects have a chance of working and which don't is often clear quite quickly to people who have experience evaluating projects (which is why Oli suggested 15min for the initial investigation above). It sounds to me a bit like your model of ideas which get proposed is that most of them are pretty valuable. I don't think this is the case.

When funders give general opinions on what should or should not get started or how you value or not value things, again, I think you are at greater risk of having too much of an influence on the community. I do not believe the knowledge of the funders is strictly better than the knowledge of grant applicants.

I am confused by this. Knowledge of what?

The role of funders/evaluators is to evaluate projects (and maybe propose some for others to do). To do this well they need to have a good mental map of what kind of projects have worked or not worked in the past, what good and bad signs are, ideally from an explicit feedback loop from funding projects and then seeing how the projects turn out. The role of grant applicants is to come up with some ideas they could execute. Do you disagree with this?

comment by Jan_Kulveit · 2019-03-21T18:49:51.172Z · score: 3 (2 votes) · EA(p) · GW(p)
You are missing one major category here: projects which are simply bad because they have approximately zero impact, but aren't particularly risky. I think this category is the largest of the four.

I agree that's likely. Please take the first paragraphs more as motivation than precise description of the categories.

Which projects have a chance of working and which don't is often clear quite quickly to people who have experience evaluating projects (which is why Oli suggested 15min for the initial investigation above).

I think we are comparing apples and oranges. Insofar as the output should be some publicly understandable reasoning behind the judgement, I don't think this is doable in 15 minutes.

It sounds to me a bit like your model of ideas which get proposed is that most of them are pretty valuable. I don't think this is the case.

I don't have a strong prior on that.

To do this well they need to have a good mental map of what kind of projects have worked or not worked in the past,...

From a project-management perspective, yes, but given the slow and poor feedback loops in long-term, x-risk, and meta-oriented projects, I don't think it is easy to tell what works and what does not. (Even for projects that work in the sense that they run smoothly and produce some visible output.)

comment by Habryka · 2019-03-20T23:57:58.625Z · score: 24 (9 votes) · EA(p) · GW(p)
Peer review in a debate framework. Two referees will write evaluations, one focusing on the possible negatives, costs and problems of the proposal; and the other on the benefits. Both referees will also suggest what kind of resources a team attempting the project should have. (2-5h / analyst / project)

I thought about this stage a bunch since Jan shared the original draft with me, and I don't expect a system like this to provide useful evaluations.

When I have had good conversations about potential projects with people in EA, it's very rarely the case that the assessments of people could be easily summarized by two lists of the form "possible negative consequences" and "possible benefits". In practice I think people will have models that will output a net-positive impact or a net-negative impact, depending on certain facts that they have uncertainty about, and understanding those cruxes and uncertainties is the key thing in understanding whether a project will be worth working on. I tried for 15 minutes to write a list of just the negative consequences I expect a project to have, and failed to do so, because most of the potential negative consequences are entwined with the potential benefits of the project in a way that doesn't really make it easy to just focus on the negatives or the positives.

comment by John_Maxwell (John_Maxwell_IV) · 2019-03-21T22:51:22.613Z · score: 10 (4 votes) · EA(p) · GW(p)

This is true, but "possible negative consequences" and "possible benefits" are good brainstorming prompts. Especially if someone has a bias towards one or the other, telling them to use both prompts can help even things out.

comment by Habryka · 2019-03-22T02:41:37.252Z · score: 1 (1 votes) · EA(p) · GW(p)

I agree. I think having reviewers that use those as prompts for their evaluations and their feedback seems like a decent choice. But I wouldn't want to force one person to only use that one prompt.

comment by Jan_Kulveit · 2019-03-21T04:09:23.379Z · score: 9 (5 votes) · EA(p) · GW(p)

It is very easy to replace this stage with e.g. just two reviews.

Some of the arguments for the contradictory version:

  • the point of this stage is not to produce an EV estimate, but to map the space of costs, benefits, and considerations
  • it is easier to be biased in a defined way than unbiased
  • it removes part of the problem with social incentives

Some arguments against it are:

  • such adversarial setups for truth-seeking are uncommon outside of judicial process
  • it may contribute to unnecessary polarization
  • the splitting may feel unnatural

comment by dpiepgrass · 2019-03-24T01:13:42.585Z · score: 3 (2 votes) · EA(p) · GW(p)

Based on Habryka's point, what if "stage 1b" allowed the two reviewers to come to their own conclusions according to their own biases, and then at the end each reviewer is asked to give an initial impression as to whether the project is fund-worthy (I suppose this means its EV is equal to or greater than that of a typical GiveWell charity) or not (EV may be positive, but not high enough)?

This impression doesn't need to be published to anyone if, as you say, the point of this stage is not to produce an EV estimate. But whenever both reviewers come to the same conclusion (whether positive or not), a third reviewer is asked to review it too, to potentially point out details the first reviewers missed.

Now, if all three reviewers give a thumbs down, I'm inclined to think ... the applicant should be notified and advised to go back to the drawing board? If it's just two, well, maybe that's okay; maybe the EV will turn out to be decidedly good upon closer analysis.

I think reviewers need to be able (and encouraged) to ask questions of the applicant, as applications are likely to have some points that are fuzzy or hard to understand. It isn't just that some proposals are written by people with poor communication skills; I think this will be a particular problem with ambitious projects whose vision is hard to articulate. Perhaps the Q&As can be appended to the application when it becomes public? But personally, as an applicant, I would be very interested in editing the original proposal to clarify points at the location where they are first made.

And perhaps proposals will need to be rate-limited to discourage certain individuals from wasting too much reviewer time?

comment by agdfoster · 2019-03-22T16:48:20.787Z · score: 14 (6 votes) · EA(p) · GW(p)

I regularly simplify my evaluations into pros and cons lists and find them surprisingly good. Open Phil’s format essentially boils down to description, pros, cons, risks.

Check out Kialo. It allows really nice nested pro/con lists built of claims and supporting claims, plus discussion around those claims. It’s not perfect, but we use it for keeping track of the logic of our early-stage evals, and for taking a step back on evals where we have gotten too deep into the weeds.

comment by Habryka · 2019-03-22T18:21:36.371Z · score: 13 (5 votes) · EA(p) · GW(p)

Do you have an example of an Open Phil report that boils down to that format? This would be an update for me. I tried my best to randomly sample reports from their website (by randomly clicking around), and the three I ended up looking at didn't seem to follow that structure at all:

comment by Ben Pace · 2019-03-22T22:49:39.293Z · score: 12 (4 votes) · EA(p) · GW(p)

I imagined Alex was talking about the grant reports, which are normally built around “case for the grant” and “risks”. Example: https://www.openphilanthropy.org/giving/grants/georgetown-university-center-security-and-emerging-technology

comment by Habryka · 2019-03-22T23:46:49.690Z · score: 4 (3 votes) · EA(p) · GW(p)

Ah, yes. I didn't end up clicking on any of those, but I agree, that does update me a bit.

I think the biggest crux that I have is that I expect that it would be almost impossible for one person to only write the "risk" section of that article, and a separate person to only write the "case" section of the article, since they are usually framed in contrast to one another. Presenting things in the form of pro/con is very different from generating things in the form of pro/con.

comment by Jan_Kulveit · 2019-03-26T00:54:09.963Z · score: 2 (1 votes) · EA(p) · GW(p)

My impression is you have in mind something different than what was intended in the proposal.

What I imagined was 'priming' the argument-mappers with prompts like:

  • Imagine this project fails. How?
  • Imagine this project works, but has some unintended bad consequences. What are they?
  • What would be a strong reason not to associate this project with the EA movement?

(and the opposites). When writing their texts the two people would be communicating and looking at the arguments from both sides.

The hope is that this would produce a more complete argument map. One way to think about it is that each person is 'responsible' for one of the pro/con sections, trying to make sure it captures as many important considerations as possible.

It seems quite natural for people to think about arguments in this way, with "sides" (sometimes even single authors present complex arguments in the form of a "dialogue").

There are possible benefits, related to why the 'debate' style is used in the justice system:

  • It levels the playing field in interesting ways (when compared to public debate on the forum). In the public debate, what "counts" is not just arguments, but also discussion and social skills, the status of participants, the moods and emotions of the audience, and similar factors. The proposed format would mean both the positives and negatives have "advocates", ideally of "similar debate strength" (anonymous volunteers). This is very different from a public forum discussion, where all kinds of "elephant in the brain" biases may influence participants and bias judgements.
  • It removes some of the social costs and pains associated with project discussions. Idea authors may get discouraged by negative feedback, downvotes/karma, or similar.

Also, just looking at how discussions on the forum look now, it seems that in practice it is easy for people to look at things from positive or negative perspectives: I have certainly seen arguments structured like (several different ways something could fail + why it would be too costly even if it succeeded + speculation about what harm it may cause anyway).

Overall, in my words, I'm not sure whether your view is 'in the space of argument-mapping, nothing in the vicinity of debate will work, at least when done by humans and applied to real problems', or 'there are options in this space which are bad', where I agree that something like bullet-pointed lists of positives and negatives, whose authors do not communicate with each other, seems bad.

comment by John_Maxwell (John_Maxwell_IV) · 2019-03-22T05:31:32.917Z · score: 11 (4 votes) · EA(p) · GW(p)

It seems like Jan is getting a lot of critical feedback, so I just want to say, big ups to you Jan for spearheading this. Perhaps it'd be useful to schedule a Skype call with Habryka, RyanCarey, or others to try & hash out points of disagreement.

The point of a pilot project is to gather information, but if the information already exists in the heads of community members, a pilot could just be an expensive way of re-gathering that info. The ideal pilot might be something that is controversial among the most knowledgeable people in the community, with some optimistic and some pessimistic, because that way we're gathering informative experimental data.

comment by Jan_Kulveit · 2019-03-20T23:26:35.661Z · score: 10 (4 votes) · EA(p) · GW(p)

To make the discussions more useful, I'll try to briefly recapitulate parts of the discussions and conversations I had about this topic in private or via comments on the draft version. (I'm often coalescing several views into a more general claim.)

There seems to be some disagreement about how rigorous and structured the evaluations should be. You can imagine a scale where on one side you have just unstructured discussion on the forum, and on the opposite side you have "due diligence", multiple evaluators writing detailed reviews, a panel of forecasters, and so on.

My argument is: unstructured discussion on the forum is something we already have, and often the feedback project ideas get is just a few bits from voting, plus a few quick comments. Also, the prevailing sentiment of comments is sometimes at odds with expert views or models likely used by funders, which may cause some bad surprises. That is too "light". The process proposed here is closer to the "heavy" end of the scale. My reasoning is that it seems easier to tune the "rigour" parameter down than up, and trying it on a small batch has higher learning value.

Another possible point of discussion is whether the evaluation system would work better if it was tied to some source of funding. My general intuition is this would create more complex incentives, but generally I don't know and I'm looking for comments.

Some people expressed uncertainty about whether there is a need for such a system. Some because they believe that there aren't many good project ideas or projects (especially unfunded ones). Others because they feel proposed projects are almost all good, there are almost no dangerous project ideas, and even small funders can choose easily. I don't have good data, but I would hope having largely public evaluations could at least help everyone to be better calibrated. Also, when comparing the "EA startup ecosystem" with the normal startup ecosystem, it seems we are often lacking what is provided by lead investors, incubators, or mentors.

comment by Habryka · 2019-03-20T23:48:43.158Z · score: 19 (8 votes) · EA(p) · GW(p)

(Here are some of the comments that I left on the draft version of this proposal that I was sent, split out over multiple comments to allow independent voting):

[Compared to an open setup where any reviewer can leave feedback on any project in an open setting like a forum thread] An individual reviewer and a board is much less likely to notice problems with a proposal, and a one-way publishing setup is much more likely to cause people to implement bad projects than a setup where people are actively trying to coordinate work on a proposal in a consolidated thread. In your setup the information is published in at least two locations, and the proposal is finalized and no longer subject to additional input from evaluators, which seems like it would very likely cause people to just take an idea and run with it, without consulting with others whether they are a good fit for the project.

Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.

You are also planning to spend over 40 hours in evaluating projects for the first phase, which is quite costly, and you are talking about hiring people part-time. [...]

From my perspective, having this be in the open makes it a lot easier for me and other funders in the space to evaluate whether the process is going well, whether it is useful, or whether it is actively clogging up the EA funding and evaluation space. Doing this in distinct stages, and with most of the process being opaque, makes it much harder to figure out the costs of this, and the broader impact it has on the EA community, moving the expected value of this into the net-negative.

I also think that we already have far too many random EA websites that are trying to do a specialized thing, and not doing it super well. The whole thing being on a separate website will introduce a lot of trivial inconveniences into the process that could be avoided by just having all of it directly on the EA Forum.

To put this into perspective, you are requesting a total of about 75-160 hours of volunteer labor for just this first round of evaluations, not counting the input from the panel, which will presumably have to include already very busy people who can judge proposals. That in itself is almost a full month of work, and in doing this you are depleting a limited resource, the goodwill of effective altruists, and an even more limited resource, the time of the busiest people, to help with initiatives like this.

comment by Jan_Kulveit · 2019-03-21T00:07:57.412Z · score: 1 (2 votes) · EA(p) · GW(p)

As I've already explained in the draft, I'm still very confused by what

An individual reviewer and a board is much less likely to notice problems with a proposal than a broad discussion with many people contributing would ...

should imply for the proposal. Do you suggest that steps 1b, 1d, and 1e are useless or harmful, and having just the forum discussion is superior?

The time of evaluators is definitely definitely definitely not free, and if you treat them as free then you end up exactly in the kind of situation that everyone is complaining about. Please respect those people's time.

Generally, I think this is quite a strange misrepresentation of how I value people's time and attention. Also, I'm not sure if you assume the time people spend arguing on forums is basically free, or does not count because it is unstructured.

From my perspective, having this be in the open makes it a lot easier for me and other funders in the space to evaluate whether the process is going well, whether it is useful, or whether it is actively clogging up the EA funding and evaluation space. Doing this in distinct stages, and with most of the process being opaque, makes it much harder to figure out the costs of this, and the broader impact it has on the EA community, moving the expected value of this into the net-negative.

Generally, almost all of the process is open, so I don't see what should be changed. If the complaint is that the process has stages instead of unstructured discussion, and that this makes it less understandable for you, I don't see why.

comment by Habryka · 2019-03-21T00:44:25.932Z · score: 20 (6 votes) · EA(p) · GW(p)

My overall sense of this is that I can imagine this process working out, but the first round of this should ideally just be run by you and some friends of yours, and should not require 100+ hours of volunteer time. My expectation is that after you spend 10 hours trying to actually follow this process, with just one or two projects, on your own or with some friends, you will find that large parts of it won't work as you expected and that the process you designed is far too rigid to produce useful evaluations.

comment by Habryka · 2019-03-21T00:33:35.664Z · score: 13 (4 votes) · EA(p) · GW(p)
As I've already explained in the draft, I'm still very confused by what [...] should imply for the proposal. Do you suggest that steps 1b, 1d, and 1e are useless or harmful, and having just the forum discussion is superior?

I am suggesting that they are probably mostly superfluous, but more importantly, I am suggesting that a process that tries to separate the public discussion into a single stage, that is timeboxed at only a week, will prevent most of the value of public discussion, because there will be value from repeated back and forth at multiple stages in this process, and in particular value from integrating the step of finding a team for a project with the process of evaluating a proposal.

To give you an example, I expect that someone will have an idea for a project that is somewhat complicated, and will write an application trying their best to explain it. I expect for the majority of projects the evaluators will misunderstand what the project is about (something I repeatedly experienced with project proposals on the LTF-Fund), and will then spend 2-5 hours writing a negative evaluation of a project that nobody thought was a good idea. The original person who proposed the project will then comment during the public discussion stage and try to clarify their idea, but since this process currently assigns most of the time for the evaluators and board members to the evaluation stage, there won't be any real way in which they can cause the evaluators to reevaluate the proposal, since the whole process is done in batches and the evaluators only have so many hours set aside (and they already spent 2-5 hours writing an evaluation of the proposal).

On the other hand, if the evaluators are expected to instead participate mostly in a back-and-forth discussion over the course of a week, or maybe multiple weeks, then I think most likely the evaluators would comment with some initial negative impressions of the project which would probably be written in 5-10 minutes. The person writing the proposal would respond and clarify, and then the evaluator would ask multiple clarifying questions until they have a good sense of the proposal. Ideally, the person putting in the proposal would also be the person interested in working on it, and so this back-and-forth would also allow the evaluator to determine whether this person is a good fit for the project, and allow other people to volunteer their time to participate and help with the project. The thread itself would serve as the location for other people to find interesting projects to work on, and to get up to speed on who is working on what projects.

---

I also think that assigning two evaluators to each project is a lot worse than assigning evaluators in general and allowing them to chime in when they have pre-existing models for projects. I expect that if they don't have pre-existing models in the domain that a project is in, an evaluator will find it almost impossible to write anything useful about that project, without spending many hours building basic expertise in that domain. This again suggests a setup where you have an open pool of proposals, and a group of evaluators who freely choose which projects to comment on, instead of being assigned individual projects.

comment by Jan_Kulveit · 2019-03-21T01:12:16.424Z · score: 4 (3 votes) · EA(p) · GW(p)

I don't understand why you assume the proposal is intended as something very rigid, where e.g. if we find the proposed project hard to understand, nobody would ask for clarification, or why you assume the 2-5h is some dogma. The back-and-forth exchange could also count toward the 2-5h.

Regarding assigning two evaluators to each project, you are just assuming the evaluators would have no say in what to work on, which is nowhere in the proposal.

Sorry, but could you for a moment also imagine some good interpretation of the proposed scheme, instead of just weak-manning every other paragraph?

comment by Habryka · 2019-03-21T01:39:20.902Z · score: 18 (7 votes) · EA(p) · GW(p)

I am sorry for appearing to be weak-manning you. I think you are trying to solve a bunch of important problems that I also think are really important to work on, which is probably why I care so much about solving them properly and have so many detailed opinions about how to solve them. While I do think we have strong differences in opinion on this specific proposal, we probably both agree on a really large fraction of important issues in this domain, and I don't want to discourage you from working in this domain, even if I do think this specific proposal is a bad idea.

Back to the object level: I think as I understand the process, the stages have to necessarily be very rigid because they require the coordination of 5+ volunteers, a board, and a set of researchers in the community, each of which will have a narrow set of responsibilities like writing a single evaluation or having meetings that need to happen at a specific point in time.

I think coordinating that number of people naturally gives rise to very rigid structures (I think even coordinating a group of 5 full-time staff is hard, and the amount of structure goes up drastically as individuals can spend less time), and your post explicitly says that step 1c is the step in which you expect back and forth with the person who proposed the project, making me think that you do not expect back and forth before that stage. And if you do expect back-and-forth before that stage, then I think it's important that you figure out a way to make that as easy as possible, and given the difficulty of coordinating large numbers of people, I think if you don't explicitly plan for making it easy, it won't happen and won't be easy.

comment by Jan_Kulveit · 2019-03-21T02:07:13.942Z · score: 3 (3 votes) · EA(p) · GW(p)

I don't see why continuous coordination of a team of about 6 people on slack would be very rigid, or why people would have very narrow responsibilities.

For the panel, having some defined meeting and evaluating several projects at once seems time and energy conserving, especially when compared to the same set of people watching the forum often, being manipulated by karma, being in a way forced to reply to many bad comments, etc.

comment by Habryka · 2019-03-21T00:38:58.234Z · score: 12 (3 votes) · EA(p) · GW(p)
Generally, almost all of the process is open, so I don't see what should be changed. If the complaint is that the process has stages instead of unstructured discussion, and that this makes it less understandable for you, I don't see why.

One part of the process that is not open is the way the evaluators are writing their evaluations, which, as I understand it, is where the majority of person-time is being spent. It also seems that all the evaluations are going to be published in one big batch, so feedback on the evaluation process could not be acted on until the next complete grant round, which is presumably multiple months in the future.

The other parts of the process that are not open are these two stages:

1d. A panel will rate the proposal, utilizing the information gathered in phases b. and c., highlighting which part of the analysis they consider particularly important. (90m / project)
1e. In case of disagreement among the panel, the question will get escalated and discussed with some of the more senior people in the field.

I expect the time of the panel, as well as the time of the more senior people in the field are the most valuable resources that could be wasted by this process, and the current process gives very little insight into whether that time is well-spent or not. In a simple public forum setup, it would be easy to see whether the overall process is working, and whether the contributions of top people are making a significant difference.

comment by Jan_Kulveit · 2019-03-21T00:50:37.640Z · score: 2 (1 votes) · EA(p) · GW(p)

On the first part, I'm not sure what you would imagine as the alternative: having access to the evaluators' Google Drive so you can count how much time they spent writing? The time estimate is roughly how long it may take volunteer evaluators; if all you need is on the order of 5 minutes, you are either really fast or not explaining your decisions.

I expect much more expert time would be wasted in the forum discussions you propose.

comment by Habryka · 2019-03-21T01:27:02.907Z · score: 8 (4 votes) · EA(p) · GW(p)

I think in a forum discussion, it's relatively easy to see how much someone is participating in the discussion, and to get a sense of how much time they spent on stuff. I am not super confident that less time would be wasted in the forum discussions I am proposing, but I am confident that I and others would notice if lots of people's time was wasted, which is something I am not at all confident about for your proposal and which strongly limits the downside for the forum case.

comment by Jan_Kulveit · 2019-03-21T01:56:58.382Z · score: 3 (2 votes) · EA(p) · GW(p)

On the contrary: on Slack, it is relatively easy to see an upper bound on the attention spent. On the forum, you should look not just at the time spent writing comments, but also at the time and attention of people who read without posting. I would be quite interested in how much time, for example, CEA+FHI+GPI employees spend reading the forum in aggregate (I guess you can technically count this).

comment by Habryka · 2019-03-21T02:04:28.057Z · score: 4 (3 votes) · EA(p) · GW(p)

*nods* I do agree that you, as the person organizing the project, will have some sense of how much time has been spent, but I think it won't be super easy for you to communicate that knowledge, and it won't by default help other people get better at estimating the time spent on things like this. It also requires everyone watching to trust you to report those numbers accurately, which I think I do, but I don't think everyone necessarily has reason to.

I do think that on Slack you also have to take into account the time of all the people not posting. While I do think more time will be spent just reading, not writing, on the EA Forum, I generally think the time spent reading is worth it for people individually. Importantly, people are under no commitment to read things on the EA Forum, whereas the volunteers involved here would have a commitment to their role, making it more likely that it turns out net-negative for them. (I recognize there are caveats: sometimes controversial topics cause a lot of people to pay attention to make sure that nothing explodes.)

comment by Habryka · 2019-03-21T00:41:19.059Z · score: 1 (1 votes) · EA(p) · GW(p)

Never mind.

comment by Habryka · 2019-03-21T00:10:55.719Z · score: 15 (6 votes) · EA(p) · GW(p)

To respond more concretely to the "due diligence" vs. unstructured discussion section, which I think refers to some discussion Jan and I had on the Google doc he shared:

I think the thing I would like to see is something that is just a bit closer towards structured discussion than what we currently have on the forum. I think there doesn't currently exist anything like an "EA Forum Project discussion thread" and in particular not one that has any kind of process like

"One suggestion for a project per top-level comment. If you are interested in working on a project, I will edit the top-level post to reflect that you are interested in working on it. If you want to leave a comment anonymously, please use this form."

I think adding a tiny bit of process like this will lead to valuable discussion, will actually be better at getting good projects funded and teams working on them, and is much less effort to set up than the process you are proposing here.

I am also worried that this process, even though it is already 7 stages long and involves at least 10 people, covers less than half of the actual pipeline towards getting people to work on projects. I know that you want to explicitly separate the evaluation of projects from the evaluation of the teams working on those projects, but I don't think you can easily do that.

I think 90% of the time whether a project is good or bad depends on the team that wants to work on it, which is something you see strongly reflected in the startup and investment world. It's extremely rare for a VC to fund or even evaluate a project without knowing what team is working on it, and I think you will find that any evaluation process that doesn't include matching teams with projects will quickly be blocked by that missing step.

comment by Aaron Gertler (aarongertler) · 2019-03-21T07:10:22.397Z · score: 16 (7 votes) · EA(p) · GW(p)

I share Habryka's concern about the complexity of the project; each step clearly has a useful purpose, but adding more steps to a process still tends to make it harder to finish that process in a reasonable amount of time. I think this system could work, but I also like the idea of running a quick, informal test of a simpler system to see what happens.

Habryka, if you create the "discussion thread" you've referenced here, I will commit to leaving at least one comment on every project idea; this seems like a really good way to test the capabilities of the Forum as a place where projects can be evaluated.

(It would be nice if participants shared a Google Doc or something similar for each of their ideas, since leaving in-line comments is much better than writing a long comment with many different points, but I'm not sure about the best way to turn "comments on a doc" into something that's also visible on the Forum.)

comment by Habryka · 2019-03-21T22:03:28.521Z · score: 10 (3 votes) · EA(p) · GW(p)

I am currently quite busy, so there's only a 50% chance I find the time to do it, but I will seriously look into making the time for this. I am also happy to chat with anyone else who wants to do this, and to help both with setting it up and by participating in the thread.

comment by Jan_Kulveit · 2019-03-21T22:55:58.368Z · score: 7 (2 votes) · EA(p) · GW(p)

FWIW, part of my motivation for the design was:

1. there may be projects, mostly in long-term, x-risk, meta- and outreach spaces, which are very negative, but not in an obvious way

2. there may be ideas, mostly in long-term and x-risk areas, which are infohazards

The problem with 1. is that most of the EV can be driven by just one project with large negative impact, whose downside is hard to notice.

It seems to me that standard startup thinking does not apply here, because startups generally cannot go far below zero.

I also do not trust an arbitrary set of forum users to handle this well.

Overall, I believe very lightweight unstructured processes trade some gain in speed and convenience in most cases for decreased robustness in the worst cases.

In general I would feel much better if the simple system you want to try would avoid projects in long-term, x-risk, meta-, outreach, localization, and "searching for cause X" areas.

comment by Habryka · 2019-03-20T23:40:46.380Z · score: 13 (7 votes) · EA(p) · GW(p)

Here are some of the comments that I left on the draft version of this proposal that I was sent (split out over multiple comments to allow independent voting):

I continue to think that just having an open discussion thread, with reviewers participating in the discussion with optional private threads, will result in a lot more good than this.
Based on my experience with the LTF-Fund, I expect that 90% of the time there will be one specific person whose 5-minute judgement you need to assess a project, much more than you need a 2-5h evaluation. This makes an open setup, where all evaluators can see all applications and contribute where they are particularly well suited, much more valuable than an assignment process.
A simple invite-only or fully open discussion is also much easier to test than a more elaborate evaluation system, and I think you are overestimating the infohazard and PR risks after some initial screening.
I do think it is important to allow reviewers to be completely anonymous when participating in the discussion.

After some more discussion:

My perspective is more that the simple intervention of:
"Create an EA Forum projects thread, with some incentive for people to leave reviews of projects"
should be tried before you do something as complicated as this. I agree that the resulting incentives can be messy, but I expect we will get a lot more data and information on what is important than we would by spending 20-50 hours of competent-person time on producing reviews and setting up a vetting process, a website, a panel, and an infohazard policy, all before trying the obvious solution to the problem, which takes 5 hours to implement.

[...]

I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that.

comment by Jan_Kulveit · 2019-03-21T00:52:55.462Z · score: 7 (3 votes) · EA(p) · GW(p)

I would be curious about your model of why the open discussion we currently have does not work well, like here, where user nonzerosum proposed a project and the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems. I don't think the fact that I read the post after three days and wrote a basic critical argument is good evidence that an individual reviewer and a board are much less likely to notice problems with a proposal than a broad discussion with many contributors would be.

Also when you are making these two claims

Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.

...

I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that

at the same time, I would guess this needs more explanation from you or the other LTF managers.

Generally I'm in favour of solutions which are quite likely to work as opposed to solutions which look cheap but are IMO likely worse.

I also don't see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours. Unless, of course, the time and attention of the people who are posting and commenting on the forum does not count. If this is the case, I strongly disagree. The forum is actually quite costly in terms of time, attention, and also emotional impacts on people trying to participate.

comment by Habryka · 2019-03-21T01:23:00.598Z · score: 33 (10 votes) · EA(p) · GW(p)
I also don't see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours.

I think an initial version of the process, in which you, plus maybe one or two close collaborators, would play the role of evaluators and participate in an EA Forum thread, would take less than 5 hours to set up and less than 15 hours to actually execute and write reviews for, and I think it would give you significant evidence about what kinds of evaluations will be valuable and what the current bottlenecks in this space are.

I would be curious about your model of why the open discussion we currently have does not work well, like here, where user nonzerosum proposed a project and the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems.

I think that post is actually a good example of why a multi-stage process like this will cause a lot of problems. I think the best thing for nonzerosum to do would have been to write a short comment or post, maybe two to three paragraphs, explaining the basic idea of a donor list. At that point, he would not have been very invested in it, and I think if he had posted only a short document, people would have reacted with openness and told him that there is a pretty long history of people trying to build EA donor coordination platforms, and that there are significant unilateralist's-curse-like problems. I think the downvotes and negative reaction came primarily from people perceiving him to be prematurely charging ahead with a project.

I do think you need some additional incentive for people to actually write up their thoughts in addition to just voting on stuff, which is why a volunteer evaluator group, or maybe some kind of financial incentive, or maybe just some kind of modifications to the forum software (which I recognize is not something you can easily do but which I have affordances for), is a good idea. But I do think you want to be very hesitant to batch the reviews too much, because as I mentioned elsewhere in the thread, there is a lot of value from fast feedback loops in this evaluation process, as well as allowing experts in different domains to chime in with their thoughts.

And we did see exactly that. I think the best comments on that post (besides yours) are Ben West's and Aaron Gertler's, which were both written relatively soon after the post (and I think would have been written even if you hadn't written yours) and concisely explained the problems with the proposal. I don't think a delay of 2-3 days is that bad, and overall I think nonzerosum successfully received the feedback that the project needed. I would like people proposing projects to feel less punished for doing so, but I think that can easily be achieved by establishing a space where there is common knowledge that many proposals will be bad and have problems, and that a proposal appearing in that space does not mean everyone has to be scared that someone will rush ahead with it and potentially cause a lot of damage.

If I understood your setup correctly, it would have potentially taken multiple weeks for nonzerosum to get feedback on their proposal, and the response would have come in the form of an evaluation that took multiple hours to write, which I don't think would have benefited anyone in this situation.

comment by Raemon · 2019-03-23T02:11:34.740Z · score: 9 (3 votes) · EA(p) · GW(p)

Mild formatting note: I found the introduction a bit long as well as mostly containing information I already knew.

I'm not sure how to navigate "accessible" vs "avoid wasting people's time". But I think you could have replaced the introduction with a couple bullet links, like:

....

Building off of:

 What to Do With People?

 Can EA copy Teach For America?

 EA is Vetting Constrained

The EA community has plenty of money and people, but is bottlenecked on a way to scalably evaluate new projects. I'd like to start a platform that:

  • provides feedback on early stage projects
  • estimates what resources would be necessary to start a given project
  • further in the project's life cycle, evaluates the fit between team and idea.

...

Or something like that (I happen to like bullet points, but in any case it seems like it should be possible to cut the opening few paragraphs down to a few lines).

comment by oliverbramford · 2019-03-27T18:39:06.830Z · score: 7 (2 votes) · EA(p) · GW(p)

Meta-suggestion: an in-person, professionally facilitated small workshop, sponsored and hosted by CEA, to build consensus around a solution to the EA project bottleneck, with a view to CEA owning the project.

There is a range of carefully considered, well-informed, and somewhat divergent perspectives on how to solve the EA project bottleneck. At the same time, getting the best possible version of a solution is likely to be very high value; a sub-optimal version may represent a large counterfactual loss of value.

As an important and complex piece of EA infrastructure, this seems to be a good fit for CEA to own. CEA is well-placed to lend legitimacy and seed funding to such a project, so that it has the ongoing human and financial resources and credibility to be done right.

It also seems quite likely that an appropriate (optimally impactful) solution to this problem would entail work beyond a project evaluation platform (e.g. a system to source project ideas from domain experts; effective measures to dampen overly risky projects). This kind of 'scope creep' would be hard for an independent project to fulfil, but much easier for CEA to execute, given their network and authority.

comment by sudhanshu_kasewa · 2019-03-22T00:08:20.625Z · score: 6 (4 votes) · EA(p) · GW(p)

Reposting my message to Jan after reading a draft of this post. I feel like I’m *stepping into it* with this, but I’m keen to get some feedback of my own on the thoughts I’m expressing here.

Epistemic status: confident about some things, not about all of them.

Emotional state: Turbulent/conflicted, triggering what I label as fear-of-rejection, as I don't think this conforms to many unstated standards; overruled, because I must try and be prepared to fail, for I cannot always learn in a vacuum.

Why piggy-back off this post: Because 1) it’s related, 2) my thoughts are only partially developed, 3) I did not spend longer than an hour on this, so it doesn’t really warrant its own post, but 4) I think it can add some more dimensions to the discussions at hand.

Please be kind in your feedback. I prefer directness to euphemism, and explanations in simpler words rather than in shibboleths.

Edited for typos and minor improvements.

-----------

Hey Jan!

Thanks for leading the charge on the EA project platform project. I agree with your motivation, that community members can benefit from frequent, high-resolution, nuanced feedback on their ideas (project or otherwise). I've left some comments / edits on your doc, but feel free to ignore them.

Also feel free to ignore the rest of this, but I thought I'd put down a few thoughts anyway:

  • I think separating project evaluation from team evaluation is a good idea; however, doing a project well runs deeper than just selecting the right team for the job. It's also about assigning and communicating the right responsibilities to the team members, along with how they can expect those to change over time as the project grows. Some people might be good at managing a project when it's a tiny team, but may not be cut out to be CEO when there are 20 (or maybe 50) people involved, and should be prepared to take on roles better suited to their gifts as a project grows. This, I feel, is a difficult message to communicate, especially to eager young effective altruists who can see that most of their community heroes/leaders are still in or barely out of their 20s. This seems like a form of survivor bias, coupled with EA being a genuinely young movement populated with generally young people: many who got into EA in their teens 5 or 10 years ago are now in positions of prestige, and this is (I feel) an aspirational but ultimately untenable dream for most of today's new EAs.
  • I think one of your reasons for having step 1b in place is to generate some substance around a project to encourage community engagement once it is released on a platform / forum. Unfortunately, it is also the step that takes the most time from a very small pool of highly skilled volunteers, and it is ultimately the same bottleneck faced by grant makers and analysts everywhere. If we find another way of promoting engagement, would this step be necessary?
  • Continuing from above, could the following system work? I know you said the design space of interventions is large, but here goes anyway:

1) On EA forum, set up a separate space where project ideas can be posted anonymously; this is phase 1. Basic forum moderation (for info hazards, extremely low quality content) applies here.

2a) Each new post starts out with the following tags: Needs Critics, Needs Steelmen, Needs Engagement

2b) Community members can then (anonymously) either critique, steelman, or otherwise review the proposal; karma scores for the reviewer can be used to signal the weight of the review (and the karma scores can be bucketed to protect their anonymity)

2c) The community can also engage with the proposal, critiques, steelmen with up/down votes or karma or whatever other mechanism makes sense

2d) As certain thresholds on σ (reviewer karma + multiplier × review net votes) are crossed, the proposal loses its Needs Critics / Needs Steelmen / Needs Engagement tags as appropriate. Losing all of those tags means that the proposal is ready to be evaluated for progression to a real project. A net of Steelman Score - Critic Score, or something similar, can signal its viability straight out of the gate.

3a) Projects that manage to lose all three tags automatically progress to phase 2, where they solicit funding, mentorship, and team members. Once here, the proposer is deanonymised, and reviewers have the option to become deanonymised. The proposal is now tagged with Needs Proposal Revision, Needs Sponsors, Needs Mentorship, Needs Team Members

3b) The proposer revises the proposal in light of phase 1 and adds a budget and an estimate of initial team roles + specifics (skills / hours / location / rate). This can be commented on and revised, but can only be approved by (insert set of chosen experts here). Experts can also remove the proposal from this stage with a quick review, much like an Area Chair (e.g. by noting that the net score from phase 1 was too low, indicating the proposal requires significant rework). This is the only place where expert intervention is required, by which time the proposal has received a lot of feedback and gone through some iterations. After an expert approves, the proposal loses its Needs Proposal Revision tag and is, for all intents and purposes, frozen.

3c) Members with karma > some threshold can volunteer (unpaid) to be mentors, in the expectation that they provide guidance to the project of up to some X hours a week; a project cannot lose this tag until it has acquired at least Y mentors.

3d) All members can choose to donate towards / sponsor a project via some charity vehicle (CEA?) on the condition that their money will be returned, or their pledge will not fall due, unless the required budget is met... I'm unfamiliar with how something like this can work in practice, though.

3e) Members can (secretly) submit their candidature for the roles advertised in the proposal, along with CVs ; proposer, mentors, and high-value sponsors (for some specification of high-value) can vet, interview, select, and secure team members from this pool, and then a mentor can remove the Needs Team Members tag.

3f) Once a project loses all four tags in phase 2, it has everything it needs to launch. Team members are revealed, proposer (with mentor / CEA / local EA Chapter support) sets up the new organization, the charity vehicle transfers the money to the organization, that then officially hires the staff, and the party gets started.

4) Between mentors, high-value sponsors, and proposer, they submit some manner of updates on the progress of the project on the forum every Z months.
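To make the mechanics of steps 2a-2d and 3d concrete, here is a minimal Python sketch under stated assumptions. The multiplier, the threshold values, the review labels, and the choice to measure "engagement" as the combined weight of all reviews are all hypothetical, since the proposal leaves them open. The pledge rule reads step 3d as an all-or-nothing (assurance-contract) scheme: sponsors are charged only if the pledges together cover the budget.

```python
from dataclasses import dataclass, field

# Hypothetical parameters: the proposal does not specify these values.
MULTIPLIER = 2.0
THRESHOLDS = {
    "Needs Critics": 50.0,
    "Needs Steelmen": 50.0,
    "Needs Engagement": 150.0,
}


@dataclass
class Review:
    kind: str              # "critique" or "steelman" (hypothetical labels)
    reviewer_karma: float  # possibly bucketed to protect anonymity (step 2b)
    net_votes: int         # community up/down votes on the review (step 2c)

    def weight(self) -> float:
        # Step 2d: reviewer karma + multiplier * review net votes.
        return self.reviewer_karma + MULTIPLIER * self.net_votes


@dataclass
class Proposal:
    reviews: list[Review] = field(default_factory=list)

    def score(self, kind: str) -> float:
        return sum(r.weight() for r in self.reviews if r.kind == kind)

    def tags(self) -> set[str]:
        scores = {
            "Needs Critics": self.score("critique"),
            "Needs Steelmen": self.score("steelman"),
            # Assumption: engagement is the combined weight of all reviews.
            "Needs Engagement": sum(r.weight() for r in self.reviews),
        }
        # A tag stays on the proposal until its score reaches the threshold.
        return {tag for tag, s in scores.items() if s < THRESHOLDS[tag]}

    def ready_for_phase_2(self) -> bool:
        # Step 3a: no remaining phase-1 tags means automatic progression.
        return not self.tags()

    def viability(self) -> float:
        # "Steelman Score - Critic Score" as a rough viability signal.
        return self.score("steelman") - self.score("critique")


def settle_pledges(pledges: dict[str, float], budget: float) -> dict[str, float]:
    """All-or-nothing pledge rule (one reading of step 3d).

    Returns the amount actually charged to each sponsor: the full pledge
    if the pledges together meet the budget, nothing otherwise.
    """
    if sum(pledges.values()) >= budget:
        return dict(pledges)
    return {sponsor: 0.0 for sponsor in pledges}
```

With these made-up numbers, a proposal with one critique (karma 30, net votes +10) and one steelman (karma 40, net votes +3) loses only its Needs Critics tag, since the critique's weight of 50 reaches that threshold while the steelman's 46 does not. Keeping the thresholds as plain data makes them easy to tune once real engagement levels are known.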

Essentially the above looks like an almost entirely open, community-driven exercise, but it doesn't resolve how we get engagement (in steps 2b, 2c) in the first place. I don't have an answer to that question, but I think asking the community to be Steelmen or Critics will signal that we need more than up/downvotes on a particular item. LessWrong has a very strong commenter base, so I suspect the denizens of the EA Forum could rise to the task.

Of course, after having written this all up, I can see how implementing this on a platform might take a little more time. I'm not a web developer, so I'm not sure how easy it is to implement all the above logic flows (e.g. when is someone anonymous v/s when are they not?), but I estimate it might take a week (between process designers and the coders) to draw out (and iterate on) all the logic and pathways, two weeks to implement in code, and another week to test.

  • Sorry for all this; I hope it wasn't a complete waste of your time. If I've overstayed my welcome in your mindspace, or overstepped in my role as 'someone random on the internet giving you feedback', I deeply apologise. Know that writing all this down was cathartic for me, so in the worst case, I did it selfishly, but not maliciously.

comment by Jan_Kulveit · 2019-03-22T00:30:37.082Z · score: 10 (3 votes) · EA(p) · GW(p)

Thanks, Sudhanshu! Sorry for not replying sooner; I was a bit overwhelmed by some of the negative feedback in the comments.

I don't think step 1b. has the same bottleneck that current grant evaluators face, because it is less dependent on good judgement.

I think parts of your proposal may work, but I would be worried about other parts. With step 2b, I would fear that nobody would feel responsible for producing the content.

What 3a, or any automatic step like it, lacks is some sort of (reasonably) trusted expert judgement. In my view, this is actually the most critical step for x-risk, long-term, meta-, and similarly difficult-to-evaluate proposals.

Overall

  • I'm sceptical that karma or a similar automated system is good for tracking what is actually important here
  • I see some beauty in automation, but I don't see it applied here in the right places

comment by Jan_Kulveit · 2019-03-22T02:30:36.610Z · score: 4 (3 votes) · EA(p) · GW(p)

Summary impressions so far: object-level

  • It seems many would much prefer expediency in median cases to robustness and safety in rare cases with possibly large negative impact. I do not think this is the right approach when the intention is also to evaluate long-term-oriented, x-risk, meta-, cause-X, or highly ambitious projects.
  • I'm afraid there is some confusion about project failure modes. I'm more worried about projects which would be successful in having a team, working successfully in some sense, changing the world, but achieving large negative impact in the end.
  • I feel sad about the repeated claims that the proposal is rigid, costly, or large-scale. If something did not work in practice, it could easily be changed. Spending something like 5h on a project idea, which was likely the result of much longer deliberation and which may lead to thousands of hours of work, seems reasonable. Paradoxically, the discussion about whether the project is costly has likely already cost more than setting up the whole proposed infrastructure plus phases 1a, 1c, and 1d would.

Meta:

  • I will not have time to participate in the discussion over the next few days. Thanks for the comments so far.