I am the Principal Research Manager at Rethink Priorities working on, among other things, the EA Survey, Local Groups Survey, and a number of studies on moral psychology, focusing on animal welfare, population ethics, and moral weights.
In my academic work, I'm a Research Fellow working on a project on 'epistemic insight' (mixing philosophy, empirical study and policy work) and moral psychology studies, mostly concerned either with effective altruism or metaethics.
I've previously worked for Charity Science in a number of roles and was formerly a trustee of EA London.
I think it depends a lot on the specifics of your survey design. The most commonly discussed tradeoff in the literature is probably that having more questions per page, as opposed to more pages with fewer questions, leads to higher non-response and lower self-reported satisfaction, but people answer the former more quickly. But how to navigate this tradeoff is very context-dependent.
All in all, the optimal number of items per screen requires a trade-off: More items per screen shorten survey time but reduce data quality (item nonresponse) and respondent satisfaction (with potential consequences for motivation and cooperation in future surveys). Because the negative effects of more items per screen mainly arise when scrolling is required, we are inclined to recommend placing four to ten items on a single screen, avoiding the necessity to scroll.
In this context, survey researchers have to make informed decisions regarding which approach to use in different situations. Thus, they have to counterbalance the potential time savings and ease of application with the quality of the answers and the satisfaction of respondents. Additionally, they have to consider how other characteristics of the questions can influence this trade-off. For example, it would be expected that an increase in answer categories would lead to a considerable decrease in data quality, as the matrix becomes larger and harder to complete. As such, in addition to knowing which approach leads to better results, it is essential to know how characteristics of the questions, such as the number of categories or the device used, influence the trade-off between the use of grids and single-item questions.
But, in addition, I think there are a lot of other contextual factors that influence which is preferable. For example, if you want respondents to answer a number of questions pertaining to a number of subtly different prompts (which is pretty common in studies with a within-subjects component), then having all the questions for one prompt on one page may help make salient the distinction between the different prompts. There are other things you can do to aid this, like having gap pages between different prompts, though these can really enrage respondents.
I think all of the following (and more) are possible risks:
- People are tired/bored and so answer less effortfully/more quickly
- People are annoyed and so answer in a qualitatively different way
- People are tired/bored/annoyed and so skip more questions
- People are tired/bored/annoyed and drop out entirely
Note that people skipping questions or dropping out is not merely a matter of quantity (fewer responses), because the dropout/skipping is likely to be differential: precisely those respondents who are more likely to be bored/tired/annoyed by the questions, and to skip or drop out as a result, will be the ones less likely to give responses.
Regrettably, I think that specifying extremely clearly that the questions are completely optional influences some respondents (it also likely makes many simply less likely to answer these questions), but doesn't ameliorate the harm for others. You may be surprised how many people will provide multiple exceptionally long open comments and then complain that the survey took them longer than the projected average. That aside, depending on the context, I think it's sometimes legitimate for people to be annoyed by the presence of lots of open comment questions even if they are explicitly stated to be optional because, in context, it may seem like they need to answer them anyway.
Thanks for the post. I think most of this is useful advice.
"Walkthroughs" are a good way to improve the questions
In the academic literature, these are also referred to as "cognitive interviews" (not to be confused with this use) and I generally recommend them when developing novel survey instruments. Readers could find out more about them here.
Testers are good at identifying flaws, but bad at proposing improvements... I'm told that this mirrors common wisdom in UI/UX design: that beta testers are good at spotting areas for improvement, but bad (or overconfident) at suggesting concrete changes.
This is also conventional understanding in academia. Though there are some, mostly qualitative-oriented, philosophies that focus more on letting participants shape the articulation of the research output, there's generally no reason to think that respondents should be able to describe how a question should be asked (although, of course, if you are pretesting anyway, there is little reason not to consider suggestions). Depending on what you are measuring, respondents may not even be aware of what underlying construct (not necessarily something they even have a concept for) an item is trying to measure. Indeed, people may not even be able to accurately report on their own cognitive processes. Individuals' implicit understanding may outpace their explicit theoretical understanding of the issue at hand (for example, people can often spot incorrect grammar, or a misapplied concept, but not provide explicit accounts of the rules governing it).
Include relatively more "free-form" questions, or do interviews instead of a survey...
In interviews, you can be more certain that you and the respondent are thinking about questions in the same way (especially useful if your survey must deal with vague concepts)...
In interviews, you can get a more granular understanding of participants’ responses if desired, e.g. understanding relevant aspects of their worldviews, and choose to delve deeper into certain important aspects.
I agree there are some very significant advantages to the use of more qualitative instruments such as open-comments or interviews (I provide similar arguments here). In some cases these might be so extreme that it only makes sense to use these methods. That said, the disadvantages are potentially severe, so I would recommend against people being too eager to either switch to fully qualitative methods or add more open comment instruments to a mixed survey:
- Open comment responses may greatly reduce comparability (and so the ability to aggregate responses at all, if that is one of your goals), because respondents may be functionally providing answers to different questions, employing different concepts
- Analysing such data typically raises a lot of issues of subjectivity and researcher degrees of freedom
- You can attempt to overcome those issues by pre-registering even qualitative research (see here or here), and by following a fixed protocol specified in advance, using a more objective method to analyse and aggregate responses, but this reintroduces the original problem of needing to force individuals' responses into fixed boxes when they may have been thinking of things in a different manner
- Including both fixed response and open comment questions in the same survey may seem like the best of both worlds, and is often the best approach, but open comment questions are often dramatically more time-consuming and demanding than fixed response questions, so their inclusion can greatly reduce the quality of the responses to the fixed questions
I think running separate qualitative and quantitative studies is worth seriously considering: either with initial qualitative work helping to develop hypotheses followed by quantitative study or with a wider quantitative study followed by qualitative work to delve further into the details. This can also be combined with separate exploratory and confirmatory stages of research, which is often recommended.
This latter point relates to the issue of preregistration, which you mention. It is common not to preregister analyses for exploratory research (where you don't have existing hypotheses which you want to test and simply want to explore or describe possible patterns in the data) - though some argue you should preregister exploratory research anyway. I think there's a plausible argument for erring on the side of preregistration in theory, based on the fact that preregistration allows reporting additional exploratory analyses anyway, or explicitly deviating from your preregistered analysis to run things differently if the data requires it (which sometimes it does if certain assumptions are not met). That said, it is quite possible for researchers to inappropriately preregister exploratory research and not deviate or report additional analyses, even where this means the analyses they are reporting are inappropriate and completely meaningless, so this is a pitfall worth bearing in mind and trying to avoid.
Ideally, I'd want to stress test this methodology by collecting ~10 responses and running it - probably you could just simulate this by going through the survey 10 times, wearing different hats.
Another option would be to literally simulate your data (you could simulate data that either does or does not match your hypotheses, for example) and analysing that. This is potentially pretty straightforward depending on the kind of data structure you anticipate.
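For instance, a minimal sketch of what simulating your data might look like (the Likert scale, sample sizes, and effect size here are all hypothetical illustrations, not anything from an actual survey):

```python
import random
import statistics

random.seed(0)  # reproducible simulation

def simulate_likert(n, shift=0.0, scale_max=7):
    """Simulate n responses on a 1..scale_max Likert item,
    drawn from a rounded, truncated normal around the midpoint."""
    midpoint = (1 + scale_max) / 2
    return [min(scale_max, max(1, round(random.gauss(midpoint + shift, 1.5))))
            for _ in range(n)]

# One dataset consistent with the null hypothesis,
# and one consistent with a hypothesised effect
control = simulate_likert(100)
treatment = simulate_likert(100, shift=1.0)

print(statistics.mean(control), statistics.mean(treatment))
```

Running your planned analysis on both simulated datasets before collecting real data can reveal whether your pipeline can detect the effect you expect (and whether it falsely detects one you don't).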
Incidentally, I agreed with almost all the advice in this post except for some of the items in "Other small lessons about survey design." In particular, "Almost always have just one question per page" and always preferring lists to sliding scales seem like things I would not generally recommend (although having one question per page and using lists rather than scales is often fine). For "screening" questions about competence, unless you literally want to screen people out from taking the survey at all, you might also want to consider running these at the end of the survey rather than the beginning. Rather than excluding people from the survey entirely and not gathering their responses at all, you could gather their data and then conduct analyses excluding respondents who fail the relevant checks, if appropriate (whether it's better to gather their data at all depends a lot on the specific case). Which order is better is a tricky question. One reason to have such questions later is that respondents can be annoyed by checks which seem like they are trying to test them (this most commonly comes up with comprehension/attention/instructional manipulation checks), which can influence later questions (the DVs you are usually interested in). Of course, in some circumstances, you may be concerned that the main questions will themselves influence responses to your 'check' in a way that would invalidate it.
For what it's worth, I'm generally happy to offer comments on surveys people are running, and although I can't speak for them, I imagine that would go for my colleagues on the survey team at Rethink Priorities too.
Would it be helpful to put some or all of the survey data into data visualisation software like Google Data Studio or similar? This would allow regional leaders to quickly understand their country/city data and track trends. It might also save time by reducing the need to do so many summary posts every year, and provide new graphs on request.
We are thinking about putting a lot more analyses on the public bookdown next year, rather than in the summaries, which might serve some of this function. As you'll be aware, it's not that difficult to generate the same analysis for each specific country.
A platform that would allow more specific customisation of the analyses (e.g. breakdowns by city and gender and age etc.) would be less straightforward, since we'd need to ensure that no analyses could be sensitive or de-anonymising.
Unfortunately, we committed to not making any individual data available (except to CEA, if respondents opted in to that). We're still happy to receive requests from people, such as yourself, who would like to see additional aggregate analyses (though the caveat about them not being potentially de-anonymising still applies, which is particularly an issue where people want analyses looking at a particular geographic area with a small number of EAs).
~15 months isn't necessarily a target for the future. I think we could actually increase the gap to ~1.5 years going forward. But yes, the reasons for that would be to get the best balance between getting more repeated measurements (which increases the power, loosely speaking, of our estimates), being able to capture meaningful trends (looking at cross-year data, most things don't seem to change dramatically in the course of only 12 months), and reducing survey fatigue. That said, whatever the average frequency of the survey going forward, I expect there to be some variation as we shuffle things around to fit other organisations' timelines and to not clash with other surveys (like the EA Groups Survey) and so on.
Thanks for the question. We're planning to release the next EA Survey sometime in the middle of 2022. Historically, the average length of time between EA Surveys has been ~15 months, rather than every 12 months, and last year's survey was run right at the end of the year, so there won't be a survey within 2021 (the last time this happened was 2016).
That makes sense. Reference numbers even for things like race are surprisingly tricky. We've previously considered comparing the percentages for race within the EA Survey to baseline percentages. But although this works passably well for the US (EAS respondents are more white) and the UK (EAS respondents are less white), without taking into account the fact that EAS respondents are disproportionately rich, highly educated and young, and therefore should not be expected to represent the composition of the general population, for many other major countries there simply isn't national data on race/ethnicity that matches the same categories as the US/UK. I think people should generally be a lot more uncertain when estimating how far the EA community is representative in this sense. The figures still allow comparison within the EA community though.
Here are the countries with the highest EAs per capita. Note that Iceland, Luxembourg and Cyprus nevertheless have very low numbers of EA respondents (<5). This graph doesn't leave out any countries with particularly high numbers of EAs in absolute terms, though Poland and China are missing despite having >10.
We have reported this previously in both EAS 2018 and EAS 2019. We didn't report it this year because the per capita numbers are pretty noisy (at least among the locations with the highest EAs per capita, which tend to be low population countries). But it would be pretty easy to reproduce this analysis using this year's data.
To get another reference point I coded the "High Standards" comments and found that 75% did not seem to be about "perceived attitudes towards others." Many comments explicitly disavowed the idea that they think EAs look down on others, for example, but still reported that they feel bad because of demandingness considerations or because 'everyone in the community is so talented' etc.
Not sure about the jump from 2014 to 2015; I'd expect some combination of broader outreach by GWWC, maybe some technical issues with the survey data (?), and more awareness of there being an EA Survey in the first place?
I think the total number of participants for the first EA Survey (EAS 2014) is basically not comparable to the later EA Surveys. It could be that higher awareness in 2015 than 2014 drives part of this, but there was definitely less distribution for EAS 2014 (it wasn't shared at all by some major orgs). Whenever I am comparing numbers across surveys, I basically don't look at EAS 2014 (which was also substantially different in terms of content).
The highest comparability between surveys is for EAS 2018, 2019 and 2020.
I was surprised that the overall number of responses did not change significantly from 2015 to 2017. Perhaps this could be explained by the fact that there was no survey in 2016?
Appearances here are somewhat misleading, because although there was no EA Survey run in 2016, there was actually a similar amount of time in between EAS 2015 and EAS 2017 as any of the other EA Surveys (~15 months). But I do think it's possible that the appearance of skipping a year reduced turnout in EAS 2017.
I was going to try and compare the survey responses to the estimated community size since 2014-2015, but realised that there don't seem to be any population estimates aside from the 2019 EA Survey. Are there estimates of population size for earlier years?
We've only attempted this kind of model for EAS 2019 and EAS 2020. To use similar methods for earlier years, we'd need similar historical data to use as a benchmark. EA Forum data from back then may be available, but it may not be comparable in terms of the fraction of the population it's serving as a benchmark for. Back in 2015, the EA Forum was much more 'niche' than it is now (~16% of respondents were members), so we'd be basing our estimates on a niche subgroup, rather than a proxy for highly engaged EAs more broadly.
I do think there are some similarities between all these points that I'd maybe categorise under "elitist" (although I don't want to because I think that term has different connotations for people). But perhaps something like "EAs are perceived as being better than non-EAs", and this is expressed in the items I mentioned.
I think there's something of a family resemblance, but that it still wouldn't be possible to categorise them all as one thing. For example, I don't think disliking "high standards" necessarily entails disliking a "perceived attitude towards others", or necessarily even thinking that anyone has any particular attitude towards others. I would think it's difficult/impossible to reliably tease these apart without access to the specific responses (which is unfortunately impossible, since we don't have permission to share any of people's qualitative responses).
If there was sufficient interest we could analyse this with more of a qualitative network approach, which can identify these clusters, but as you can imagine it's relatively time-intensive to do.
it seems like this could be read as a negative (e.g. people don't feel welcome by the existing community), while the latter sounds quite positive (people are happy with the way the community influences them and want more of it)?
A lot of the ratings/comments were ambivalent in this way. This was in response to the question "Why did you give the two ratings [1-10] above?" rather than something like "Why did you give a positive/negative rating?" A lot of comments were of the form "The community is great, but it should do more..."
Mean satisfaction ratings for people within the "More community/influence" category were 7.255 and mean satisfaction ratings overall were 7.259 (i.e. very similar).
Recategorising comments into superordinate categories based on the category they were assigned to is inherently going to be a bit questionable. Even if an item best fits category A, and category A (considered in the abstract) broadly seems like it fits in superordinate category 1, it doesn't follow that the item best fits in superordinate category 1 rather than one of the other superordinate categories.
I'm not sure I would categorise many of the items in "Elitism... Exclusive, High standards, Hubris, Dismissive" etc. as "Behaviour or attitude towards others." At least some of these may be based more on a general impression of the community (e.g. seeing many/most of the major figures seem to be from Oxbridge) rather than on actual behaviours by individuals. Likewise the appearance of very "high standards" may be only very tangentially related to specific behaviours.
I also definitely wouldn't round off comments about AI or animal welfare as "lack of interest in EA cause areas." If people complain about the community being too focused on AI to the exclusion of other cause areas, it's often because they are very interested in the other EA cause areas.
For "new to EA" and "peripheral": people would often say things like "I'm new to EA, so I don't really know" or "I'm only peripherally engaged with the community, so I don't really know" to explain their ratings.
"More community/influence" captured comments saying they wanted the EA community to do more, particularly involving becoming more of a community, or a larger community, or influencing people more.
"Politics" included responses saying that EA was too woke/left and too capitalist (roughly twice as many in the former camp as the latter, but these are very small numbers so that ratio is inexact), and a very small number of mentions of too much or too little politics.
For this question, people could mention any number of things in principle, i.e. they could write literally anything they wanted, but each response was only coded as representing a single category that was thought to best reflect that comment.
... we seem to have surveyed a lot of people who were meaningfully affected by influences before mid-2017; on average, the people we surveyed say they first heard about EA/EA-adjacent ideas in 2015.
So I think there’s often a delay of 2-4 years between when a survey respondent first hears about EA/EA-adjacent ideas and when they start engaging in the kind of way that could lead our advisors to recommend them.
I think this is what one would predict given what we've reported previously about the relationship between time since joining EA and engagement (see 2019 as well).
To put this in more concrete terms: only about 25% of highly engaged (5/5) EAs in our sample joined after 2017. That may even be a low bar relative to the population you are targeting (or at least somewhat orthogonal to it). Looking at EA org employees (which might be a closer proxy), only 20% joined after 2017. See below:
Our respondents also look about 2x more likely than EA Survey respondents to have been introduced via a class at school, though I’m not sure how much of this is noise. (4% of our respondents gave this answer, vs. 2% for the EA Survey.)
For reference, I think the 95% CI for your figures would be about 2.1-7.7%, and 1.3-2.5% for the EAS.
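For anyone wanting to check such figures, a Wilson score interval roughly reproduces these bounds. The counts below (~9 of 217) are my back-calculation from the reported percentages, not figures from either survey:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom, (centre + margin) / denom

# ~4% of 217 respondents, i.e. roughly 9 people
lo, hi = wilson_ci(9, 217)
print(f"{lo:.1%} to {hi:.1%}")
```

The Wilson interval is generally preferable to the simple normal approximation for small counts like these, since it doesn't produce bounds below zero and is better centred.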
Doing a quick eyeball of the two [OP and EAS] charts, they look pretty similar insofar as they’re comparable. “Peter Singer’s work” doesn’t appear, but it’s because they didn’t have that category — that would fall under “Book, article, or blog post” or “TED talk.” “Personal contact” leads in both, though by somewhat more in ours.
Notably, if we look at the EAS open comment data which I mentioned here, we can see that when Peter Singer references are counted as a single category, he's among the top-mentioned. 80,000 Hours don't appear among the top mentions in this analysis, most likely because they were a fixed category and respondents didn't feel the need to select 80,000 Hours and then write "80,000 Hours" in the further details box (personal contacts and EA groups often appeared in the further details of other categories, in contrast).
We don’t do significance testing or much other statistical analysis in this analysis... Our sense was that this approach was the best one for our dataset, where the n on any question was at maximum 217 (the number of respondents) and often lower, though we’re open to suggestions about ways to apply statistical analysis that might help us learn more.
Because you have so much within-subjects data (i.e. multiple datapoints from the same respondent), you will actually be much better powered than you might expect with ~200 respondents. For example, if you asked each respondent to rate 10 things (and for some questions you actually asked them to rate many more) you'd have 2000 datapoints and be better able to account for individual differences.
You might, separately, be concerned about the small sample size meaning that your sample is not representative of the population you are interested in: but as you observe here, it looks like you actually managed to sample a very large proportion of the population you were interested in (it was just a small population).
CEA defines an engaged EA as someone who takes significant actions motivated by EA principles (we sometimes also use the term “impartially altruistic, truth-seeking principles”). In practice, this can look like selecting a job or degree program, donating a substantial portion of one’s income, working on EA-related projects, and so on. [italics added]
Just to clarify my understanding, are you defining/taking yourself to be looking at "highly engaged" EAs or just "engaged" EAs? For reference, the criteria for the CEA definition of "engaged EA" above are met by over half the EA Survey sample (more like 2/3rds depending on how you defined a "substantial" portion of one's income). In contrast, EA org employees/CBG recipients is a much higher bar for engagement (~10% of EA Survey are current EA org employees), while attending EAG is ~30% of EA Survey respondents, so I'd expect very different retention rates for these different populations.
We first checked if individuals were still supported by a Community Building Grant or working at an EA-related organization. If they were, we marked them as still being highly engaged. If they were not, we checked what they were currently doing and whether they still fulfilled the definition of being highly engaged. (These checks either involved looking at data from CEA’s various programs, LinkedIn, talking to people who knew about the individual’s activities, or the individual directly.)
Were you able to get definitive answers about all the individuals on your lists using these methods? If not, I'm curious what numbers the percentages were drawn from (e.g. just the people you could get a definitive answer about or were unclear cases assumed to have dropped out/not dropped out etc.?).
29.8% is much closer to the annual retention estimate produced by Peter Wildeford based on the 2018 EA Survey.
Note that in the comments on Ben's earlier post, Peter suggests that the other method that we used in that post would be more accurate (which gives an estimated ~60% retention after 4-5 years). We should be able to improve on that estimate quite a bit now that we have more cross-year data.
Do you have more information about how personal/family finance as a bottleneck for impact is to be understood?
Unfortunately, the majority of people's open comments didn't provide more detail beyond something like "financial constraints" or "low income." Among the minority of comments which did offer more detail, the specific thing most often mentioned was simply that people could donate more if they had more money. Freedom to explore different options, switch career, spend more time on high-impact work, and stress related to money were each mentioned by only a couple of people.
Yet, if Dominic Cummings’ word is anything to go by, the UK government still has a long way to go in terms of long-term policymaking.
Apologies if you already linked to this and I missed it, but Dominic Cummings is also writing a series about Singapore right now: https://dominiccummings.substack.com/p/high-performance-startup-government
I think the figures for highly engaged EAs working in Mental Health, drawn from EA Survey data, will be somewhat inflated by people who are working in mental health, but not in an EA-relevant sense e.g. as a psychologist. This is less of a concern for more distinctively EA cause areas of course.
Among people who, in EAS 2019, said they were currently working for an EA org, the normalised figures were only ~5% for Mental Health and ~2% for Climate Change (which, interestingly, is a bit closer to Ben's overall estimates for the resources going to those areas). Also, as Ben noted, people could select multiple causes, and although the 'normalisation' accounts for this, it doesn't change the fact that these figures might include respondents who aren't solely working on Mental Health or Climate Change, but could be generalists whose work somewhat involved considering these areas.
If it seems worth it (i.e., more people than me care!), you could potentially add a closed ended 'other potential cause areas' item. These options could be generated from the most popular options in the prior year's open ended responses. E.g., you could have IIDM and S-risk as close ended 'other options' for that question next year
Yeh that seems like it could be useful. It's useful to know what kinds of things people find valuable, because space in the survey is always very tight.
I agree it's quite possible that part of this observed positive association between engagement and longtermism (and meta) and negative association with neartermism is driven by people who are less sympathetic to longtermism leaving the community. There is some evidence that this is a factor in general. In our 2019 Community Information post, we reported that differing cause preferences were the second most commonly cited reason for respondents' level of interest in EA decreasing over the last 12 months. This was also among the most commonly cited factors in our, as yet unpublished, 2020 data. There is also some evidence from that, as yet unpublished, post that support for longtermism is associated with higher satisfaction with the EA community, though I think that relationship still requires more research.
Dealing with differential attrition (i.e. different groups dropping out of the survey/community at different rates) is a perennial problem. We may be able to tackle this more as we get more data tracked across years (anything to do with engagement is very limited right now, as we only have two years of engagement data). One possible route is that, in 2019, we asked respondents about whether they had changed cause prioritisation since they joined the community and if so which causes they switched from. A majority of those that had switched did so from Global Poverty (57%) and most seem to be switching into prioritising the Long Term Future. It may be possible to estimate what proportion of neartermist respondents should be expected to switch to longtermism across time (assuming no dropout), then compare that with actual changes in the percentage of neartermists across time and see whether we observe fewer neartermists within cohorts across time (i.e. across surveys) than we'd expect given the estimated conversion rate. But there are lots of complexities here, some of which we discuss in more detail in later posts on satisfaction and engagement.
A couple of perhaps weakly suggestive observations are that, within the 2020 data, i) engagement is more clearly associated with cause prioritisation than time in EA is, and ii) we also observe more engaged EAs to be more longtermist (or meta) and less neartermist even within cohorts (i.e. EAs who reported joining in the same year). Looking within different engagement levels, below, the relationship between cause prioritisation and time in EA is comparatively flat (an interesting exception being neartermism among those reporting the highest engagement, where it drops dramatically among the most recent cohorts (2016-2020), i.e. those who have been in EA longer are visibly less neartermist, which is roughly the pattern I would expect to see were neartermists dropping out [though it would be odd if that were only occurring among the most engaged]).
I can't speak for others, but I don't think there's any specific theoretical conception of the categories beyond their formal specification (EA movement building, and Meta (other than EA movement building)). Other people might have different substantive views about what does or does not count as EA movement building, specifically.
I think the pattern of results this year, when we split out these options, suggests that most respondents understood our historical "Meta" category to primarily refer to EA movement building. As noted, EA movement building received much higher support this year than non-EA movement meta; EA movement building also received similar levels of support to "Meta" in previous years; EA movement building and Meta (other than EA movement building) were quite well correlated, but only ~12% of respondents rated Meta (other than EA movement building) higher than EA movement building (44% rated movement building higher, and 43% rated them exactly the same).
I think this suggests either that we could have just kept the Meta category as in previous years or that in future years we could consider dropping Meta other than movement building as a category (though, in general, it is strongly preferable not to change categories across years).
In future, I'd like to see changes in the 'other causes' over time and across engagement level, if possible. For instance, it would be interesting to see if causes such as IIDM or S-risk are becoming more or less popular over time, or are mainly being suggested by new or experienced EAs.
Yeah, I agree that would be interesting. Unfortunately, if we were basing it on open comment "Other" responses, it would be extremely noisy due to low n, as well as some subjectivity in identifying categories. (Fwiw, it seemed like people mentioning S-risk were almost exclusively high engagement, which is basically what I'd expect, since I think it requires some significant level of engagement before people would usually be exposed to these ideas.)
I think that it would be very interesting if we could compare the EA communities results on this survey against a sample of 'people who don't identify as EAs' and people who identify as being in one or more 'activist groups' (e.g., vegan/climate etc) and explore the extent of our similarities and differences in values (and how these are changing over time).
I agree this would be interesting. I'm particularly interested in examining differences in attitudes between EA and non-EA audiences. Examining differences in cause ratings directly might be more challenging due to the conceptual gap between EA understandings of certain causes and the general population's (who may not even be familiar with what some of these terms mean). I think surveying more general populations on their support for different things (e.g. longtermist interventions, suitably explained) and observing changes in these across time would be valuable though. Another way to examine differences in cause prioritisation would be to look at differences in the charitable portfolios of the EA community vs wider donors, since that aggregate data is more widely available.
The context here was that we've always asked about "Meta" since the first surveys, but this year an org was extremely keen that we ask explicitly about "EA movement building" and separate out Meta which was not movement building.
In future years, we could well move back to just asking about Meta, or just ask about movement building, given that non-EA movement building meta both received relatively low support and was fairly well correlated with movement building.
As this report suggests, even donors who are very proactive are often barely reflecting at all on where they should give. They are also often thinking about the charity sector in terms of very coarse-grained categories (e.g. my-country/international charities, people/animal charities). On the other hand, they often make sense of their donations in terms of causes and an implicit hierarchy of causes (including particular personal commitments, such as to heart disease because a family member died from it, and so on). They also view charitable donation as highly personal and subjective (e.g. a matter of personal choice) [there is some evidence for this here and in unpublished work by me and my academic colleagues].
I think the overall picture this suggests is that people are sometimes thinking in terms of causes, but rarely explicitly deliberating about the optimal cause or set of causes.
To address the original question: I think this suggests that trying to get people to "change causes" by giving them reasons as to why certain causes are best may be ineffective in most cases, as people rarely deliberate about what cause is best and may not even be aiming to select the best cause. On the other hand, as many donors give fairly promiscuously or indiscriminately to charities across different cause areas, it's plausible you could get them to support different causes just by making them salient and appealing.
Incidentally your comment just now prompted me to look at the cross-year cross-cohort data for this. Here we can see that in EAS 2019, there was a peak in podcast recruitment closer to 2016 (based on when people in EAS 2019 reported getting involved in EA). Comparing EAS 2019 to EAS 2020 data, we can see signs of dropoff among podcast recruits among those who joined ~2014-2017 (and we can also see the big spike in 2020).
These are most instructive when compared to the figures for other recruiters (since the percentage of a cohort recruited by a given source is inherently a share relative to other recruiters, i.e. if one percentage drops between EAS 2019 and EAS 2020, another has to go up).
Comparing personal contact recruits, we can see steadier figures across EAS 2019 and EAS 2020, suggesting less dropoff. (Note that the figures for the earliest cohorts are very noisy, since there are small numbers of respondents from those cohorts in these surveys.)
This is also reflected very clearly in EA Survey data.
Here's the breakdown of which specific podcasts people cited in EAS 2020, for where they first heard about EA.
You can also get a sense of the magnitude of Sam Harris' podcast compared to other things like Doing Good Better from looking at the total number of mentions across response categories. (Respondents were asked to first indicate where they first heard about EA from a list of broad categories like 'Book', 'Podcast', and then asked to provide further details (e.g. what book or podcast) in an open comment. Only 60% of respondents to the first question gave further details so the numbers are commensurately lower.)
Taking these numbers at face value, Sam Harris seems to represent more than twice the recruitment effect of Doing Good Better, and slightly more than half that of Peter Singer.
One good reason not to take these numbers at face value is that they will be influenced by how recently these factors were recruiting people. We see consistent signs of attrition across cohorts, so a factor which recruits people in 2020 will have a lot more of those people left in the sample than a factor which recruited a lot of people in 2015 (of whom probably >60% have dropped out by 2020).
People who first got involved at 18 (or 19) are about the same as people who got involved at 21 (i.e. a little bit lower than the peak at 20).
People who first got involved at 17 are about the same as people who first got involved 22-23.
For people who first got involved 15 or 16, the confidence intervals are getting pretty wide, because fewer respondents joined at these ages, but they're each a little less engaged, being most similar to those who first got involved in their mid-late 20s or 30s respectively.
In short, the trend is pretty smooth both before and after 20, but in the mid-to-late 30s it seems to level out a bit, temporarily.
You might want to open these images in new windows to see them full size.
And finally, this is visually messy, but split by cohort, which could confound things otherwise.
We'll be presenting analyses of this using EAS2020 data in the Engagement post shortly.
We show changes in the proportion of respondents coming from each source across cohorts using this year's data here.
You can see the increase in absolute numbers coming from Podcasts and the % of each cohort coming from Podcasts below. Because some portion of each cohort drops out every year, this will give an inflated impression of the raw total coming from the most recent cohort (2020) relative to earlier cohorts. Comparing raw totals across years is also not straightforward, because sample size varies each year (we sampled fewer people in 2020 than in earlier years, as discussed here and here, and although we think we can estimate our sampling rate for engaged EAs quite well, we're less certain about the true size of the more diffuse, less engaged EA population (see here)), so the totals for ~2017 were likely relatively higher at the time.
We actually just performed the same analyses as we did last year, so any references to significance are after applying the Bonferroni adjustment. We just decided to show the confidence intervals rather than just the binary significant/not significant markers this year, but of course different people have different views about which is better.
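To illustrate how the Bonferroni adjustment relates to the confidence intervals we showed: with m simultaneous comparisons, each test is run at alpha / m, which widens the corresponding intervals. A minimal sketch, with a hypothetical m (the actual number of comparisons in our analyses will differ):

```python
# Sketch: Bonferroni adjustment for m simultaneous comparisons.
# To keep the family-wise error rate at alpha, each individual test is
# run at alpha / m, and the confidence intervals widen accordingly.
from statistics import NormalDist

alpha = 0.05
m = 10  # hypothetical number of comparisons
adjusted_alpha = alpha / m

# Two-sided z critical values: unadjusted vs Bonferroni-adjusted.
z_plain = NormalDist().inv_cdf(1 - alpha / 2)
z_adjusted = NormalDist().inv_cdf(1 - adjusted_alpha / 2)

# The adjusted intervals span ~2.81 standard errors rather than ~1.96 here.
print(round(z_plain, 2), round(z_adjusted, 2))
```

This is why an adjusted interval can overlap zero even when an unadjusted one would not, and why showing the intervals conveys strictly more information than binary significant/not-significant markers.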
Could I ask you to clarify which question you are looking at? I assume it's the importance-for-retention question. There we observe that non-white respondents are more likely to select EAGx specifically (about twice as large a percentage of non-white respondents selected EAGx), and indeed I expect this is driven by geography for the reasons you say. There is no significant difference for EAG, though a slightly higher percentage of non-white respondents selected that as well.
To answer your question about the analyses: the chi-square tests just look at whether there are differences in the proportions of white/non-white and male/non-male respondents selecting different categories; they don't attempt to control for other characteristics. So you should read these as identifying differences between these groups, rather than as necessarily showing that these differences are explained by the groupings themselves. (Note that just knowing the proportions, even if the differences aren't causal, may still be action-relevant: e.g. we might want to know which programs are actually helping a larger number of non-white EAs, even if this is ultimately explained by some third factor.) In contrast, in the models at the end looking at predictors of NPS and change in level of interest in EA, we do try to control for different influences simultaneously.
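For anyone unfamiliar with the test, a 2x2 chi-square test of independence of this kind can be sketched as below, computed by hand. The counts are hypothetical, not our actual data, and as noted this tests only whether the proportions differ, controlling for nothing else:

```python
# Sketch: 2x2 chi-square test of independence on hypothetical counts
# (rows: non-white / white respondents; columns: selected / did not
# select some category).

def chi_square_2x2(table):
    """table = [[a, b], [c, d]] of observed counts; returns the statistic,
    to be compared with the chi-square(df=1) critical value (3.84 at alpha=.05)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = 0.0
    for obs, row, col in [(a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)]:
        expected = row * col / n  # count expected under independence
        stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical: 30/100 non-white vs 15/100 white respondents selected EAGx.
stat = chi_square_2x2([[30, 70], [15, 85]])
print(round(stat, 2))
```

Here the statistic exceeds 3.84, so under these made-up counts the difference in proportions would be significant at alpha = .05 (before any multiple-comparison adjustment).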
Interestingly, EAG attendees don't seem straightforwardly newer to EA than EAS respondents. I would agree that it's likely explained by things like age/student status and more generally which groups are more likely to be interested in this kind of event.
Your comment makes it sound like you think there's some mystery to resolve here or that the composition of people who engaged with WANBAM conflicts with the EA Survey data. But it's hard for me to see how that would be the case. Is there any reason to think that the composition of people who choose to interact with WANBAM (or who get mentored by WANBAM) would be representative of the broader EA community as a whole?
WANBAM is prominently marketed as "a global network of women, trans people of any gender, and non-binary people" and explicitly has DEI as part of its mission. It seems like we would strongly expect the composition of people who choose to engage with WANBAM to be more "diverse" than the community as a whole. I don't think we should be surprised that the composition of people who interact with WANBAM differs from the composition of the wider community as a whole any more than we should be surprised that a 'LessWrongers in EA' group, or some such, differed from the composition of the broader community. Maybe an even closer analogy would be whether we should be surprised that a Diversity and Inclusion focused meetup at EAG has a more diverse set of attendees than the broader EA Global audience.
Also, it seems a little odd to even ask, but does WANBAM take any efforts to try to reach a more diverse audience in terms of race/ethnicity or geography or ensure that the people you mentor are a diverse group? If so, then it also seems clear that we'd expect WANBAM to have higher numbers from the groups you are deliberately trying to reach more of.
It's possible I'm missing something, but given all this, I don't see why we'd expect the people WANBAM elects to mentor to be representative of the wider EA community (indeed, WANBAM explicitly focuses on only a minority of the EA community), so I don't see these results as having much relevance to estimating the composition of the community as a whole.
Regarding external communications for the EA Survey. The EA Survey is promoted by a bunch of different outlets, including people just sharing it with their friends, and it goes without saying we don't directly control all of these messages. Still, the EA Survey itself isn't presented with any engagement requirement and the major 'promoters' make an effort to make clear that we encourage anyone with any level of involvement or affiliation with EA to take the survey. Here's a representative example from the EA Newsletter, which has been the major referrer in recent years:
If you think of yourself, however loosely, as an “effective altruist,” please consider taking the survey — even if you’re very new to EA! Every response helps us get a clearer picture.
Another thing we can do is compare the composition of people who took the EA Survey from the different referrers. It would be surprising if the referrers to the EA Survey, with their different messages, all happened to employ external communications that artificially reduce the apparent ethnic diversity of the EA community. In fact, all the figures for % not-only-white across the referrers are much lower than the 40% figure for WANBAM mentees, ranging between 17-28% (roughly in line with the sample as a whole). There was one exception, which was an email sent to local group organizers, which was 36% not-only-white. That outlier is not surprising to me since, as we observed in the EA Groups Survey, group organizers are much less white than the community as a whole (47% white). This makes sense, simply because there are a lot of groups run in majority non-white countries, meaning there are a lot of non-white organizers from these groups, even though the global community (heavily dominated by majority white countries) is majority white.
Thanks for your question. The 2020 posts have already started coming out. You can see the first two here and here. And you can also find them on our website here. All the posts will also be under the Effective Altruism Survey tag.
Of course, much of this growth in the number of highly engaged EAs is likely due to EAs becoming more engaged, rather than there being more EAs. As it happens, EAS 2020 had more 4's but fewer 5's, which I think can plausibly be explained by the general reduction in the rate of people sampled, mentioned above, combined with a number of 1-3's moving into the 4 category and fewer 4's moving into the 5 category (which is more stringent, e.g. EA org employee, group leader etc.).
[Epistemic status: just looked into this briefly out of curiosity, not an official EAS analysis]
When I looked at this briefly in a generalized mixed model, I didn't find a significant interaction effect for gender * engagement * the specific factor people were evaluating (e.g. EAG or group etc.), which comports with your observation that there doesn't seem to be anything particularly interesting going on in the contrast between male and non-male interaction with low/high engagement. (In contrast, there were significant fixed effects for the interactions between engagement and the specific factor, and between gender and the specific factor.) Looking at the specific 'getting involved' factors in the interaction effect, the only one with much of a hint of any interaction with gender * engagement was personal contact, which was "borderline significant" (though I am loath to read much into that).
Probably the simplest way to illustrate the specific thing you mentioned is with the following two plots: looking at both male and non-male respondents, we can see that highly engaged respondents are more likely to select EAG than less engaged respondents, but the pattern is similar for both male and non-male respondents.
I have it on neutral and this was still virtually hidden from me: the same with the new How People Get Involved post, which was so far down the front page that I couldn't even fit it on a maximally zoomed out screenshot. I believe this is being looked into.
Thanks Barry. I agree it's interesting to make the comparison!
Do you know if these numbers (for attendees) differ from the numbers for applicants? I don't know if any of these events were selective to any degree (as EA Global is), but if so, I'd expect the figures for applicants to be a closer match to those of the EA Survey (and the community as a whole), even if you weren't explicitly filtering with promoting diversity in mind. I suppose there could also be other causes of this, in addition to self-selection, such as efforts you might have made to reach a diverse audience when promoting the events?
One thing I noted is that the attendees for these events appear to be even younger and more student-heavy than the EA Survey sample (of course, at least one of the events seems to have been specifically for students). This might explain the differences between your figures and those of the EA Survey. In EA Survey data, based on a quick look, student respondents appear to be slightly more female and slightly less white than non-students.