## Posts

Book review: Architects of Intelligence by Martin Ford (2018) 2020-08-11T17:24:16.833Z · score: 11 (7 votes)
ofer's Shortform 2020-02-19T06:53:16.647Z · score: 3 (1 votes)

Comment by ofer on Avoiding Munich's Mistakes: Advice for CEA and Local Groups · 2020-10-15T16:33:18.024Z · score: 52 (21 votes) · EA · GW

Thank you for writing this important post Larks!

I would add that the harm from cancel culture's chilling effect may be a lot more severe than what people tend to imagine. The chilling effect does not only prevent people from writing things that would actually get them "canceled". Rather, it can prevent people from writing anything to which they assign even a non-negligible credence (e.g. 0.1%) of getting them canceled (at some point in the future); that is probably a much larger and more important set of things/ideas that we silently lose.

Comment by ofer on ofer's Shortform · 2020-10-04T16:21:26.566Z · score: 5 (4 votes) · EA · GW

[Certificates of Impact]

To implement certificates of impact we need to decide how we want projects to be evaluated. The following is a consideration that seems to me potentially important (and I haven't seen it mentioned yet):

If the evaluation of a project ignores substantial downside risks that the project once had but no longer has (because fortunately things turned out well), the certificate market might incentivize people to carry out risky net-negative projects: If things turn out great, the project's certificates will be worth a lot; and thus when the project has just started, the future value of its certificates is large in expectation. (Impact certificates can never have a negative market price, even if the project's impact turns out to be horrible.)
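As a toy illustration of this incentive problem (all numbers here are hypothetical), compare a risky project's expected social value with the expected market value of its impact certificates once the downside is truncated at zero:

```python
# Hypothetical project: 10% chance of great impact, 90% chance of severe harm.
p_success = 0.1
value_if_success = 100.0   # certificate market value if things go well
harm_if_failure = -50.0    # social value if things go badly

expected_social_value = (p_success * value_if_success
                         + (1 - p_success) * harm_if_failure)

# Certificates cannot trade at a negative price, so the downside is cut off at 0:
expected_certificate_value = (p_success * value_if_success
                              + (1 - p_success) * 0.0)

assert expected_social_value < 0       # net-negative project in expectation
assert expected_certificate_value > 0  # yet its certificates look worth funding
```

The asymmetry is exactly the zero price floor: the market prices the upside but never the downside, so a naive evaluation scheme subsidizes risk-taking.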

Comment by ofer on Are social media algorithms an existential risk? · 2020-09-16T12:55:01.333Z · score: 2 (2 votes) · EA · GW

Perhaps not one that "threatens the premature extinction of Earth-originating intelligent life" (Bostrom, 2012)

I just want to flag that the full sentence from that paper is: "An existential risk is one that threatens the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development (Bostrom 2002)."

Comment by ofer on Are social media algorithms an existential risk? · 2020-09-16T12:39:24.914Z · score: 2 (2 votes) · EA · GW

From an AI safety perspective, the algorithms that create the feeds that social media users see do have some properties that make them potentially more concerning than most AI applications:

1. The top capabilities are likely to be concentrated rather than distributed. For example, in the near future very few actors are likely to invest resources in such algorithms on a scale similar to Facebook's.
2. The feed-creation-solution (or policy, in reinforcement learning terminology) being searched for has a very rich real-world action space (e.g. showing some post X to some user Y, where Y is any person from a set of 3 billion FB users).
3. The social media company is incentivized to find a policy that maximizes users' time-spent over a long time horizon (rather than using a very small discount factor).
4. Early failures/deception-attempts may be very hard to detect, especially if the social media company itself is not on the lookout for such failures.

These properties seem to make it less likely that relevant people would see sufficiently alarming small-scale failures before the point where some AI systems pose existential risks.

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-12T14:19:28.881Z · score: 1 (1 votes) · EA · GW

This makes Alice a better forecaster

As long as we keep asking Alice and Bob questions via the same platform, and their incentives don't change, I agree. But if we now need to decide whether to hire Alice and/or Bob to do some forecasting for us, comparing their average daily Brier score is problematic. If Bob just wasn't motivated enough to update his forecast every day like Alice did, his lack of motivation can be fixed by paying him.

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-12T14:13:14.132Z · score: 1 (1 votes) · EA · GW

Thanks for the explanation!

I don't think this formal argument conflicts with the claim that we should expect the forecasting frequency to affect the average daily Brier score. In the example that Flodorner gave where the forecast is essentially resolved before the official resolution date, Alice will have perfect daily Brier scores (0, for every such day), while on those days Bob will have imperfect (positive) Brier scores.

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-12T05:29:44.309Z · score: 2 (2 votes) · EA · GW

The long-term solution here is to allow forecasters to predict functions rather than just static values. This solves problems of things like people needing to update for time left.

Do these functions map events to conditional probabilities (i.e. mapping an event to the probability of something conditioned on that event happening)? What would this look like for the example of forecasting an election result?

In terms of the specific example though, I think if a significant new poll comes out and Alice updates and Bob doesn't, Alice is a better forecaster and deserves more reward than Bob.

Suppose Alice encountered the important poll result because she was looking for it (as part of her effort to come up with a new forecast). At the end of the day what we really care about is how much weight we should place on any given forecast made by Alice/Bob. We don't directly care about the average daily Brier score (which may be affected by the forecasting frequency). [EDIT: this isn't true if the forecasting platform and the forecasters' incentives are the same when we evaluate the forecasters and when we ask the questions we care about.]

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-12T05:26:12.215Z · score: 1 (1 votes) · EA · GW

I didn't follow that last sentence.

Notice that in the limit it's obvious we should expect the forecasting frequency to affect the average daily Brier score: suppose Alice makes a new forecast every day while Bob only makes a single forecast (which is equivalent to him making an initial forecast and then blindly repeating it every day until the question closes). Alice's later forecasts can incorporate new information that Bob's single forecast cannot.

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-11T19:43:16.444Z · score: 1 (1 votes) · EA · GW

After thinking for a few more minutes, it seems that forecasting more often but at random moments shouldn't impact the expected Brier score.

In my toy example (where the forecasting moments are predetermined), Alice's Brier score for day X will be based on a "fresh" prediction made on that day (perhaps influenced by a new surprising poll result), while Bob's Brier score for that day may be based on a prediction he made 3 weeks earlier (not taking into account the new poll result). So we should expect that the average daily Brier score will be affected by the forecasting frequency (even if the forecasting moments are uniformly sampled).

In this toy example the best solution seems to be using the average Brier score over the set of days in which both Alice and Bob made a forecast. If in practice this tends to leave us with too few data points, a more sophisticated solution is called for. (Maybe partitioning days into bins and sampling a random forecast from each bin? [EDIT: this mechanism can be gamed.])

Comment by ofer on Challenges in evaluating forecaster performance · 2020-09-11T14:41:08.669Z · score: 4 (3 votes) · EA · GW

The rewarding-more-active-forecasters problem seems severe and I'm surprised it's not getting more attention. If Alice and Bob both forecast the result of an election, but Alice updates her forecast every day (based on the latest polls) while Bob only updates his forecast every month, it doesn't make sense to compare their average daily Brier score.
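A minimal sketch of why that comparison is unfair (all forecasts and numbers here are invented for illustration): if we score each forecaster every day, carrying the last forecast forward, Alice's average daily Brier score benefits purely from her update frequency:

```python
def brier(p, outcome):
    """Brier score for a binary forecast p of an event with outcome in {0, 1}."""
    return (p - outcome) ** 2

# The event happens (outcome = 1). A decisive poll arrives on day 3.
outcome = 1
alice = [0.5, 0.5, 0.5, 0.9, 0.95]  # updates daily, reacts to the poll
bob = [0.5] * 5                     # one initial forecast, carried forward

def avg_daily_brier(forecasts):
    return sum(brier(p, outcome) for p in forecasts) / len(forecasts)

# Alice scores better simply because she re-forecast after the poll;
# Bob's stale forecast is penalized on every subsequent day.
assert avg_daily_brier(alice) < avg_daily_brier(bob)
```

The inequality holds regardless of whether Alice is a better forecaster in the skill sense; it reflects her update schedule.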

Comment by ofer on Some thoughts on the EA Munich // Robin Hanson incident · 2020-08-29T12:41:25.902Z · score: 19 (7 votes) · EA · GW

You'd expect having a wider range of speakers to increase intellectual diversity — but only as long as hosting Speaker A doesn't lead Speakers B and C to avoid talking to you

As an aside, if hosting Speaker A is a substantial personal risk to the people who need to decide whether to host Speaker A, I expect the decision process to be biased against hosting Speaker A (relative to an ideal EA-aligned decision process).

Comment by ofer on "Good judgement" and its components · 2020-08-20T14:03:26.552Z · score: 1 (1 votes) · EA · GW

Thank you for the thoughtful comment!

As an aside, when I wrote "we usually need to have a good understanding ..." I was thinking about explicit heuristics. Trying to understand the implications of our implicit heuristics (which may be hard to influence) seems somewhat less promising. Some of our implicit heuristics may be evolved mechanisms (including game-theoretical mechanisms) that are very useful for us today, even if we don't have the capacity to understand why.

Comment by ofer on "Good judgement" and its components · 2020-08-20T03:18:17.732Z · score: 7 (4 votes) · EA · GW

Thanks for writing this!

Heuristics are rules of thumb that you apply to decisions. They are usually held implicitly rather than in a fully explicit form. They make statements about what properties of decisions are good, without trying to provide a full causal model for why that type of decision is good.

I think we usually need to have a good understanding of why a certain heuristic is good and what are the implications of following it (maybe you agree with this; it wasn't clear to me from the post). The world is messy and complex. We don't get to see the counterfactual world where we didn't follow the heuristic at a particular time, and the impact of following the heuristic may be dominated by flow-through effects.

Comment by ofer on AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher · 2020-08-08T12:09:58.323Z · score: 1 (1 votes) · EA · GW

Beyond the instrumental convergence thesis, though, I do think that some bits of the classic arguments are awkward to fit onto concrete and plausible ML-based development scenarios: for example, the focus on recursive self-improvement, and the use of thought experiments in which natural language commands, when interpreted literally and single-mindedly, lead to unforeseen bad behaviors. I think that Reframing Superintelligence does a good job of pointing out some of the tensions between classic ways of thinking and talking about AI risk and current/plausible ML engineering practices.

Would you say that the treacherous turn argument can also be mapped over to contemporary ML methods (similarly to the instrumental convergence thesis) due to it being a fairly abstract principle?

Also, why is "recursive self-improvement" awkward to fit onto concrete and plausible ML-based development scenarios? (If we ignore the incorrect usage of the word "recursive" here; the concept should have been called "iterative self-improvement".) Consider the work that has been done on neural architecture search via reinforcement learning (this 2016 paper on that topic currently has 1,775 citations on Google Scholar, including 560 citations from 2020). It doesn't seem extremely unlikely that such a technique will be used, at some point in the future, in some iterative self-improvement setup, in a way that may cause an existential catastrophe.

Regarding the example with the agent that creates the feed of each FB user:

the system wouldn't, in any meaningful sense, have long-run objectives (due to the shortness of sessions).

I agree that the specified time horizon (and discount factor) is important, and that a shorter time horizon seems safer. But note that FB is incentivized to specify a long time horizon. For example, suppose the feed-creation-agent shows a user a horrible post by some troll, which causes the user to spend many hours in a heated back-and-forth with said troll. Consequently, the user decides FB sucks and ends up getting off FB for many months. If the specified time horizon is sufficiently short (or the discount factor is sufficiently small), then from the perspective of the training process the agent did well when it showed the user that post, and the agent's policy network will be updated in a way that makes such decisions more likely. FB doesn't want that. FB's actual discount factor for users' engagement time may be very close to 1 (i.e. a user spending an hour on FB today is not 100x more valuable to FB than the user spending an hour on FB next month). This situation is not unique to FB. Many companies that use RL agents that act in the real world have long-term preferences with respect to how their RL agents act.
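A small sketch of this point (the rewards and horizon are invented for illustration): under a heavily discounted return, an action that causes a short engagement spike followed by a long absence can look better than steady engagement, while under a discount factor near 1 the ordering flips:

```python
def discounted_return(rewards, gamma):
    """Sum of discounted rewards: r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

# Daily engagement hours over ~6 months (hypothetical numbers).
troll_post = [5.0] + [0.0] * 180  # heated argument today, then the user quits
normal_feed = [1.0] * 181         # steady engagement

# Myopic agent (small discount factor): the troll post looks like a win.
assert discounted_return(troll_post, 0.5) > discounted_return(normal_feed, 0.5)

# Far-sighted agent (discount factor near 1): the troll post looks like a loss,
# which is the objective FB actually cares about.
assert discounted_return(troll_post, 0.999) < discounted_return(normal_feed, 0.999)
```

So the company's own incentives push the specified objective toward the long-horizon version, which is the more concerning one from a safety perspective.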

It also probably wouldn't have the ability or inclination to manipulate the external world in the pursuit of complex schemes.

Regarding the "inclination" part: Manipulating the "external world" (what other environment does the feed-creation-agent model?) in the pursuit of certain complex schemes is very useful for maximizing the user engagement metric (that by assumption corresponds to the specified reward function). Also, I don't see how the "wouldn't have the ability" part is justified in the limit as the amount of training compute (and architecture size) and data grows to infinity.

Figuring out how to manipulate the external world in precise ways would require a huge amount of very weird exploration, deep in a section of the space of possible policies where most of the policies are terrible at maximizing reward

We expect the training process to update the policy network in a way that makes the agent more intelligent (i.e. better at modeling the world and causal chains therein, better at planning, etc.), because that is useful for maximizing the sum of discounted rewards. So I don't understand how your above argument works, unless you're arguing that there's some upper bound on the level of intelligence that we can expect deep RL algorithms to yield, and that upper bound is below the minimum level for an agent to pose existential risk due to instrumental convergence.

in the unlikely event that the necessary exploration happened, and the policy started moving in this direction, I think it would be conspicuous before the newsfeed selection algorithm does something like kill everyone to prevent ongoing FB sessions from ending

We should expect a sufficiently intelligent agent [EDIT: that acts in the real world] to refrain from behaving in a way that is both unacceptable and conspicuous, as long as we can turn it off (that's the treacherous turn argument). The question is whether the agent will do something sufficiently alarming and conspicuous before the point where it is intelligent enough to realize it should not cause alarm. I don't think we can be very confident either way.

Comment by ofer on AMA or discuss my 80K podcast episode: Ben Garfinkel, FHI researcher · 2020-08-06T15:22:01.776Z · score: 3 (2 votes) · EA · GW

Hi Ben,

You suggested in the podcast that it's not clear how to map some of the classic arguments—and especially their manifestation in thought experiments like the paper clip maximizer—to contemporary machine learning methods. I'd like to push back on that view.

Deep reinforcement learning is a popular contemporary ML approach for training agents that act in simulated and real-world environments. In deep RL, an agent is trained to maximize its reward (more precisely, the sum of discounted rewards over time steps), which perfectly fits the "agent" abstraction that is used throughout the book Superintelligence. I don't see how classic arguments about the behavior of utility maximizing agents fail to apply to deep RL agents. Suppose we replace every occurrence of the word "agent" in the classic arguments with "deep RL agent"; are the modified arguments false? Here's the result of doing just that for the instrumental convergence thesis (the original version is from Superintelligence, p. 109):

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the deep RL agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by a broad spectrum of situated intelligent deep RL agents.

For the sake of concreteness, consider the algorithm that Facebook uses to create the feed that each user sees (which is an example that Stuart Russell has used). Perhaps there's very little public information about that algorithm, but it's reasonable to guess they're using some deep RL algorithm and a reward function that roughly corresponds to user engagement. Conditioned on that, do you agree that in the limit (i.e. when using whatever algorithm and architecture they're currently using, at a sufficiently large scale), the arguments about instrumental convergence seem to apply?

Regarding the treacherous turn problem, you said:

[...] if you do imagine things would be gradual, then it seems like before you encounter any attempts at deception that have globally catastrophic or existential significance, you probably should expect to see some amount of either failed attempts at deception or attempts at deception that exist, but they’re not totally, totally catastrophic. You should probably see some systems doing this thing of hiding the fact that they have different goals. And notice this before you’re at the point where things are just so, so competent that they’re able to, say, destroy the world or something like that.

Suppose Facebook's scaled-up-algorithm-for-feed-creation would behave deceptively in some way. Suppose it uses some unacceptable technique to increase user engagement (e.g. making users depressed), but it refrains from doing so whenever there's a risk that Facebook engineers would notice. How confident should we be that Facebook engineers would notice the deceptive behavior (i.e. the avoidance of unacceptable behavior in situations where the unacceptable behavior might be noticed)?

Comment by ofer on Is there anything like "green bonds" for x-risk mitigation? · 2020-07-01T16:58:42.983Z · score: 2 (2 votes) · EA · GW

That's a very interesting idea. Some relevant bits from the linked paper:

Only a few national governments could borrow on reasonable terms at this very long maturity. This observation argues for the creation of credible international institutions to underwrite the issuance of this type of debt. Let us call the prospective institutions the 'World Climate Bank' or WCB. How could this bank be governed, and how could it maintain its solvency, solidity and credibility for the required period?

In order to pay the interest on the bonds, the WCB would have to command regular revenues. Two possibilities come to mind. The first is that the WCB would receive the proceeds of a global carbon tax directly, or have a first claim on them. [...]

The second is that the WCB could claim a share of national government revenues up to some limit that would allow it to pay interest on the appropriate quantity of debt, even as the revenues from the carbon tax or royalties decline with the declining use of fossil fuel. By this means, the WCB’s source of revenue would be spread across many national governments, thereby increasing the credibility of the interest guarantees in the bonds.

[...]

One risk the holders of WCB bonds would take is that individual nations might withdraw from the bank for some reason, leaving it insufficiently funded to meet its commitments. This consideration suggests that membership in the WCB should be a precondition for membership in other international organizations such as the World Trade Organization, the IMF and the World Bank. There would then be strong incentives for individual nations not to withdraw from the WCB.

[...]

If the WCB indexed long maturity bonds were widely held as international reserves, they would likely become a vehicle for private reserves seeking very low-risk havens, which would contribute to their marketability.

Comment by ofer on How can I apply person-affecting views to Effective Altruism? · 2020-04-29T21:09:47.973Z · score: 1 (1 votes) · EA · GW

I think you might not have clocked the OP's comment that the morally relevant beings are just those that exist whatever we do, which would presumably rule out concerns for lives in the far future.*

What I tried to say is that the spacetime of the universe(s) may contain a vast number of sentient beings regardless of what we do. Therefore, achieving existential security and having something like a Long Reflection may allow us to help a vast number of sentient beings (including ones outside our future light cone).

**Further pedantry: if our actions changed their children, which they presumably would, it would just be the first generation of extraterrestrial visitors who mattered morally on this view.

I think we're not interpreting the person-affecting view described in the OP in the same way. The way I understand the view (and the OP is welcome to correct me if I'm wrong) it entails we ought to improve the well-being of the extraterrestrial visitors' children (regardless of whether our actions changed them / caused their existence).

Comment by ofer on How can I apply person-affecting views to Effective Altruism? · 2020-04-29T08:53:16.042Z · score: -1 (6 votes) · EA · GW

Hey there!

The universe/multiverse may be very large and (in the fullness of time) may contain a vast number of beings that we should care about and that we (and other civilizations similar to us) may be able to help in some way by using our cosmic endowment wisely. So person-affecting views seem to prescribe the standard maxipok strategy (see also The Precipice by Toby Ord).

[EDIT: by "we should care" I mean something like "we would care if we knew all the facts and had a lot of time to reflect".]

Comment by ofer on Making Impact Purchases Viable · 2020-04-18T10:02:03.547Z · score: 3 (2 votes) · EA · GW

Interesting points!

One possible response is that even if these projects aren't counterfactual, people who create impact deserve funding as a matter of fairness. This seems like a reasonable position, but notice that it is in tension with utilitarianism.

A few quick thoughts:

1. Being the sort of agent/civilization that tends to act in a way that is fair and just can be very beneficial (both to us and to anyone else similar to us).

(Consider that human evolution optimizes for inclusive fitness, and yet it spat out creatures that are sometimes willing to take a hit in their expected inclusive fitness for the purpose of enforcing fairness/justice.)

2. The line of reasoning in the quote seems like a special case of donors failing to coordinate (where the person carrying out the project at their own expense is in the role of one of the donors). Compare: "I would donate to get this project carried out if I had to, but since I know you'll donate 100% of the required amount if I don't, I'll let you do that."

3. That line of reasoning also suggests that EA orgs should ask each of their employees whether they would work for free if they didn't get a salary, and refuse to pay a salary to employees who answer "yes".

Comment by ofer on Database of existential risk estimates · 2020-04-18T05:30:47.841Z · score: 1 (1 votes) · EA · GW

In your first scenario, if Alice sees Bob seems to be saying "Oh, well then I've just got to accept that there's a 5% chance"

Maybe a crux here is what fraction of people in the role of Bob would instead convince themselves that the unconditional estimate is nonsense (due to motivated reasoning).

Comment by ofer on Database of existential risk estimates · 2020-04-17T14:23:06.877Z · score: 1 (1 votes) · EA · GW

But it still seems like I'd want to make different decisions if I had reason to believe the risks from AI are 100 times larger than those from engineered pandemics, compared to if I believed the inverse.

Agreed, but I would argue that in this example, acquiring that belief is consequential because it makes you update towards the estimate: "conditioned on us not making more efforts to mitigate existential risks from AI, the probability of an AI related existential catastrophe is ...".

I might have been nitpicking here, but just to give a sense of why I think this issue might be relevant:

Suppose Alice, who is involved in EA, tells a random person Bob: "there's a 5% chance we'll all die due to X". What is Bob more likely to do next: (1) convince himself that Alice is a crackpot and that her estimate is nonsense; or (2) accept Alice's estimate, i.e. accept that there's a 5% chance that he and everyone he cares about will die due to X, no matter what he and the rest of humanity do? (Because the 5% estimate already took into account everything that Bob and the rest of humanity will do about X.)

Now suppose instead that Alice tells Bob "If humanity won't take X seriously, there's a 10% chance we'll all die due to X." I suspect that in this scenario Bob is more likely to seriously think about X.

Comment by ofer on Database of existential risk estimates · 2020-04-17T08:44:37.195Z · score: 3 (3 votes) · EA · GW

Interesting project! I especially like that there's a sheet for conditional estimates (one can argue that the non-conditional estimates are not directly decision-relevant).

Comment by ofer on (How) Could an AI become an independent economic agent? · 2020-04-04T19:33:49.822Z · score: 4 (4 votes) · EA · GW

An AI system can theoretically "become an independent economic agent" in a practical sense without legally owning money. For example, suppose it has access to a lot of resources owned by some company, and nobody can understand its logic or its decisions; and blindly letting it handle those resources is the only way for the company to stay competitive.

Comment by ofer on Why not give 90%? · 2020-03-25T12:17:13.136Z · score: 10 (8 votes) · EA · GW

Thanks for writing this!

I worry that people who are new to EA might read this post and get the impression that there's an expectation from people in EA to have some form of utilitarianism as their only intrinsic goal. So I'd like to flag that EA is a community of humans :). Humans are the result of human evolution—a messy process that roughly optimizes for inclusive fitness. It's unlikely that any human can be perfectly modeled as a utilitarian (with limited willpower, etc., but without any intrinsic goal that is selfish).

Of course, this does not imply we shouldn't have important discussions about burnout in EA. (In the case of the OP I would just pose the question a bit differently, maybe: "Should a utilitarian give 90%?").

Comment by ofer on How can EA local groups reduce likelihood of our members getting COVID-19 or other infectious diseases? · 2020-02-27T05:19:23.742Z · score: 5 (4 votes) · EA · GW

Strongly discourage handshakes. Encourage the elbow bump or bows instead.

Is the elbow bump recommended even if people are sneezing/coughing into their elbows?

[EDIT: maybe people should only cough into their left elbow?]

Comment by ofer on ofer's Shortform · 2020-02-19T06:53:18.423Z · score: 4 (3 votes) · EA · GW

The 2020 annual letter of Bill and Melinda Gates is titled "Why we swing for the fences" and it seems to spotlight an approach that resembles OpenPhil's hits-based giving approach.

From the 2020 annual letter:

At its best, philanthropy takes risks that governments can’t and corporations won’t. Governments need to focus most of their resources on scaling proven solutions.

[...]

As always, Warren Buffett—a dear friend and longtime source of great advice—put it a little more colorfully. When he donated the bulk of his fortune to our foundation and joined us as a partner in its work, he urged us to “swing for the fences.”

That’s a phrase many Americans will recognize from baseball. When you swing for the fences, you’re putting every ounce of strength into hitting the ball as far as possible. You know that your bat might miss the ball entirely—but that if you succeed in making contact, the rewards can be huge.

That’s how we think about our philanthropy, too. The goal isn’t just incremental progress. It’s to put the full force of our efforts and resources behind the big bets that, if successful, will save and improve lives.

[...]

When Warren urged Melinda and me to swing for the fences all those years ago, he was talking about the areas our foundation worked on at the time, not climate change. But his advice applies here, too. The world can’t solve a problem like climate change without making big bets.

Comment by ofer on Conversation on AI risk with Adam Gleave · 2019-12-27T23:49:13.230Z · score: 11 (4 votes) · EA · GW

Gleave thinks discontinuous progress in AI is extremely unlikely:

I'm confused about this point. Did Adam Gleave explicitly say that he thinks discontinuous progress is "extremely unlikely" (or something to this effect)?

From the transcript I get a sense of a less confident estimate being made:

Adam Gleave: [...] I don’t see much reason for AI progress to be discontinuous in particular.

Adam Gleave: [...] I don’t expect there to be a discontinuity, in the sense of, we just see this sudden jump.

Comment by ofer on 2019 AI Alignment Literature Review and Charity Comparison · 2019-12-21T09:01:24.118Z · score: 25 (11 votes) · EA · GW

Financial Reserves

You listed important considerations; here are some additional points to consider:

1. As suggested in SethBaum's comment, a short runway may deter people from joining the org (especially people with larger personal financial responsibilities and opportunity cost).

2. It seems likely that—all other things being equal—orgs with a longer runway are "less vulnerable to Goodhart's law" and generally less prone to optimize for short-term impressiveness in costly ways. Selection effects alone seem sufficient to justify this belief: Orgs with a short runway that don't optimize for short-term impressiveness seem less likely to keep on existing.

Comment by ofer on But exactly how complex and fragile? · 2019-12-13T13:31:18.200Z · score: 1 (1 votes) · EA · GW

The traditional argument for AI alignment being hard is that human value is ‘complex’ and ‘fragile’.

Presumably, many actors will be investing a lot of resources into building the most capable and competitive ML models in many domains (e.g. models for predicting stock prices). It seems to me that the purpose of the field of AI alignment is to make it easier for actors to build such models in a way that is both safe and competitive. AI alignment seems hard to me because using arbitrarily-scaled-up versions of contemporary ML methods—in a safe and competitive way—seems hard.

Comment by ofer on What metrics may be useful to measure the health of the EA community? · 2019-11-14T16:51:48.216Z · score: 1 (1 votes) · EA · GW

Some more ideas for metrics that might be useful for tracking 'the health of the EA community' (not sure whether they fit in the first category):

How much runway do EA orgs have?

How diverse is the 'EA funding portfolio'? [EDIT: I'm referring here to the diversity of donors rather than the diversity of funding recipients.]

Comment by ofer on Summary of Core Feedback Collected by CEA in Spring/Summer 2019 · 2019-11-08T11:24:10.889Z · score: 4 (3 votes) · EA · GW

To clarify my view, I do think there is a large variance in risk among 'long-term future interventions' (such as donating to FHI, or donating to fund an independent researcher with a short track record).

Comment by ofer on Summary of Core Feedback Collected by CEA in Spring/Summer 2019 · 2019-11-07T22:41:28.918Z · score: 2 (2 votes) · EA · GW

Thanks for publishing this!

...

1. Funds was targeted to meet the needs of a small set of donors, but was advertised to the entire EA community.

.

Many donors may not want their donations going towards “unusual, risky, or time-sensitive projects”, and respondents were concerned that the Funds were advertised to too broad a set of donors, including those for whom the Funds may not have been a good fit.

.

we do not currently proactively advertise EA Funds.

I'd be happy to learn more about these considerations/concerns. It seems to me that many of the interventions that are a good idea from a 'long-term future perspective' are unusual, risky, or time-sensitive. Is this an unusual view in the EA sphere?

Comment by ofer on Does 80,000 Hours focus too much on AI risk? · 2019-11-03T19:58:39.927Z · score: 4 (3 votes) · EA · GW

Is this the case in the AI safety community?

I have no idea to what extent the above factor is influential amongst the AI safety community (i.e. the set of all (aspiring) AI safety researchers?).

If the reasoning for their views isn't obviously bad, I would guess that it's "cool" to say unpopular or scary but not unacceptable things, because the rationality community has been built in part on this.

(As an aside, I'm not sure what's the definition/boundary of the "rationality community", but obviously not all AI safety researchers are part of it.)

Comment by ofer on [deleted post] 2019-11-03T19:56:45.943Z

.

Comment by ofer on Does 80,000 Hours focus too much on AI risk? · 2019-11-03T10:15:09.319Z · score: 30 (10 votes) · EA · GW

One factor that seems important is that even a small probability of "very short timelines and a sharp discontinuity" is probably a terrifying prospect for most people. Presumably, people tend to avoid saying terrifying things. Saying terrifying things can be costly, both socially and reputationally (and there's also the possible side effect of, well, making people terrified).

I hope to write a more thorough answer to this soon (I'll update this comment accordingly by 2019-11-20).

[EDIT (2019-11-18): adding the content below]

(I should note that I haven't yet discussed some of the following with anyone else. Also, so far I've had very little one-on-one interaction with established AI safety researchers, so consider the following to be mere intuitions and wild speculations.)

Suppose that some AI safety researcher thinks that 'short timelines and a sharp discontinuity' is likely. Here are some potential reasons that might cause them to not discuss their estimate publicly:

1. Extending the point above ("people tend to avoid saying terrifying things"):

• Presumably, most people don't want to come across as extremists.
• People might be concerned that the most extreme/weird part of their estimate would end up getting quoted a lot in an adversarial manner, perhaps in a somewhat misleading way, for the purpose of dismissing their thoughts and making them look like a crackpot.
• Making someone update towards such an estimate might put them in a lot of stress which might have a negative impact on their productivity.
2. Voicing such estimates publicly might make the field of AI safety more fringe.

• When the topic of 'x-risks from AI' is presented to a random person, presenting a more severe account of the risks might make it more likely that the person would rationalize away the risks due to motivated reasoning.
• Being more optimistic probably correlates with others being more willing to collaborate with you. People are probably generally attracted to optimism, and working with someone who is more optimistic is probably a more attractive experience.
• Therefore, the potential implications of voicing such estimates publicly include:
• making talented people less likely to join the field of AI safety;
• making established AI researchers (and other key figures) more hesitant to be associated with the field; and
• making donors less likely to donate to this cause area.
3. Some researchers might be concerned that discussing such estimates publicly would make them appear as fear-mongering crooks who are just trying to get funding or better job security.

• Generally, I suspect that most researchers who work on x-risk reduction would strongly avoid saying anything that could be pattern-matched to "I have this terrifying estimate about the prospect of the world getting destroyed soon in some weird way; and also, if you give me money I'll do some research that will make the catastrophe less likely to happen."
• Some supporting evidence that those who work on x-risk reduction indeed face the risk of appearing as fear-mongering crooks:
• Oren Etzioni, a professor of computer science at the University of Washington and the CEO of the Allen Institute for Artificial Intelligence (not to be confused with the Alan Turing Institute), wrote an article for the MIT Technology Review in 2016 (summarized by an AI Impacts post in November 2019). In that article, titled "No, the Experts Don’t Think Superintelligent AI is a Threat to Humanity", Etzioni cited the following comment, attributed to an anonymous AAAI Fellow:

Nick Bostrom is a professional scare monger. His Institute’s role is to find existential threats to humanity. He sees them everywhere. I am tempted to refer to him as the ‘Donald Trump’ of AI.

Note: at the end of that article there's an update from November 2016 that includes the following:

I’m delighted that Professors Dafoe & Russell, who responded to my article here, and I seem to be in agreement on three critical matters. One, we should refrain from ad hominem attacks. Here, I have to offer an apology: I should not have quoted the anonymous AAAI Fellow who likened Dr. Bostrom to Donald Trump. I didn’t mean to lend my voice to that comparison; I sincerely apologized to Bostrom for this misstep via e-mail, an apology that he graciously accepted. [...]

• See also this post by Jessica Taylor from July 2019, titled "The AI Timelines Scam" (a link post for it was posted on the EA Forum), which seems to argue for the (very reasonable) hypothesis that financial incentives have caused some people to voice short timelines estimates (it's unclear to me what fraction of that post is about AI safety orgs/people, as opposed to AI orgs/people in general).

4. Some researchers might be concerned that in order to explain why they have short timelines, they would need to publicly point at some approaches that they think might lead to short timelines, which could draw more attention to those approaches and thereby shorten timelines in a net-negative way.

5. If voicing such estimates would make some key people in industry/governments update towards shorter timelines, it might contribute to 'race dynamics'.

6. If a researcher with such an estimate does not see any of their peers publicly sharing such estimates, they might reason that sharing their estimate publicly is subject to the unilateralist’s curse. If the researcher has limited time or a limited network, they might opt to "play it safe", i.e. decide to not share their estimate publicly (instead of properly resolving the unilateralist’s curse by privately discussing the topic with others).

Comment by ofer on Does 80,000 Hours focus too much on AI risk? · 2019-11-03T05:46:10.280Z · score: 30 (9 votes) · EA · GW

There seems to be a large variance in researchers' estimates about timelines and takeoff speed. Pointing to specific writeups that lean one way or another can't give much insight into the distribution of estimates. Also, I think that at least some researchers are less likely to discuss their estimates publicly if they lean towards shorter timelines and a discontinuous takeoff, which subjects the public discourse on the topic to a selection bias.

So I'm skeptical about the claim that "Most researchers seem to be moving away from a fast takeoff view of AI safety, and are now opting for a softer takeoff view".

Top AI safety researchers are now saying that they expect AI to be safe by default, without further intervention from EA. See here and here.

Again, there seems to be a large variance in researchers' views about this. Pointing to specific writeups can't give much insight about the distribution of views.

Comment by ofer on Reflections on EA Global London 2019 (Mrinank Sharma) · 2019-10-30T20:36:53.472Z · score: 1 (1 votes) · EA · GW
What’s Stopping Advanced Applications of AI?
In many cases, there are cultural issues (within an industry) about the application of algorithms to make crucial decisions. Whilst interpretability of systems would increase the buy-in, there are also key issues with the quality of data, and the infrastructure to collect high quality data.
It is worth noting that the barriers here seem to not be technical, so it is unclear how much of an impact technical research would have here.

Perhaps this model was proposed for certain domains? Maybe ones in which laws restrict applications, like driverless cars?

It doesn't seem to me plausible for all domains (for example, it doesn't seem to me plausible for language models and quantitative trading).

Comment by ofer on Only a few people decide about funding for community builders world-wide · 2019-10-25T17:05:57.965Z · score: 2 (2 votes) · EA · GW

Comment by ofer on Only a few people decide about funding for community builders world-wide · 2019-10-24T18:47:22.778Z · score: 1 (1 votes) · EA · GW

The latter (not MIRI in particular).

Comment by ofer on Only a few people decide about funding for community builders world-wide · 2019-10-24T10:34:06.915Z · score: 1 (3 votes) · EA · GW

(unrelated to the OP)

You might well think that eg MIRI's agenda should be more widely worked on, or that it would be better if MIRI had more sources of funding. But it doesn't seem worrying that that isn't the case.

This consideration seems important and I couldn't understand it (I'm talking about the general consideration, not its specific application to MIRI's agenda). I'd be happy to read more about it.

Comment by ofer on Conditional interests, asymmetries and EA priorities · 2019-10-22T06:41:39.244Z · score: 1 (3 votes) · EA · GW

My very tentative view is that we're sufficiently clueless about the probability distribution of possible outcomes from "Risks posed by artificial intelligence" and other x-risks, that the ratio between [the value one places on creating a happy person] and [the value one places on helping a person who is created without intervention] should have little influence on the prioritization of avoiding existential catastrophes.

Comment by ofer on The Future of Earning to Give · 2019-10-14T05:59:00.663Z · score: 10 (6 votes) · EA · GW

Interesting post!

Today, there's almost enough money going into far future causes, so that vetting and talent constraints have become at least as important as funding.

This seems to rely on the assumption that existing prestigious orgs are asking for all the funding they can effectively use. My best guess is that these orgs tend to not ask for a lot more funding than what they predict they can get. One potential reason for this is that orgs/grant-seekers regard such requests as a reputational risk.

Here's some supporting evidence for this, from this Open Phil blog post by Michael Levine (August 2019):

After conversations with many funders and many nonprofits, some of whom are our grantees and some of whom are not, our best model is that many grantees are constantly trying to guess what they can get funded, won’t ask for as much money as they should ask for, and, in some cases, will not even consider what they would do with some large amount because they haven’t seriously considered the possibility that they might be able to raise it.

Comment by ofer on Long-Term Future Fund: August 2019 grant recommendations · 2019-10-10T08:33:33.351Z · score: 16 (5 votes) · EA · GW

Thank you!

This suggests that an additional counterfactually valid donation of $10,000 to the fund, donated prior to this grant round, would have had (if not saved for future rounds) about 60% of the cost-effectiveness of the $439,197 that was distributed.

It might be useful to understand how much more money the fund could have distributed before reaching a very low marginal cost-effectiveness. For example, if the fund had to distribute in this grant round a counterfactually valid donation of $5MM, how would the cost-effectiveness of that donation compare to that of the $439,197 that was distributed?

Comment by ofer on Long-Term Future Fund: August 2019 grant recommendations · 2019-10-09T13:45:05.668Z · score: 25 (9 votes) · EA · GW

It might be useful to get some opinions/intuitions from fund managers on the following question:

How promising is the most promising application that you ended up not recommending a grant for? How would a counterfactually valid grant for that application compare to the $439,197 that was distributed in this round, in terms of EV per dollar?

Comment by ofer on Are we living at the most influential time in history? · 2019-09-19T15:33:33.055Z · score: 2 (2 votes) · EA · GW
So your argument doesn't seem to save existential risk work. The only way to get a non-trivial P(high influence | long future) with your prior seems to be by conditioning on an additional observation "we're extremely early". As I argued here, that's somewhat sketchy to do.

As you wrote, the future being short "doesn’t necessarily imply that xrisk work doesn’t have much impact because the future might just be short in terms of people in our anthropic reference class".

Another thought that comes to mind is that there may exist many evolved civilizations whose behavior is correlated with ours. If so, our deciding to work hard on reducing x-risks makes it more likely that those other civilizations would also decide, during their early centuries, to work hard on reducing x-risks.

Comment by ofer on Ask Me Anything! · 2019-09-18T16:02:19.665Z · score: 2 (2 votes) · EA · GW
(ii) trying to map the Yudkowsky/Bostrom arguments, which were made before the deep learning paradigm, onto actual progress in machine learning, and finding them hard to fit well. Going into this properly would require a lot more discussion though!)

If we end up with powerful deep learning models that optimize a given objective extremely well, the main arguments in Superintelligence seem to go through.

(If we end up with powerful deep learning models that do NOT optimize a given objective, it seems to me plausible that x-risks from AI are more severe, rather than less.)

[EDIT: replaced "a specified objective function" with "a given objective"]

Comment by ofer on Are we living at the most influential time in history? · 2019-09-04T15:36:19.496Z · score: 10 (8 votes) · EA · GW

Interesting post!

But even if we restricted ourselves to a uniform prior over the first 10% of civilisation’s history, the prior would still be as low as 1 in 100,000.

Why should we use a uniform distribution as a prior? If I had to bet on which century would be the most influential for a random alien civilization, my prior distribution for "most influential century" would be a monotonically decreasing function.
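A toy numerical sketch of this point (my own illustration, not from the original comment; the 1/n² shape of the decreasing prior and the 100,000-century horizon are arbitrary assumptions chosen only to make the contrast concrete):

```python
# Toy comparison: probability that the FIRST century is the "most influential"
# under a uniform prior vs. a monotonically decreasing prior.
# The 1/n^2 prior and the 100,000-century horizon are illustrative assumptions.
N = 100_000  # number of centuries considered

# Uniform prior: every century is equally likely to be the most influential.
uniform_p1 = 1 / N  # 1 in 100,000

# Decreasing prior: P(century n is most influential) proportional to 1/n^2.
weights = [1 / n**2 for n in range(1, N + 1)]
decreasing_p1 = weights[0] / sum(weights)

print(uniform_p1, decreasing_p1)
```

Under the uniform prior the first century gets probability 0.00001, while under this particular decreasing prior it gets a majority of the probability mass, which illustrates how strongly the choice of prior shape drives the conclusion.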

Comment by ofer on The Case for the EA Hotel · 2019-04-10T10:31:57.799Z · score: 1 (1 votes) · EA · GW

Yes, thanks.

Comment by ofer on The Case for the EA Hotel · 2019-04-01T05:09:56.761Z · score: 14 (12 votes) · EA · GW

There's an additional argument in favor of the EA Hotel idea which I find very compelling (I've read it on this forum in a comment that I can't find; EDIT: it was this comment by the user Agrippa - the following is not at all a precise description of the original comment and contains extra things that Agrippa might not agree with):

A lot of people are optimizing to get money as an instrumental goal, and funders don't always have a great way to evaluate how "EA-aligned" (for any reasonable definition of that term) a person asking for money is.

The willingness to travel and live for a while in a building with people who are excited about EA probably correlates with "being EA-aligned".

So supporting people via funding their residency in a place like the EA Hotel seems to allow an implicit weak vetting mechanism that doesn't exist when funding people directly.

Comment by ofer on Severe Depression and Effective Altruism · 2019-03-30T14:51:23.138Z · score: 3 (2 votes) · EA · GW

Just an additional point to consider:

If you (and therefore other people similar to you) decide to act in a way that causes a lot of harm/suffering to yourself or your family, and you wouldn't have acted in that way had you never heard about EA, then that would create a causal link between "Alice learns about EA" and "Alice or her family suffer". From a utilitarian perspective, such a causal link seems extremely harmful (e.g. making it less likely that a random talented/rich person would end up being involved in EA-related efforts).

So this is an argument in favor of NOT making such decisions.