Ought: why it matters and ways to help 2019-07-26T01:56:34.037Z · score: 52 (24 votes)
Donor lottery details 2017-01-11T00:52:21.116Z · score: 21 (21 votes)
Integrity for consequentialists 2016-11-14T20:56:27.585Z · score: 30 (32 votes)
What is up with carbon dioxide and cognition? An offer 2016-04-06T01:18:03.612Z · score: 10 (12 votes)
Final Round of the Impact Purchase 2015-12-16T20:28:45.709Z · score: 4 (6 votes)
Impact purchase round 3 2015-06-16T17:16:12.858Z · score: 3 (3 votes)
Impact purchase: changes and round 2 2015-04-20T20:52:29.894Z · score: 3 (3 votes)
$10k of Experimental EA Funding 2015-02-25T19:54:29.881Z · score: 19 (19 votes)
Economic altruism 2014-12-05T00:51:44.715Z · score: 5 (7 votes)
Certificates of impact 2014-11-11T05:22:42.438Z · score: 15 (13 votes)
On Progress and Prosperity 2014-10-15T07:03:21.055Z · score: 29 (29 votes)
The best reason to give later 2013-06-14T04:00:31.000Z · score: 0 (0 votes)
Giving now vs. later 2013-03-12T04:00:04.000Z · score: 0 (0 votes)
Risk aversion and investment (for altruists) 2013-02-28T05:00:34.000Z · score: 3 (3 votes)
Why might the future be good? 2013-02-27T05:00:49.000Z · score: 1 (1 votes)
Replaceability 2013-01-22T05:00:52.000Z · score: 0 (0 votes)


Comment by paul_christiano on Ought: why it matters and ways to help · 2019-07-29T16:35:01.505Z · score: 10 (6 votes) · EA · GW


Comment by paul_christiano on Age-Weighted Voting · 2019-07-15T15:45:14.972Z · score: 4 (2 votes) · EA · GW
I suspect many people responding to surveys about events which happened 10-30 years ago would be doing so with the aim of influencing the betting markets which affect near future policy.

It would be good to focus on questions for which that's not so bad, because our goal is to measure some kind of general sentiment in the future---if in the future people feel like "we should now do more/less of X" then that's pretty correlated with feeling like we did too little in the past (obviously not perfectly---we may have done too little 30 years ago but overcorrected 10 years ago---but if you are betting about public opinion in the US I don't think you should ever be thinking about that kind of distinction).

E.g. I think this would be OK for:

  • Did we do too much or too little about climate change?
  • Did we have too much or too little immigration of various kinds?
  • Were we too favorable or too unfavorable to unions?
  • Were taxes too high or too low?
  • Is compensating organ donors at market rates a good idea?

And so forth.

Comment by paul_christiano on Age-Weighted Voting · 2019-07-12T16:37:38.710Z · score: 68 (25 votes) · EA · GW

I like the goal of politically empowering future people. Here's another policy with the same goal:

  • Run periodic surveys with retrospective evaluations of policy. For example, each year I can pick some policy decisions from {10, 20, 30} years ago and ask "Was this policy a mistake?", "Did we do too much, or too little?", and so on.
  • Subsidize liquid prediction markets about the results of these surveys in all future years. For example, we can bet about people in 2045's answers to "Did we do too much or too little about climate change in 2015-2025?"
  • We will get to see market odds on what people in 10, 20, or 30 years will say about our current policy decisions. For example, people arguing against a policy can cite facts like "The market expects that in 20 years we will consider this policy to have been a mistake."

This seems particularly politically feasible; a philanthropist can unilaterally set this up for a few million dollars of surveys and prediction market subsidies. You could start by running this kind of poll a few times; then opening a prediction market on next year's poll about policy decisions from a few decades ago; then lengthening the time horizon.

(I'd personally expect this to have a larger impact on future-orientation of policy, if we imagine it getting a fraction of the public buy-in that would be required for changing voting weights.)

Comment by paul_christiano on Age-Weighted Voting · 2019-07-12T16:16:14.019Z · score: 31 (13 votes) · EA · GW
It would mitigate intertemporal inconsistency

If different generations have different views, then it seems like we'll have the same inconsistency when we shift power from one generation to the next, regardless of when we do it. Under your proposal the change happens when the next generation turns 18-37, but the inconsistency doesn't seem to be lessened. For example, the Brexit inconsistency would have been between 20 years ago and today rather than between today and 20 years from now, but it would have been just as large.

In fact I'd expect age-weighting to have more temporal inconsistency overall: in the status quo you average out idiosyncratic variation over multiple generations and swap out 1/3 of people every 20 years, while in your proposal you concentrate most power in a single generation which you completely change every 20 years.

Age and wisdom: [...] As a counterargument, crystallised intelligence increases with age and, though fluid intelligence decreases with age, it seems to me that crystallised intelligence is more important than fluid intelligence for informed voting. 

Another counterargument: older people have also seen firsthand the long-run consequences of one generation's policies and have more time to update about what sources of evidence are reliable. It's not clear to me whether this is a larger or smaller impact than "expect to live through the consequences of policies." I think folk wisdom often involves deference to elders specifically on questions about long-term consequences.

(I personally think that I'm better at picking policies at 30 than 20, and expect to be better still at 40.)

Comment by paul_christiano on Confused about AI research as a means of addressing AI risk · 2019-03-17T00:26:18.096Z · score: 6 (3 votes) · EA · GW

Consumers care somewhat about safe cars. If safety is mostly an externality, then legislators may be willing to regulate it. And there are only so many developers: if the moral case is clear enough and the costs low enough, then the leaders might all make that investment.

At the other extreme, if you have no idea how to build a safe car, then there is no way that anyone is going to use a safe car no matter how much people care. Success is a combination of making safety easy and getting people to care / regulating / etc.

Here is the post I wrote about this.

If you have "competitive" solutions, then the required social coordination may be fairly mild. As a stylized example, if the leaders in the field are willing to invest in safety, then you could imagine surviving a degree of non-competitiveness in line with the size of their lead (though the situation is a bit messier than that).

Comment by paul_christiano on If slow-takeoff AGI is somewhat likely, don't give now · 2019-01-31T02:12:50.310Z · score: 13 (4 votes) · EA · GW
The current price of these companies is already determined by cutthroat competition between hyper-informed investors. If Warren Buffett or Goldman Sachs thinks the market is undervaluing these AI companies, then they'll spend billions bidding up the stock price until they're no longer undervalued.

That sounds like a nice world, but unfortunately I don't think that the market is quite that efficient. (Like the parent, I'm not going to offer any evidence, just express my view.)

You could reply, "then why ain'cha rich?" but it doesn't really work quantitatively for mispricings that would take 10+ years to correct. You could instead ask "then why ain'cha several times richer than you otherwise would be?" but lots of people are in fact several times richer than they otherwise would be after a lifetime of investment. It's not anything mind-blowing or even obvious to an external observer.

"Don't try to beat the market" still seems like a good heuristic, I just think this level of confidence in the financial system is misplaced and "hyper-informed" in particular is really overstating it. (As is "incredibly high prior" elsewhere.)

(ETA: I also agree that if you think you have a special insight about AI, there are likely to be better things to do with it.)

Comment by paul_christiano on If slow-takeoff AGI is somewhat likely, don't give now · 2019-01-31T02:05:04.328Z · score: 7 (2 votes) · EA · GW

The same neglect that potentially makes AI investments a good deal can also make AI philanthropy a better deal. If there is a huge AI boom, a prescient investment in AI companies might leave you with a larger share of the world economy---but you'll probably still be a much smaller share of total dollars directed at influencing AI.

That said, I do think this is a reasonable default thing to do with dollars if you are interested in the long term but unimpressed with the current menu of long-termist philanthropy (or expect to be better-informed in the future).

Comment by paul_christiano on Announcing an updated drawing protocol for the donor lotteries · 2019-01-25T18:20:31.614Z · score: 4 (3 votes) · EA · GW

Trusting doesn't seem so bad (probably a bit better than trusting IRIS, since IRIS isn't in the business of claiming to be non-manipulable). I don't know if they support arbitrary winning probabilities for draws, but probably there is some way to make it work.

(That does seem strictly worse than hashing powerball numbers though, which seem more trustworthy than and easier to get.)
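
As a rough illustration of what "hashing powerball numbers" could look like, here is a minimal Python sketch. It is not the protocol that was actually used for the donor lottery; the function name `draw_winner`, the choice of SHA-256, the comma-joined seed format, and the whole-dollar stakes are all assumptions made for the example.

```python
import hashlib


def draw_winner(public_numbers, stakes):
    """Pick a winner with probability proportional to stake.

    public_numbers: numbers from some trusted public drawing (hypothetical input).
    stakes: dict mapping participant name -> whole-dollar contribution.
    """
    total = sum(stakes.values())
    if total <= 0:
        raise ValueError("total stake must be positive")

    # Hash the public numbers so that every bit of the result depends on all of them.
    seed = ",".join(str(n) for n in public_numbers).encode("utf-8")
    digest = hashlib.sha256(seed).digest()

    # Map the 256-bit digest onto [0, total). The modulo bias is negligible
    # because 2**256 is vastly larger than any realistic pot.
    point = int.from_bytes(digest, "big") % total

    # Walk participants in a fixed (sorted) order so anyone can re-check the draw.
    running = 0
    for name in sorted(stakes):
        running += stakes[name]
        if point < running:
            return name
    raise AssertionError("unreachable: point is always below total")
```

Because the draw is just a point on the interval from 0 to the total stake, a scheme like this supports arbitrary winning probabilities, and anyone holding the public numbers and the list of stakes can recompute the result. It does nothing about the separate worry of a participant influencing the public numbers themselves.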

Comment by paul_christiano on Announcing an updated drawing protocol for the donor lotteries · 2019-01-25T18:01:53.688Z · score: 2 (1 votes) · EA · GW

I'm not sure what the myriad of more responsible ways are. If you trust CEA to not mess with the lottery more than you trust IRIS not to change their earthquake reports to mess with the lottery, then just having CEA pick numbers out of a hat could be better.

It definitely seems like free-riding on some other public lottery drawing that people already trust might be better.

Comment by paul_christiano on Announcing an updated drawing protocol for the donor lotteries · 2019-01-25T17:54:59.160Z · score: 3 (2 votes) · EA · GW

There is plenty of entropy in the API responses; that's not the worst concern.

I think the most serious question is whether a participant can influence the lottery draw (e.g. by getting IRIS to change low order digits of the reported latitude or longitude).

Comment by paul_christiano on How to improve EA Funds · 2018-04-14T01:39:28.025Z · score: 4 (4 votes) · EA · GW

In general I feel like donor lotteries should be preferred as a default over small donations to EA funds (winners can ultimately donate to EA funds if they decide that's the best option).

What are the best arguments in favor of EA funds as a recommendation over lotteries? Looking more normal?

(Currently there are no active lotteries, this is not a recommendation for short-term donations.)

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-06T20:23:52.817Z · score: 1 (1 votes) · EA · GW

This standard of betterness is all you need to conclude: "every inefficient outcome is worse than some efficient outcome."

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-06T20:21:44.898Z · score: 2 (2 votes) · EA · GW

If they endorsed the view you say they do with respect to scalping, wouldn't they say "provided there was perfectly equitable distribution of incomes, scalping ensures that goods go to those who value them most". Missing out the first bit gives an extremely misleading impression of their view, doesn't it?

When economists say "how much do you value X" they are usually using the dictionary definition of value as "estimate the monetary worth." Economists understand that valuing something involves an implicit denominator and "who values most" will depend on the choice of denominator. You get approximately the same ordering for any denominator which can be easily transferred between people, and when they say "A values X more than B" they mean in that common ordering. Economists understand that that sense of value isn't synonymous with moral value (which can't be easily transferred between people).

The reason that easily transferable goods serve as a good denominator is that at the optimal outcome they should exactly track whatever the planner cares about (otherwise we could transfer them).

Expressing economists' actual view would take several additional sentences. The quote seems like a reasonable concise simplification.

Your version isn't true: an equitable distribution of incomes doesn't imply that everyone has roughly the same utility per marginal dollar. A closer formulation would be "Supposing that the policy-maker is roughly indifferent between giving a dollar to each person [e.g. as would be the case if the policy-maker has adopted roughly optimal policies in other domains, since dollars can be easily transferred between people] then scalping will ensure that the ticket goes to the person who the policy-maker would most prefer have it."

Immediately before your quote from Mankiw's book, he says "Equity involves normative judgments that go beyond the realm of economics and enter into the realm of political philosophy. We concentrate on efficiency as the social planner's goal. Keep in mind, however, that real policy-makers often care about equity as well." I agree the discussion is offensively simplified because it's a 101 textbook, but don't think this is evidence of fundamental confusion. If we read "equity" as "has the same marginal utility from a dollar" then this seems pretty in line with the utilitarian position.

Comment by Paul_Christiano on [deleted post] 2018-01-05T09:58:00.100Z

It's on my blog. I don't think the scheme works, and in general it seems any scheme introduces incentives to not look like a beneficiary. If I were to do this now, I would just run a prediction market on the total # of donations, have the match success level go from 50% to 100% over the spread, and use a small fraction of proceeds to place N buy and sell orders against the final book.

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-03T18:11:59.212Z · score: 3 (3 votes) · EA · GW

Economists who accept your crucial premise would necessarily think that there should be no redistribution at all, since the net effect of redistribution is to move goods from people who were originally willing to pay more to people who were originally willing to pay less. But "redistribution is always morally bad" is an extreme outlier view amongst economists.

See for example the IGM poll on the minimum wage, where there is significant support for small increases to the minimum wage despite acknowledgment of the allocative inefficiency. The question most economists ask is "is this an efficient way to redistribute wealth? do the benefits justify the costs?" They don't consider the case settled because it decreases allocative efficiency (as it obviously does).

I don't think it would be that hard to find lots of examples of economists defending particular policies on the basis that those willing to pay more should get the good.

People can make that argument as part of a broader principle like "we should give goods to people who are willing to pay most, and redistribute money in the most efficient way we can."

For example, I also often argue that the people willing to pay more should get the good. But I don't accept your crucial premise even a tiny bit. The same is true of the handful of economists I've taken a class from or interacted with at length, and so I'd guess it's the most common view.

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-03T18:04:02.967Z · score: 5 (5 votes) · EA · GW

Obviously what is optimal does depend on what we can compel the producer to do; if we can collect taxes, that will obviously be better. If we can compel the producer to suffer small costs to make the world better, there are better things to compel them to do. If we can create an environment in which certain behaviors are more expensive for the producer because they are socially unacceptable, there are better things to deem unacceptable. And so on.

More broadly, as a society we want to pick the most efficient ways to redistribute wealth, and as altruists we'd like to use our policy influence in the most efficient ways to redistribute wealth. Forcing the tickets to sell below market value is an incredibly inefficient way to redistribute wealth. So it can be a good idea in worlds where there are almost no options, but seems very unlikely to be a good idea in practice.

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-03T09:24:04.046Z · score: 2 (2 votes) · EA · GW

In actual fact, they are appealing to preference utilitarianism. This is a moral theory.

Economists are quite often appealing to a much simpler account of betterness: if everyone prefers option A to option B, then option A is better than option B.

Comment by paul_christiano on Economics, prioritisation, and pro-rich bias   · 2018-01-03T09:13:52.706Z · score: 6 (6 votes) · EA · GW

Here is a stronger version of the pro-market-price argument:

  • The producer could sell a ticket for $1000 to Rich and then give $950 to Pete. This leaves both Rich and Pete better off, often very substantially.
  • In reality, Pete is not an optimal target for philanthropy, and so the producer could do even better by selling the ticket for $1000 to Rich and then giving to their preferred charity.
  • No matter what the producer wants, they can do better by selling the ticket at market price. And no matter what we want as advocates for a policy, we can do better by allowing them to. (In fact the world is complicated and it's not this clean, but that seems orthogonal to your objection.)

This is still not the strongest argument that can be made, but it's better than the argument from your crucial premise. I think there are few serious economists who accept your crucial premise in the way you mean it, though many might use it as a definition of welfare (but wouldn't consider total welfare synonymous with moral good).

Comment by paul_christiano on Announcing the 2017 donor lottery · 2017-12-22T04:58:47.521Z · score: 2 (2 votes) · EA · GW

What are the biggest upsides of transparency?

The actual value of the information produced seems modest.

Comment by paul_christiano on Announcing the 2017 donor lottery · 2017-12-18T06:41:14.972Z · score: 0 (0 votes) · EA · GW

You have diminishing returns to money, i.e. your utility vs. money curve is curved down. So a gamble with mean 0 has some cost to you, approximately (curvature) * (variance), that I was referring to as the cost-via-risk. This cost is approximately linear in the variance, and hence quadratic in the block size.
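
A small numerical sketch of that scaling, in Python. The assumptions here are mine: the cost is taken to be the second-order term 0.5 * curvature * variance (the 0.5 is the usual Taylor constant, which the "approximately" above absorbs), the whole gamble is assumed to scale in proportion to the block size, and the example gamble and curvature value are purely illustrative.

```python
def variance(outcomes):
    """Variance of a discrete gamble given as (probability, payoff) pairs."""
    mean = sum(p * x for p, x in outcomes)
    return sum(p * (x - mean) ** 2 for p, x in outcomes)


def risk_cost(outcomes, curvature):
    """Second-order estimate of the utility cost of a zero-mean gamble."""
    return 0.5 * curvature * variance(outcomes)


def scale(outcomes, k):
    """Multiply every payoff by k, leaving the probabilities unchanged."""
    return [(p, k * x) for p, x in outcomes]


# Hypothetical zero-mean gamble: 10% chance of +90k, 90% chance of -10k.
base = [(0.1, 90_000.0), (0.9, -10_000.0)]
curvature = 1e-12  # illustrative magnitude of the curvature of the utility curve

# Doubling every payoff quadruples the variance, and so quadruples the cost.
ratio = risk_cost(scale(base, 2), curvature) / risk_cost(base, curvature)
```

Under these assumptions `ratio` comes out at 4 (up to floating-point error), since Var(kX) = k² Var(X). If only the pot grows while a donor's own contribution stays fixed, the variance they face grows more slowly than this, so the quadratic figure depends on the scaling assumption.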

Comment by paul_christiano on Announcing the 2017 donor lottery · 2017-12-17T19:21:15.000Z · score: 6 (8 votes) · EA · GW

A $200k lottery has about 4x as much cost-via-risk as a $100k lottery. Realistically I think that smaller sizes (with the option to lottery up further) are significantly better than bigger pots. As the pot gets bigger you need to do more and more thinking to verify that the risk isn't an issue.

If you were OK with variable pot sizes, I think the thing to do would be:

  • The lottery will be divided up into blocks.
  • Each block will have the same size, which will be something between $75k and $150k.
  • We provide a backstop only if the total donation is < $75k. Otherwise, we just divide the total up into chunks between $75k and $150k, aiming to be about $100k.
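
The steps above can be sketched in Python as follows. This is only one way to read the rule: the function name `split_into_blocks`, the choice to round to the nearest number of $100k blocks, and the choice to top a backstopped lottery up to exactly $75k are assumptions, not part of the proposal.

```python
def split_into_blocks(total, target=100_000, low=75_000, high=150_000):
    """Return (number_of_blocks, block_size, backstop_needed) for a donation total."""
    if total < low:
        # Too little for one block: a backstop is needed.
        # Topping up to exactly `low` is an assumption for illustration.
        return 1, float(low), True

    # Use as many equal blocks as puts the block size closest to the target.
    n_blocks = max(1, round(total / target))
    block_size = total / n_blocks

    # For any total >= low this rounding keeps equal blocks inside [low, high].
    assert low <= block_size <= high
    return n_blocks, block_size, False
```

For example, a $250k total would become two blocks of $125k, and a $50k total would trigger the backstop. With the default bounds, rounding to the nearest block count always lands inside the $75k to $150k window, which is why no further adjustment step is needed.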

Comment by paul_christiano on Effective Altruism Grants project update · 2017-10-01T16:33:18.144Z · score: 2 (2 votes) · EA · GW

However, I suspect that this intuition was biased (upward), because I more often think in terms of "non-EA money". In non-EA money, CEA time would have a much higher nominal value. But if you think EA money can be used to buy good outcomes very cost-effectively (even at the margin) then $75 could make sense.

Normally people discuss the value of time by figuring out how many dollars they'd spend to save an hour. It's kind of unusual to ask how many dollars you'd have someone else spend so that you save an hour.

Comment by paul_christiano on Capitalism and Selfishness · 2017-09-16T03:17:54.292Z · score: 3 (3 votes) · EA · GW

Finally, capitalism requires a sufficiently self-interested culture such that it can sustain compounding capital accumulation through the sale of ever-greater commodities.

This is a common claim, but seems completely wrong. An economy of perfectly patient agents will accumulate capital much faster than a community that consumes 50% of its output. The patient agents will invest in infrastructure and technology and machines and so on to increase their future wealth.
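
A toy model makes the comparison concrete. Everything in this Python sketch is a simplifying assumption of mine: output is a fixed fraction of capital, there is no depreciation, "perfectly patient" is stylized as reinvesting all output, and the 10% return and 30 periods are arbitrary.

```python
def capital_after(periods, reinvest_share, output_per_capital=0.10, initial_capital=1.0):
    """Capital stock after `periods` steps when a fixed share of output is reinvested."""
    capital = initial_capital
    for _ in range(periods):
        output = output_per_capital * capital
        capital += reinvest_share * output  # whatever is not reinvested is consumed
    return capital


# Stylized "perfectly patient" economy: all output is reinvested.
patient = capital_after(30, reinvest_share=1.0)

# Economy that consumes 50% of its output each period.
half_consumers = capital_after(30, reinvest_share=0.5)
```

In this model the patient economy compounds at 10% per period and the other at 5%, so the gap between `patient` and `half_consumers` widens every period. The sketch says nothing about what happens to demand for the output, which is what the quoted claim is gesturing at; it only shows that consumption is not what drives accumulation.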

The capitalists have to maximise productivity through technological innovation, wage repression, and so forth, or they are run into the ground and bankrupted by market competition

In an efficient market, the capitalists earn rents on their capital whatever they do.

Comment by paul_christiano on Nothing Wrong With AI Weapons · 2017-08-29T17:16:46.418Z · score: 9 (7 votes) · EA · GW

That sounds a lot more expensive than bullets. You can already kill someone for a quarter.

The main cost of killing someone with a bullet is labor. The point is that autonomous weapons reduce the labor required.

alter the balance of power between different types of groups in a specific way.

New technologies do often decrease the cost of killing people and increase the number of civilians who can be killed by a group of fixed size (see: guns, explosives, nuclear weapons).

Comment by paul_christiano on Nothing Wrong With AI Weapons · 2017-08-29T04:50:34.939Z · score: 10 (8 votes) · EA · GW

The two arguments I most often hear are:

  • Cheap autonomous weapons could greatly decrease the cost of ending life---within a decade they could easily be the cheapest form of terrorism by far, and may eventually be the cheapest mass destruction in general. Think insect-sized drones carrying toxins or explosive charges that are lethal if detonated inside the skull.

  • The greater the military significance of AI, the more difficult it becomes for states to share information and coordinate regarding its development. This might be bad news for safety.

Comment by paul_christiano on Blood Donation: (Generally) Not That Effective on the Margin · 2017-08-06T18:38:41.910Z · score: 7 (7 votes) · EA · GW

This seems to confuse costs and benefits; I don't understand the analysis. (ETA: the guesstimate makes more sense.)

I'm going to assume that a unit of blood is the amount that a single donor gives in a single session. (ETA: apparently a donation is 0.5 units of red blood cells. The analysis below is correct only if red blood cells are 50% of the value of a donation. I have no idea what the real ratio is. If red blood cells are most of the value, adjust all the values downwards by a factor of 2.)

The cost of donating a unit is perhaps 30 minutes (YMMV), and has nothing to do with 120 pounds. (The cost from having less blood for a while might easily dwarf the time cost, I'm not sure. When I've donated the time cost was significantly below 30 minutes.)

Under the efficient-NHS hypothesis, the value of marginal blood to the healthcare system is 120 pounds. We can convert this to QALYs using the marginal rate of (20,000 pounds / QALY), to get 0.6% of a QALY.

If you value all QALYs equally and think that marginal AMF donations buy them at 130 pounds / QALY, then your value for QALYs should be at most 130 pounds / QALY (otherwise you should just donate more). It should be exactly 130 pounds / QALY if you are an AMF donor (otherwise you should just donate less).

So 0.6% of a QALY should be worth about 0.8 pounds. If it takes 30 minutes to produce a unit of blood which is worth 0.6% of a QALY, then it should be producing value at 1.6 pounds / hour.

If the healthcare system was undervaluing blood by one order of magnitude, this would be 16 pounds / hour. So I think "would have to be undervaluing the effectiveness of blood donations by 2 orders of magnitude" is off by about an order of magnitude.
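
The arithmetic in the last few paragraphs can be laid out step by step. This Python sketch only restates the figures used above (120 pounds per unit, 20,000 pounds / QALY, 130 pounds / QALY, 30 minutes per donation); the variable names are mine.

```python
# Figures taken from the discussion above.
nhs_value_per_unit = 120.0       # pounds the healthcare system pays for a marginal unit
nhs_pounds_per_qaly = 20_000.0   # marginal rate used by the healthcare system
amf_pounds_per_qaly = 130.0      # assumed cost of a QALY via marginal AMF donations
hours_per_donation = 0.5         # 30 minutes per unit donated

# A unit of blood expressed in QALYs: about 0.006, i.e. 0.6% of a QALY.
qalys_per_unit = nhs_value_per_unit / nhs_pounds_per_qaly

# What that fraction of a QALY is worth to an AMF donor: about 0.8 pounds.
value_per_unit = qalys_per_unit * amf_pounds_per_qaly

# Value produced per hour of donating: about 1.6 pounds / hour.
value_per_hour = value_per_unit / hours_per_donation

# If blood were undervalued by one order of magnitude: about 16 pounds / hour.
value_per_hour_if_10x = 10 * value_per_hour
```

Laid out this way, a single factor of ten already brings the hourly value into the range of a plausible wage, which is the basis for saying the "2 orders of magnitude" claim is off by about one order.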

The reason this seems so inefficient has little to do with EA's quantitative mindset, and everything to do with the utilitarian perspective that all QALYs are equal. The revealed preferences of most EAs imply that they value their QALYs much more highly than those of AMF beneficiaries. Conventional morality suggests that people extend some of their concern for themselves to their peers, which probably leads to much higher values for marginal UK QALYs than for AMF beneficiary QALYs.

I think that for most EAs donating blood is still not worthwhile even according to (suitably quantitatively refined) common-sense morality. But for those who value their time at less than 20 pounds / hour and take the numbers in the OP seriously, I think that "common-sense" morality does strongly endorse donating blood. (Obviously this cutoff is based on my other quantitative views, which I'm not going to get into here).

(Note: I would not be surprised if the numbers in the post are wrong in one way or another, so don't really endorse taking any quantitative conclusions literally rather than as a prompt to investigate the issue more closely. That said, if you are able to investigate this question usefully I suspect you should be earning more than 20 pounds / hour.)

I'm very hesitant about EAs giving up on common-sense morality based on naive utilitarian calculations. In the first place, I don't think that most EAs' moral reasoning is sufficiently sophisticated to outweigh simple heuristics like "when there are really big gains from trade, take them" (if society is willing to pay 240 pounds / hour for your time, and you value it at 16 pounds per hour, those are pretty big gains from trade). In the second place, even a naive utilitarian should be concerned that the rest of the world will be uncooperative with and unhappy with utilitarians if we are less altruistic than normal people in the ways that matter to our communities.

Comment by paul_christiano on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-11T16:04:41.197Z · score: 3 (3 votes) · EA · GW

On capability amplification:

MIRI's traditional goal would allow you to break cognition down into steps that we can describe explicitly and implement on transistors, things like "perform a step of logical deduction," "adjust the probability of this hypothesis," "do a step of backwards chaining," etc. This division does not need to be competitive, but it needs to be reasonably close (close enough to obtain a decisive advantage).

Capability amplification requires breaking cognition down into steps that humans can implement. This decomposition does not need to be competitive, but it needs to be efficient enough that it can be implemented during training. Humans can obviously implement more than transistors; the main difference is that in the agent foundations case you need to figure out every response in advance (but then can have a correspondingly greater reason to think that the decomposition will work / will preserve alignment).

I can talk in more detail about the reduction from (capability amplification --> agent foundations) if it's not clear whether it is possible and it would have an effect on your view.

On competitiveness:

I would prefer to be competitive with non-aligned AI, rather than count on forming a singleton, but this isn't really a requirement of my approach. When comparing difficulty of two approaches you should presumably compare the difficulty of achieving a fixed goal with one approach or the other.

On reliability:

On the agent foundations side, it seems like plausible approaches involve figuring out how to peer inside the previously-opaque hypotheses, or understanding what characteristic of hypotheses can lead to catastrophic generalization failures and then excluding those from induction. Both of these seem likely applicable to ML models, though would depend on how exactly they play out.

On the ML side, I think the other promising approaches involve either adversarial training or ensembling / unanimous votes, both of which could be applied to the agent foundations problem.

Comment by paul_christiano on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-10T17:37:42.458Z · score: 10 (10 votes) · EA · GW

I agree with this basic point, but I think on the other side there is a large gap in concreteness that makes it much easier to usefully criticize my approach (I'm at the stage of actually writing pseudocode and code which we can critique).

So far I think that the problems in my approach will also appear for MIRI's approach. For example:

  • Solomonoff induction or logical inductors have reliability problems that are analogous to reliability problems for machine learning. So to carry out MIRI's agenda either you need to formulate induction differently, or you need to somehow solve these problems. (And as far as I can tell, the most promising approaches to this problem apply both to MIRI's version and the mainstream ML version.) I think Eliezer has long understood this problem and has alluded to it, but it hasn't been the topic of much discussion (I think largely because MIRI/Eliezer have so many other problems on their plates).
  • Capability amplification requires breaking cognitive work down into smaller steps. MIRI's approach also requires such a breakdown. Capability amplification is easier in a simple formal sense (if you solve agent foundations you will definitely solve capability amplification, but not the other way around).
  • I've given some concrete definitions of deliberation/extrapolation, and there's been public argument about whether they really capture human values. I think CEV has avoided those criticisms not because it solves the problem, but because it is sufficiently vague that it's hard to criticize along these lines (and there are sufficiently many other problems that this one isn't even at the top of the list). If you want to actually give a satisfying definition of CEV, I feel you are probably going to have to go down the same path that started with this post. I suspect Eliezer has some ideas for how to avoid these problems, but at this point those ideas have been subject to even less public discussion than my approach.

I agree there are further problems in my agenda that will be turned up by my discussion. But I'm not sure there are fewer such problems than for the MIRI agenda, since I think that being closer to concreteness may more than outweigh the smaller amount of discussion.

If you agree that many of my problems also come up eventually for MIRI's agenda, that's good news about the general applicability of MIRI's research (e.g. the reliability problems for Solomonoff induction may provide a good bridge between MIRI's work and mainstream ML), but I think it would also be a good reason to focus on the difficulties that are common to both approaches rather than to problems like decision theory / self-reference / logical uncertainty / naturalistic agents / ontology identification / multi-level world models / etc.

Comment by paul_christiano on My current thoughts on MIRI's "highly reliable agent design" work · 2017-07-08T16:05:16.373Z · score: 8 (8 votes) · EA · GW

You might think that "learning to reason from humans" doesn't accomplish (2) because it makes the AI human-limited. If we want an advanced AI to help us create the kind of world that humans would want "if we knew more, thought faster, were more the people we wished we were" etc. then the approval of actual humans might, at some point, cease to be helpful.

A human can spend an hour on a task, and train an AI to do that task in milliseconds.

Similarly, an aligned AI can spend an hour on a task, and train its successor to do that task in milliseconds.

So you could hope to have a sequence of nice AIs, each significantly smarter than the last, eventually reaching the limits of technology while still reasoning in a way that humans would endorse if they knew more and thought faster.

(This is the kind of approach I've outlined and am working on, and I think that most work along the lines of "learn from human reasoning" will make a similar move.)

Comment by Paul_Christiano on [deleted post] 2017-05-01T14:08:32.579Z

Scott links to this study, which is more convincing. They measure the difference between "physical mild (slap, spank)" and "physical harsh (use weapon, punch, kick)" punishment, with ~10% of children in the latter category. They consider children of twins to control for genetic confounders, and find something like a 0.2 SD effect on measures of behavioral problems at age 25. There is still confounding (e.g. households where parents beat their kids may be worse in other ways), and the effects are smaller and for rarer forms of punishment, but it is getting somewhere.

Comment by Paul_Christiano on [deleted post] 2017-05-01T00:22:04.938Z

The reported correlations between physical punishment and life outcomes, which underlie the headline $3.6 trillion / year figure, seem unlikely to be causal. I only clicked on the first study, but it made very little effort to control for any of the obvious confounders. (The two relevant controls are mother's education and presence of the father.) The confounding is sufficiently obvious and large that the whole exercise seems kind of crazy. On top of that, as far as I can tell, a causal effect of this size would be inconsistent with adoption studies.

It would be natural to either start with the effect on kids' welfare, which seems pretty easy to think about, or else make a much more serious effort to actually figure out the long-term effects.

Comment by paul_christiano on Utopia In The Fog · 2017-03-28T22:53:56.236Z · score: 4 (4 votes) · EA · GW

If you drop the assumption that the agent will be all-powerful and far beyond human intelligence then a lot of AI safety work isn't very applicable anymore, while it increasingly needs to pay attention to multi-agent dynamics

I don't think this is true in very many interesting cases. Do you have examples of what you have in mind? (I might be pulling a no-true-scotsman here, and I could imagine responding to your examples with "well that research was silly anyway.")

Whether or not your system is rebuilding the universe, you want it to be doing what you want it to be doing. Which "multi-agent dynamics" do you think change the technical situation?

the claim isn't that evolution is intrinsically "against" any particular value, it's that it's extremely unlikely to optimize for any particular value, and the failure to do so nearly perfectly is catastrophic

If evolution isn't optimizing for anything, then you are left with the agents' optimization, which is precisely what we wanted. I thought you were telling a story about why a community of agents would fail to get what they collectively want. (For example, a failure to solve AI alignment is such a story, as is a situation where "anyone who wants to destroy the world has the option," as is the security dilemma, and so forth.)

Yes, or even implementable in current systems.

We are probably on the same page here. We should figure out how to build AI systems so that they do what we want, and we should start implementing those ideas ASAP (and they should be the kind of ideas for which that makes sense). When trying to figure out whether a system will "do what we want" we should imagine it operating in a world filled with massive numbers of interacting AI systems all built by people with different interests (much like the world is today, but more).

The point you are quoting is not about just any conflict, but the security dilemma and arms races. These do not significantly change with complete information about the consequences of conflict.

You're right.

Unsurprisingly, I have a similar view about the security dilemma (e.g. think about automated arms inspections and treaty enforcement; I don't think the effects of technological progress are at all symmetrical in general). But if someone has a proposed intervention to improve international relations, I'm all for evaluating it on its merits. So maybe we are in agreement here.

Comment by paul_christiano on Utopia In The Fog · 2017-03-28T16:34:18.160Z · score: 12 (12 votes) · EA · GW

It's great to see people thinking about these topics and I agree with many of the sentiments in this post. Now I'm going to write a long comment focusing on those aspects I disagree with. (I think I probably agree with more of this sentiment than most of the people working on alignment, and so I may be unusually happy to shrug off these criticisms.)

Contrasting "multi-agent outcomes" and "superintelligence" seems extremely strange. I think the default expectation is a world full of many superintelligent systems. I'm going to read your use of "superintelligence" as "the emergence of a singleton concurrently with the development of superintelligence."

I don't consider the "single superintelligence" scenario likely, but I don't think that has much effect on the importance of AI alignment research or on the validity of the standard arguments. I do think that the world will gradually move towards being increasingly well-coordinated (and so talking about the world as a single entity will become increasingly reasonable), but I think that we will probably build superintelligent systems long before that process runs its course.

The future looks broadly good in this scenario given approximately utilitarian values and the assumption that ems are conscious, with a large growing population of minds which are optimized for satisfaction and productivity, free of disease and sickness.

On total utilitarian values, the actual experiences of brain emulations (including whether they have any experiences) don't seem very important. What matters are the preferences according to which emulations shape future generations (which will be many orders of magnitude larger).

"freewheeling evolutionary developments, while continuing to produce complex and intelligent forms of organization, lead to the gradual elimination of all forms of being that we care about"

Evolution doesn't really select against what we value, it just selects for agents that want to acquire resources and are patient. This may cut away some of our selfish values, but mostly leaves unchanged our preferences about distant generations.

(Evolution might select for particular values, e.g. if it's impossible to reliably delegate or if it's very expensive to build systems with stable values. But (a) I'd bet against this, and (b) understanding this phenomenon is precisely the alignment problem!)

(I discuss several of these issues here, Carl discusses evolution here.)

Whatever the type of agent, arms races in future technologies would lead to opportunity costs in military expenditures and would interfere with the project of improving welfare. It seems likely that agents designed for security purposes would have preferences and characteristics which fail to optimize for the welfare of themselves and their neighbors. It’s also possible that an arms race would destabilize international systems and act as a catalyst for warfare.

It seems like you are paraphrasing a standard argument for working on AI alignment rather than arguing against it. If there weren't competitive pressure / selection pressure to adopt future AI systems, then alignment would be much less urgent since we could just take our time.

There may be other interventions that improve coordination/peace more broadly, or which improve coordination/peace in particular possible worlds etc., and those should be considered on their merits. It seems totally plausible that some of those projects will be more effective than work on alignment. I'm especially sympathetic to your first suggestion of addressing key questions about what will/could/should happen.

Not only is this a problem on its own, but I see no reason to think that the conditions described above wouldn’t apply for scenarios where AI agents turned out to be the primary actors and decisionmakers rather than transhumans or posthumans.

Over time it seems likely that society will improve our ability to make and enforce deals, to arrive at consensus about the likely consequences of conflict, to understand each others' situations, or to understand what we would believe if we viewed others' private information.

More generally, we would like to avoid destructive conflict and are continuously developing new tools for getting what we want / becoming smarter and better-informed / etc.

And on top of all that, the historical trend seems to basically point to lower and lower levels of violent conflict, though this is in a race with greater and greater technological capacity to destroy stuff.

I would be more than happy to bet that the intensity of conflict declines over the long run. I think the question is just how much we should prioritize pushing it down in the short run.

“the only way to avoid having all human values gradually ground down by optimization-competition is to install a Gardener over the entire universe who optimizes for human values.”

I disagree with this. See my earlier claim that evolution only favors patience.

I do agree that some kinds of coordination problems need to be solved, for example we must avoid blowing up the world. These are similar in kind to the coordination problems we confront today though they will continue to get harder and we will have to be able to solve them better over time---we can't have a cold war each century with increasingly powerful technology.

There is still value in AI safety work... but there are other parts of the picture which need to be explored

This conclusion seems safe, but it would be safe even if you thought that early AI systems will precipitate a singleton (since one still cares a great deal about the dynamics of that transition).

Better systems of machine ethics which don’t require superintelligence to be implemented (as coherent extrapolated volition does)

By "don't require superintelligence to be implemented," do you mean systems of machine ethics that will work even while machines are broadly human level? That will work even if we need to solve alignment long before the emergence of a singleton? I'd endorse both of those desiderata.

I think the main difference in alignment work for unipolar vs. multipolar scenarios is how high we draw the bar for "aligned AI," and in particular how closely competitive it must be with unaligned AI. I probably agree with your implicit claim, that they either must be closely competitive or we need new institutional arrangements to avoid trouble.

Rather than having a singleminded focus on averting a particular failure mode

I think the mandate of AI alignment easily covers the failure modes you have in mind here. I think most of the disagreement is about what kinds of considerations will shape the values of future civilizations.

both working on arguments that agents will be linked via a teleological thread where they accurately represent the value functions of their ancestors

At this level of abstraction I don't see how this differs from alignment. I suspect the details differ a lot, in that the alignment community is very focused on the engineering problem of actually building systems that faithfully pursue particular values (and in general I've found that terms like "teleological thread" tend to be linked with persistently low levels of precision).

Comment by paul_christiano on Donor lottery details · 2017-03-25T23:03:36.795Z · score: 4 (4 votes) · EA · GW

I owe Michael Nielsen $60k to donate as he pleases if [](] is between 0000000000... and 028F5C28F5... at noon PST on 2017/4/2.
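
To make the odds in that commitment concrete, here is a minimal sketch of how a hex interval like this maps to a win probability. It assumes the random value is a uniformly distributed hex string read as a fraction in [0, 1); the helper name `hex_prefix_to_fraction` is hypothetical, and only the ten hex digits quoted above are used, so the result is approximate.

```python
from fractions import Fraction


def hex_prefix_to_fraction(hex_digits: str) -> Fraction:
    """Read a hex string as a fraction in [0, 1), e.g. '8000' -> 1/2."""
    return Fraction(int(hex_digits, 16), 16 ** len(hex_digits))


# Interval endpoints quoted in the comment (truncated to ten hex digits there).
low = hex_prefix_to_fraction("0000000000")
high = hex_prefix_to_fraction("028F5C28F5")

# Assumption: the draw is uniform, so the interval width is the win probability.
win_probability = float(high - low)
expected_payout = win_probability * 60_000  # the $60k prize from the comment

print(f"win probability ~ {win_probability:.4%}")
print(f"expected payout ~ ${expected_payout:,.2f}")
```

The interval width comes out very close to 1/100, so the quoted range appears to encode a 1% chance at the $60k.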

Comment by paul_christiano on What Should the Average EA Do About AI Alignment? · 2017-03-01T02:27:27.967Z · score: 7 (7 votes) · EA · GW

You don't try to prevent nuclear disaster by making friendly nuclear missiles, you try to keep them out of the hands of nefarious or careless agents or provide disincentives for building them in the first place.

The difficulty of the policy problem depends on the quality of our technical solutions: how large an advantage can you get by behaving unsafely? If the answer is "you get big advantages for sacrificing safety, and a small group behaving unsafely could cause a big problem" then we have put ourselves in a sticky situation and will need to conjure up some unusually effective international coordination.

A perfect technical solution would make the policy problem relatively easy---if we had a scalable+competitive+secure solution to AI control, then there would be minimal risk from reckless actors. On the flip side, a perfect policy solution would make the technical problem relatively easy since we could just collectively decide not to build any kind of AI that could cause trouble. In reality we are probably going to need both.

(I wrote about this here.)

You could hold the position that the advantages from building uncontrolled AI will predictably be very low even without any further work. I disagree strongly with that and think that it contradicts the balance of public argument, though I don't know if I'd call it "easily corrected."

Comment by paul_christiano on Principia Qualia: blueprint for a new cause area, consciousness research with an eye toward ethics and x-risk · 2017-01-20T18:58:42.593Z · score: 1 (1 votes) · EA · GW

Ah, that makes a lot more sense, sorry for misinterpreting you. (I think Toby has a view closer to the one I was responding to, though I suspect I am also oversimplifying his view.)

I agree that there are important philosophical questions that bear on the goodness of building various kinds of (unaligned) AI, and I think that those questions do have impact on what we ought to do. The biggest prize is if it turns out that some kinds of unaligned AI are much better than others, which I think is plausible. I guess we probably have similar views on these issues, modulo me being more optimistic about the prospects for aligned AI.

I don't think that an understanding of qualia is an important input into this issue though.

For example, from a long-run ethical perspective, whether or not humans have qualia is not especially important, and what mostly matters is human preferences (since those are what shape the future). If you created a race of p-zombies that nevertheless shared our preferences about qualia, I think it would be fine. And "the character of human preferences" is a very different kind of object than qualia. These questions are related in various ways (e.g. our beliefs about qualia are related to our qualia and to philosophical arguments about consciousness), but after thinking about that a little bit I think it is unlikely that the interaction is very important.

To summarize, I do agree that there are time-sensitive ethical questions about the moral value of creating unaligned AI. This was item 1.2 in this list from 4 years ago. I could imagine concluding that the nature of qualia is an important input into this question, but don't currently believe that.

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-31T03:32:26.957Z · score: 3 (3 votes) · EA · GW

It looks like the total will be around $50k, so I'm going to reduce the cut to 0.5%.

Comment by paul_christiano on Principia Qualia: blueprint for a new cause area, consciousness research with an eye toward ethics and x-risk · 2016-12-20T03:20:18.884Z · score: 2 (2 votes) · EA · GW

(effectively) prematurely settling on a utility function whose goodness depends heavily on the nature of qualia

This feels extremely unlikely; I don't think we have plausible paths to obtaining a non-negligibly good outcome without retaining the ability to effectively deliberate about e.g. the nature of qualia. I also suspect that we will be able to solve the control problem, and if we can't then it will be because of failure modes that can't be avoided by settling on a utility function. Of course "can't see any way it can happen" is not the same as "am justifiably confident it won't happen," but I think in this case it's enough to get us to pretty extreme odds.

More precisely, I'd give 100:1 against: (a) we will fail to solve the control problem in a satisfying way, (b) we will fall back to a solution which depends on our current understanding of qualia, (c) the resulting outcome will be non-negligibly good according to our view about qualia at the time that we build AI, and (d) it will be good because we hold that view about qualia.

(My real beliefs might be higher than 1% just based on "I haven't thought about it very long" and peer disagreement. But I think it's more likely than not that I would accept a bet at 100:1 odds after deliberation, even given that reasonable people disagree.)

(By non-negligibly good I mean that we would be willing to make some material sacrifice to improve its probability compared to a barren universe, perhaps of $1000/1% increase. By because I mean that the outcome would have been non-negligibly worse according to that view if we had not held it.)

I'm not sure if there is any way to turn the disagreement into a bet. Perhaps picking an arbiter and looking at their views in a decade? (e.g. Toby, Carl Shulman, Wei Dai?) This would obviously involve less extreme odds.

Probably more interesting than betting is resolving the disagreement. This seems to be a slightly persistent disagreement between me and Toby; I have never managed to really understand his position, but we haven't talked about it much. I'm curious about what kind of solutions you see as plausible---it sounds like your view is based on a more detailed picture rather than an "anything might happen" view.

Comment by paul_christiano on Contra the Giving What We Can pledge · 2016-12-20T03:11:03.753Z · score: 5 (5 votes) · EA · GW

I think that donor lotteries are a considerably stronger argument than GiveWell for the claim "donating 10% doesn't have to be time-consuming."

Your argument (with GiveWell in place of a lottery) requires that either (a) you think that GiveWell charities are clearly the best use of funds, or (b) by "doesn't have to be time-consuming" you mean "if you don't necessarily want to do the most good." I don't think you should be confused about why someone would disagree with (a), nor about why someone would think that (b) is a silly usage.

If there were low-friction donor lotteries, I suspect that most small GiveWell donors would be better-served by gambling up to perhaps $1M and then thinking about it at considerably greater length. I expect a significant fraction of them would end up funding something other than GiveWell top charities.

(I was originally supportive but kind of lukewarm about donor lotteries, but I think I've now come around to Carl's level of enthusiasm.)

Comment by paul_christiano on Contra the Giving What We Can pledge · 2016-12-20T03:00:02.087Z · score: 2 (2 votes) · EA · GW

I assume this discussion is mostly aimed at people outside of CEA who are considering whether to take and help promote the pledge. I think there are many basic points which those people should probably understand but which CEA (understandably) isn't keen to talk about, and it is reasonable for people outside of CEA to talk about them instead.

I expect this discussion wasn't worth the time at any rate, but it seems like sharing it with CEA isn't really going to save time on net.

Comment by paul_christiano on Contra the Giving What We Can pledge · 2016-12-20T02:47:51.018Z · score: 6 (6 votes) · EA · GW

Secondly: An "evil future you" who didn't care about the good you can do through donations probably wouldn't care much about keeping promises made by a different kind of person in the past either, I wouldn't think.

[...] there's no point having a commitment device to prompt you to follow through on something you don't think you should do

Usually we promise to do something that we would not have done otherwise, i.e. which may not be in line with our future self's interests. The promise "I will do X if my future self wants to" is gratuitous.

When I promise to do something I will try to do it, even if my preferences change. Perhaps you are reading "evil" as meaning "lacks integrity" rather than "is not altruistic," but in context that doesn't make much sense.

It seems reasonable for GWWC to say that the GWWC pledge is intended more as a statement of intent than as a commitment; it would be interesting to understand whether this is how most people who come into contact with GWWC perceive the pledge. If there is systematic misperception, it seems like the appropriate response is "oops, sorry" and to fix the misperception.

Thirdly: The coordination thing doesn't really matter here because you are only 'cooperating' with your future self, who can't really reject you because they don't exist yet (unlike another person who is deciding whether to help you).

It does not seem to me that the main purpose of taking the GWWC pledge, nor its main effect, is to influence the pledger's behavior.

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-11T00:08:18.392Z · score: 3 (3 votes) · EA · GW

Note that this is now being implemented by donation swapping, so small donors don't have to put in any extra work.

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-08T18:35:02.150Z · score: 8 (8 votes) · EA · GW

If $50,000 gets contributed, it reduces my risk by a factor of 2, so I could halve the fee (and if $100k got contributed I could reduce it to zero). I'll probably do that.
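
A rough sketch of the fee schedule this implies, under two labeled assumptions: the fee falls linearly with the amount contributed (the comment only gives the $50k and $100k points), and the starting fee is 1% (inferred from the halved fee being 0.5% in a later comment, not stated here). The names `POT_SIZE`, `BASE_FEE` and `lottery_fee` are hypothetical.

```python
POT_SIZE = 100_000  # contribution level at which the fee reaches zero (from the comment)
BASE_FEE = 0.01     # assumption: inferred from "halve the fee" landing on 0.5%


def lottery_fee(contributed: float) -> float:
    """Fee as a fraction of the pot, shrinking linearly as the pot fills.

    Linear interpolation is an assumption; the comment only pins down
    the $50k (half fee) and $100k (zero fee) points.
    """
    unfilled = max(0.0, 1.0 - contributed / POT_SIZE)
    return BASE_FEE * unfilled


for amount in (0, 50_000, 100_000):
    print(f"${amount:,} contributed -> fee {lottery_fee(amount):.2%}")
```

The idea is that the fee compensates for the organizer's own exposure, which shrinks as other donors fill more of the pot.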

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-08T01:16:24.243Z · score: 1 (1 votes) · EA · GW

Also very negative expected value though.

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-07T20:11:27.772Z · score: 6 (6 votes) · EA · GW

For some assets you could donate the upside while selling on the downside to take a loss, so it could actually be more tax-efficient. This probably requires a year of foresight though + rules out some options.

I think that in general lottery-ticket assets tend to be pretty expensive. Also, if the volatility is correlated with the market, then there are further problems from correlations between your returns and other donors' returns.

Comment by paul_christiano on Donor lotteries: demonstration and FAQ · 2016-12-07T20:07:05.126Z · score: 1 (1 votes) · EA · GW

The donations will be invested in equities (some US, some international, some emerging). Returns between the donation and the drawing seem like a very minor consideration, maybe you are looking at 0.5% in expectation, or more like 0.1% for investors with access to leverage.

Comment by paul_christiano on Integrity for consequentialists · 2016-12-03T04:43:17.293Z · score: 1 (1 votes) · EA · GW

Suppose I am considering saying something mean about someone in a context where they won't hear me, and I would be unwilling to say the same thing to their face. I have a hard time with this in general. But there are cases where it is OK according to this heuristic (when they'd be fine knowing that I would say that kind of thing about them under those conditions), and I think those are the cases that I endorse-on-reflection.

Comment by paul_christiano on Integrity for consequentialists · 2016-12-03T03:08:40.606Z · score: 5 (5 votes) · EA · GW

I apologize in advance if I'm a bit snarky.

The ideal utilitarian agent will simply always behave in the manner that optimizes expected future utility factoring in the effect that breaking one's word or other actions will have on the perceptions (and thus future actions) of other people

This view is not broadly accepted amongst the EA community. At the very least, this view is self-defeating in the following sense: such an "ideal utilitarian" should not try to convince other people to be an ideal utilitarian, and should attempt to become a non-ideal utilitarian ASAP (see e.g. Parfit's hitchhiker for the standard counterexample, though obviously there are more realistic cases).

However, the post gives us no reason to believe its particular interpretation of integrity "being straightforward" is the best such heuristic. It merely asserts the author's belief that this somehow works out to be the best.

I argued for my conclusion. You may not buy the arguments, and indeed they aren't totally tight, but calling it "mere assertion" seems silly.

the very reason for considering integrity is that, "I find the ideal of integrity very viscerally compelling, significantly moreso than other abstract beliefs or principles that I often act on."

This is neither true, nor what I said.


This is what it looks like when something is asserted without argument.

I do agree roughly with this sentiment, but only if it is interpreted sufficiently broadly that it is consistent with my post.

Does that mean that instead of saying "I'll be there for you whatever happens" we should say "I'll be there for you as long as the balance of probability doesn't suggest that supporting you will cost more than 5 QALYs" (quality adjusted life years)?

I tried to spell out pretty explicitly what I recommend in the post, right at the beginning ("when I imagine picking an action, I pretend that picking it causes everyone to know that I am the kind of person who picks that option"), and it clearly doesn't recommend anything like this.

You seem to use "being straightforward" in a different way than I do. Saying "I'll be there for you whatever happens" is straightforward if you actually mean the thing that people will understand you as meaning.

Comment by paul_christiano on Integrity for consequentialists · 2016-11-14T23:15:44.932Z · score: 1 (3 votes) · EA · GW

You draw it as upward-sloping, but in your bullet-points you give reasons to believe that it would be downward-sloping.

The y axis is the cost of being a jerk, which is (presumably) higher if people are more likely to notice. In particular, it's not the cost of being perceived as a jerk, which (I argue) should be downward sloping.

(It seems like your other confusions about the graphs come from the same miscommunication, sorry about that.)

Also, let me clarify how a thought experiment works.

This is a post about how I think people ought to act in plausible situations. Thought experiments can cast light on that question to the extent they bear relevant similarities to plausible situations. The relationship between thought experiments and plausible situations becomes relevant if we are trying to make inferences about what we should do in plausible situations.

I agree that there are other philosophical questions that this post does not speak to.

And it seems unlikely to me that you will be able to find a universal rule for summarizing the right way to behave.

I agree that we won't be able to find universal rules. I tried to give a few arguments for why the correct behavior is less sensitive to context than you might expect, such that a simple approximation can be more robust than you would think. (I don't seem to have successfully communicated to you, which is OK. If these aspects of the post are also confusing to others then I may revise them in an attempt to clarify.)

Comment by paul_christiano on What is up with carbon dioxide and cognition? An offer · 2016-04-13T03:59:59.151Z · score: 0 (0 votes) · EA · GW

Anders made the following calculations...

The quoted calculation seems to assume that the indoor air is 100% CO2, when in fact it is about 0.1% CO2? So your conclusions seem to be off by a factor of 1000. Actually a factor of 5000 if you are trying to maintain 600ppm, since the outdoor air also has 400ppm and presumably the net flux is 0.

ETA: Actually maybe that was how you moved from l/hour to l/second, your figures seem about right for keeping levels at 700ppm assuming your airspeed.
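
The factor-of-1000 and factor-of-5000 figures above can be checked with a simple steady-state dilution model: if a room's occupants emit CO2 at rate G and outdoor air at concentration C_out replaces indoor air at rate Q, the indoor level settles where Q × (C_in − C_out) = G. This is a sketch under that well-mixed assumption only; the function names are hypothetical and the 20 L/hour emission figure is a rough illustrative value, not a number from the quoted calculation.

```python
PPM = 1e-6  # one part per million as a volume fraction


def correction_factor(indoor_ppm: float, outdoor_ppm: float = 400.0) -> float:
    """How much more airflow is needed than if indoor air were treated as pure CO2."""
    excess = (indoor_ppm - outdoor_ppm) * PPM
    if excess <= 0:
        raise ValueError("indoor target must exceed the outdoor concentration")
    return 1.0 / excess


def required_airflow(co2_generation: float, indoor_ppm: float,
                     outdoor_ppm: float = 400.0) -> float:
    """Steady-state fresh-air flow, in the same volume/time units as co2_generation."""
    return co2_generation * correction_factor(indoor_ppm, outdoor_ppm)


# Treating indoor air as 0.1% CO2 and ignoring outdoor CO2: off by ~1000x.
print(correction_factor(1000.0, outdoor_ppm=0.0))
# Holding 600 ppm against 400 ppm outdoor air: off by ~5000x.
print(correction_factor(600.0))
# Illustrative assumption: one person emitting roughly 20 L/hour of CO2.
print(required_airflow(20.0, 600.0), "L/hour of fresh air")
```

Relaxing the target to 700 ppm drops the factor to about 3300, which is why the answer is sensitive to exactly which indoor level is being maintained.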

Also a cracked window just doesn't seem to do it empirically.

My credence on the finding being true is high.

I had also seen the replication, and I believe that the paper correctly reports the result of an experiment. (And it's certainly not publication bias with p < 0.0001 or whatever.) The question is whether a particular interpretation of the results is correct. At a minimum it depends on just what the test is measuring.