Now I want to see how much I like honey-drenched fat
I have children, and I would precommit to enduring the pain without hesitation, but I don’t know what I would do in the middle of experiencing the pain. If pain is sufficiently intense, “I” am not in charge any more, and I don’t know very well how whatever part of me is in charge would act.
I have the complete opposite intuition: equal levels of pain are harder to endure for equal time if you have the option to make them stop. Obviously I don’t disagree that pain for a long time is worse than pain for a short time.
This intuition is driven by experiences like: the same level of exercise fatigue is a lot easier to endure if giving up would cause me to lose face. In general, exercise fatigue is more distracting than pain from injuries (my reference points being a broken finger and a cup of boiling water in my crotch - the latter being about as distractingly painful as a whole bunch of not especially notable bike races etc).
Thinking a bit more: the boiling water actually was more intense for a few seconds, but after that it was comparable to bike racing. But also, all I wanted to do was run around shouting obscenities, and given that I was doing exactly that, I don’t recall the sense of being in conflict with myself, which is one of the things I find hard to deal with about pain.
I don’t know that this scales to very intense pain. The only pain experience I’ve had notable enough to recall years later was when I ran 70km without having done very much running to train for it - it hurt a lot, and I don’t have any involuntary pain experiences that compare to it (running + lack of preparation was important here - I’ve done 400km bike rides with no especially notable pain). This was voluntary in the sense that I could have stopped and called someone to pick me up, but that would have disqualified my team.
One prediction I’d make is that holding my hand in an ice bucket with only myself for company would be much harder than doing it with other people where I’d be ashamed to be the first to pull it out. I don’t just mean I’d act differently - I mean I think I would actually experience substantially less psychological tension.
Conditional on AGI being developed by 2070, what is the probability that humanity will suffer an existential catastrophe due to loss of control over an AGI system?
Requesting a few clarifications:
- I think of existential catastrophes as things like near-term extinction rather than things like "the future is substantially worse than it could have been". Alternatively, I tend to think that existential catastrophe means a future that's much worse than technological stagnation, rather than one that's much worse than it would have been with more aligned AI. What do you think?
- Are we considering "loss of control over an AGI system" as a loss of control over a somewhat monolithic thing with a well-defined control interface, or is losing control over an ecosystem of AGIs also of interest here?
I think journalists are often imprecise and I wouldn't read too much into the particular synonym of "said" that was chosen.
Does it make more sense to think about all probability distributions that offer a probability of 50% for rain tomorrow? If we say this set represents our epistemic state, then we're saying something like "the probability of rain tomorrow is 50%, and we withhold judgement about rain on any other day".
I think this question - whether it's better to take 1/n probabilities (or maximum entropy distributions or whatever) or to adopt some "deep uncertainty" strategy - does not have an obvious answer
Perhaps I’m just unclear what it would even mean to be in a situation where you “can’t” put a probability estimate on things that does as well as or better than pure 1/n ignorance.
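To make the contrast concrete, here's a toy sketch (all hypotheses and payoffs are invented for illustration) of how acting on a uniform 1/n prior and acting on a worst-case "deep uncertainty" rule can disagree about the same decision:

```python
# Toy comparison: expected value under a uniform 1/n prior vs. a maximin
# ("deep uncertainty") rule over the same hypotheses.
# All numbers here are made up for illustration.

# Payoffs of two actions under three hypotheses H1, H2, H3.
payoffs = {
    "take_umbrella": [5, 5, 5],
    "leave_umbrella": [12, 12, -6],
}

n = 3
uniform_prior = [1 / n] * n

def expected_value(action):
    return sum(p * u for p, u in zip(uniform_prior, payoffs[action]))

def worst_case(action):
    return min(payoffs[action])

ev = {a: expected_value(a) for a in payoffs}
mm = {a: worst_case(a) for a in payoffs}

# Under the 1/n prior, leaving the umbrella looks best (EV 6 vs 5),
# but the maximin rule prefers taking it (worst case 5 vs -6).
print(ev)
print(mm)
```

The point is just that "adopt a 1/n prior" and "refuse to aggregate and guard against the worst case" are genuinely different decision procedures, so the question of which to use isn't vacuous.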
Suppose you think you might come up with new hypotheses in the future which will cause you to reevaluate how the existing evidence supports your current hypotheses. In this case probabilistically modelling the phenomenon doesn’t necessarily get you the right “value of further investigation” (because you’re not modelling hypothesis X), but you might still be well advised to hold off acting and investigate further - collecting more data might even be what leads to you thinking of the new hypothesis, leading to a “non Bayesian update”. That said, I think you could separately estimate the probability of a revision of this type.
Similarly, you might discover a new outcome that’s important that you’d previously neglected to include in your models.
One more thing: because probability is difficult to work with, even if it is in principle compatible with adaptive plans, it might in practice tend to steer away from them.
Fair enough, she mentioned Yudkowsky before making this claim and I had him in mind when evaluating it (incidentally, I wouldn't mind picking a better name for the group of people who do a lot of advocacy about AI X-risk if you have any suggestions)
I skimmed from 37:00 to the end. It wasn't anything groundbreaking. There was
one incorrect claim ("AI safetyists encourage work at AGI companies"), I think her apparent moral framework that puts disproportionate weight on negative impacts on marginalised groups is not good, and overall she comes across as someone who has just begun thinking about AGI x-risk and so seems a bit naive on some issues. However, "bad on purpose to make you click" is very unfair.
But also: she says that hyping AGI encourages races to build AGI. I think this is true! Large language models at today's level of capability - or even somewhat higher than this - are clearly not a "winner takes all" game; it's easy to switch to a different model that suits your needs better and I expect the most widely used systems to be the ones that work the best for what people want them to do. While it makes sense that companies will compete to bring better products to market faster, it would be unusual to call this activity an "arms race". Talking about arms races makes more sense if you expect that AI systems of the future will offer advantages much more decisive than typical "first mover" advantages, and this expectation is driven by somewhat speculative AGI discourse.
She also questions whether AI safetyists should be trusted to improve the circumstances of everyone vs their own (perhaps idiosyncratic) priorities. I think this is also a legitimate concern! MIRI were at some point apparently aiming to 1) build an AGI and 2) use this AGI to stop anyone else building an AGI (Section A, point 6). If they were successful, that would put them in a position of extraordinary power. Are they well qualified to do that? I'm doubtful (though I don't worry about it too much because I don't think they'll succeed)
I think it's quite sensible that people hoping to have a positive impact in biosecurity should become well-informed first. However, I don't think this necessarily means that radical positions that would ban a lot of research are necessarily wrong, even if they are more often supported by people with less detailed knowledge of the field. I'm not accusing you of saying this, I just want to separate the two issues.
Many professionals in this space are scared and stressed. Adding to that isn’t necessarily building trust and needed allies. The professionals in this space are good people – no reputable virologist is trying to do research that intentionally releases or contributes to a pandemic. Biosafety professionals spend their life working to prevent lab leaks. If I’m being honest, many professionals in and around the biosecurity field don’t think incredibly highly of recent (the past few years) journalistic efforts and calls for total research bans.
Many people calling for complete bans think that scientists are unreliable on this - because they want to continue to do their work, and may not be experts in risk - and the fact that said scientists do not like people doing this doesn't establish that anyone calling for a complete ban is wrong to do so.
As a case in point regarding the unreliability of involved scientists: your reference number 6 repeatedly states that there is "no evidence for a laboratory origin of SARS-CoV-2", while citing arguments around the location of initial cases and phylogeny of SARS-CoV-2 as evidence for a zoonotic emergence. However, a survey of BSL-3 facilities in China found that 53% of associated coronavirus-related Nature publications were produced by Wuhan-based labs between 2017 and 2019, and it is extremely implausible that Wuhan bears 50% of the risk for novel zoonotic virus emergence in all of China! (it's possible that the authors of that survey erred - they do seem ideologically committed to the lab leak theory). Furthermore, I have to the best of my ability evaluated arguments about the presence of the furin cleavage site in the SARS-CoV-2 genome, and my conclusion is that it is around 5 times as likely to be present in the lab origin scenario (accounting for the fact that the WIV is an author on a proposal to insert such sites into SARS-like coronaviruses; also, I consider anywhere from 1.1 to 20 times as likely to be plausible). One can debate the relative strength of different pieces of evidence - and many have - but the claim that there is evidence on one side and none on the other is not plausible in my view, and I at least don't trust that anyone making such a claim is able to capably adjudicate questions about risks of certain kinds of pathogen research.
(not that it's especially relevant, but I currently think the case for zoonosis is slightly stronger than the case for a lab leak, I just don't think you can credibly claim that there's no evidence that supports the lab leak theory)
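For what it's worth, here's a sketch of how a likelihood ratio like the ~5x figure above feeds into posterior odds. The prior odds and the second Bayes factor are purely hypothetical placeholders of mine, not estimates drawn from the references:

```python
# Sketch: combining a prior with Bayes factors via the odds form of
# Bayes' theorem. All inputs below are hypothetical placeholders.

def posterior_odds(prior_odds, *bayes_factors):
    """Multiply prior odds by each (assumed independent) Bayes factor."""
    odds = prior_odds
    for bf in bayes_factors:
        odds *= bf
    return odds

def odds_to_prob(odds):
    return odds / (1 + odds)

prior = 0.25             # hypothetical prior odds for lab origin
bf_cleavage_site = 5.0   # the ~5x figure discussed above
bf_other = 0.5           # hypothetical: other evidence favouring zoonosis

odds = posterior_odds(prior, bf_cleavage_site, bf_other)
print(odds, odds_to_prob(odds))  # 0.625 and ~0.385
```

The structure, not the numbers, is the point: a 5x factor on one side can coexist with evidence on the other side, which is exactly why "no evidence" claims are a poor summary.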
A little bit of proofreading:
You don’t likely don’t know more than professionals
You likely don't know
I do worry about it. Some additional worries I have are 1) if AI is transformative and confers strong first mover advantages, then a private company leading the AGI race could quickly become similarly powerful to a totalitarian government and 2) if the owners of AI depend far less on support from people for their power than today’s powerful organisations, they might be generally less benevolent than today’s powerful organisations
I think they do? Nate at least says he’s optimistic about finding a solution given more time
I'm not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.
The main point I took from the video was that Abigail is kinda asking the question: "How can a movement that wants to change the world be so apolitical?" This is also a criticism I have of many EA structures and people.
I think it's surprising that EA is so apolitical, but I'm not convinced it's wrong to make some effort to avoid issues that are politically hot. Three reasons to avoid such things: 1) they're often not the areas where the most impact can be had, even ignoring constraints imposed by them being hot political topics 2) being hot political topics makes it even harder to make significant progress on these issues and 3) if EAs routinely took strong stands on such things, I'm confident it would lead to significant fragmentation of the community.
EA does take some political stances, although they're often not on standard hot topics: they're strongly in favour of animal rights and animal welfare, and were involved in lobbying for a very substantial piece of legislation recently introduced in Europe. Also, a reasonable number of EAs are becoming substantially more "political" on the question of how quickly the frontier of AI capabilities should be advanced.
Is the reason you don’t go back and forth about whether ELK will work in the narrow sense Paul is aiming for a) you’re seeking areas of disagreement, and you both agree it is difficult or b) you both agree it is likely to work in that sense?
My intuition for why "actions that have effects in the real world" might promote deception is that maybe the "no causation without manipulation" idea is roughly correct. In this case, a self-supervised learner won't develop the right kind of model of its training process, but the fine-tuned learner might.
I think "no causation without manipulation" must be substantially wrong. If it was entirely correct, I think one would have to say that pretraining ought not to help achieve high performance on a standard RLHF objective, which is obviously false. It still seems plausible to me that a) the self-supervised learner learns a lot about the world it's predicting, including a lot of "causal" stuff and b) there are still some gaps in its model regarding its own role in this world, which can be filled in with the right kind of fine-tuning.
Maybe this falls apart if I try to make it all more precise - these are initial thoughts, not the outcomes of trying to build a clear theory of the situation.
I think your first priority is promising and seemingly neglected (though I'm not familiar with a lot of work done by governance folk, so I could be wrong here). I also get the impression that MIRI folk believe they have an unusually clear understanding of risks, would like to see risky development slow down and are pessimistic about their near-term prospects for solving technical problems of aligning very capable intelligent systems and generally don't see any clearly good next steps. It appears to me that this combination of skills and views positions them relatively well for developing AI safety standards. I'd be shocked if you didn't end up talking to MIRI about this issue, but I just wanted to point out that from my point of view there seems to be a substantial amount of fit here.
If a model is deceptively aligned after fine-tuning, it seems most likely to me that it's because it was deceptively aligned during pre-training.
How common do you think this view is? My impression is that most AI safety researchers think the opposite, and I’d like to know if that’s wrong.
I’m agnostic; pretraining usually involves a lot more training, but also fine-tuning might involve more optimisation towards “take actions with effects in the real world”.
All of these comments are focused on my third core argument. What do you think of the other two? They all need to be wrong for deceptive alignment to be a likely outcome.
Yeah, this is just partial feedback for now.
Recall that in this scenario, the model is not situationally aware yet, so it can't be deceptive. Why would making the goal long-term increase immediate-term reward? If the model is trying to maximize immediate reward, making the goal longer-term would create a competing priority.
I think I don't accept your initial premise. Maybe a model acquires situational awareness by first learning about how similar models are trained for object-level reasons (maybe it's an AI development assistant), then understanding how these lessons apply to its own training via a fairly straightforward generalisation (along the lines of "other models work like this, I am a model of a similar type, maybe I work like this too"). Neither of these steps requires an improvement in loss via reasoning about its own gradient updates.
If it can be deceptive, then making the goal longer term could help because it reasons from the goal back to performing well in training, and this might be replacing a goal that didn't quite do the right thing, but because it was short term it also didn't care about doing well in training.
This isn't necessarily true. Humans frequently plan for their future without thinking about how their own values will be affected and how that will affect their long-term goals. Why wouldn't a model do the same thing? It seems very plausible that a model could have crude long-term planning without yet modeling gradient descent updates
I agree it could go either way.
The relevant factor here is actually how much the model expects its future behavior to change from a gradient update, because the model doesn't yet know the effect of the upcoming gradient update.
I think our disagreement here boils down to what I said above: I'm imagining a model that might already be able to draw some correct conclusions about how it gets changed by training.
Gradient descent can update in every direction at once. If updating its proxies helps performance, I see no reason why gradient descent wouldn't update the proxies.
Right, that was wrong of me. I still think the broader conclusion is right - if goal shifting boosts performance, then it must already in some sense understand how to perform well and the goal shifting just helps it apply this knowledge. But I'm not sure if understanding how to perform well in this sense is enough to avoid deceptive alignment - that's why I wanted to read your first post (which I still haven't done).
Gradient descent can only update the model in the direction that improves performance hyper-locally. Therefore, building the effects of future gradient updates into the decision making of the current model would have to be advantageous on the current training batch for it to emerge from gradient descent.
I think the standard argument here would be that you've got the causality slightly wrong. In particular: pursuing long term goals is, by hypothesis, beneficial for immediate-term reward, but pursuing long term goals also entails considering the effects of future gradient updates. Thus there's a correlation between "better reward" and "considering future gradient updates", but the latter does not cause the former.
Because each gradient update should have only a small impact on model behavior, the relatively short-term reward improvements of considering these effects should be very small. If the model isn't being trained on goals that extended far past the next gradient update, then learning to consider how current actions affect gradient updates, which is not itself especially consequential, should be very slow.
It's not obvious to me that your "continuity" assumption generally holds ("gradient updates have only a small impact on model behaviour"). In particular, I have an intuition that small changes in "goals" could lead to large changes in behaviour. Furthermore, it is not clear to me that, granting the continuity assumption, the conclusion follows. I think the speed at which it learns to consider how current actions affect gradient updates should depend on how much extra reward (accounting for regularisation) is available from changing in other ways.
One line of argument is that if changing goals is the most impactful way to improve performance, then the model must already have a highly developed understanding of the world. But if it has a highly developed model of the world, then it probably already has a good "understanding of the base objective" (I use quotes here because I'm not exactly sure what this means).
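A minimal sketch of the "updates every parameter at once, but only hyper-locally" point from the exchange above (a toy quadratic loss; nothing here models goals or proxies literally):

```python
# A single gradient step moves every parameter simultaneously, but each
# step is a small, local move determined only by the current gradient.

def loss(params):
    x, y = params
    return (x - 3) ** 2 + (y + 1) ** 2

def grad(params):
    x, y = params
    return (2 * (x - 3), 2 * (y + 1))

params = (0.0, 0.0)
lr = 0.1
gx, gy = grad(params)
params = (params[0] - lr * gx, params[1] - lr * gy)
print(params)  # both coordinates moved in a single step
```

This is compatible with both positions above: nothing stops an update from touching "proxy" parameters, but each update only follows the locally reward-improving direction.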
When I click on the link to your first post, I am notified that I don't have access to the draft.
I think your title might be causing some unnecessary consternation. "You don't need to maximise utility to avoid domination" or something like that might have avoided a bit of confusion.
and I would urge the author to create an actual concrete situation that doesn't seem very dumb in which a highly intelligent, powerful and economically useful system has non-complete preferences
I'd be surprised if you couldn't come up with situations where completeness isn't worth the cost - e.g. something like, to close some preference gaps you'd have to think for 100x as long, but if you close them all arbitrarily then you end up with intransitivity.
I wonder if it is possible to derive expected utility maximisation type results from assumptions of "fitness" (as in, evolutionary fitness). This seems more relevant to the AI safety agenda - after all, we care about which kinds of AI are successful, not whether they can be said to be "rational". It might also be a pathway to the kind of result AI safety people implicitly use - not that agents maximise some expected utility, but that they maximise utilities which force a good deal of instrumental convergence (i.e. describing them as expected utility maximisers is not just technically possible, but actually parsimonious). Actually, if we get the instrumental convergence then it doesn't matter a great deal if the AIs aren't strictly VNM rational.
In conclusion, I think we're interested in results like fitness -> instrumental convergence, not rationality -> VNM utility.
I largely endorse the position that a number of AI safety people have seen theorems of the latter type and treated them as if they imply theorems of the former type.
Fixing the “I pit my evidence against itself” problem is easy enough once I’ve recognized that I’m doing this (or so my visualizer suggests); the tricky part is recognizing that I’m doing it.
One obvious exercise for me to do here is to mull on the difference between uncertainty that feels like it comes from lack of knowledge, and uncertainty that feels like it comes from tension/conflict in the evidence. I think there’s a subjective difference, that I just missed in this case, and that I can perhaps become much better at detecting, in the wake of this harsh lesson.
Something that helps me with problems like this is to verbalise the hypotheses I'm weighing up. Observing them seems to help me notice gaps.
That being said, polyamory/kink is very often used as a tool of social pressure by predators to force women into a bad choice of either a situation they would not have otherwise agreed to or being called “close minded” and potentially withheld social/career opportunities.
Are such threats believable? Is there a broader culture where people feel that they’re constantly under evaluation such that personal decisions like this are plausibly taken into account for some career opportunities, or is this something that arises mainly where the career opportunities are within someone’s personal fiefdom?
What you're saying here resonates with me, but I wonder if there are people who might be more inclined to assume they're missing something and consequently have a different feeling about what's going on when they're in the situation you're trying to describe. In particular, I'm thinking about people prone to imposter syndrome. I don't know what their feeling in this situation would be - I'm not prone to imposter syndrome - but I think it might be different.
I would have thought that "all conjectures" is a pretty natural reference class for this problem, and Laplace is typically used when we don't have such prior information - though if the resolution rate diverges substantially from the Laplace rule prediction I think it would still be interesting.
I think, because we expect the resolution rate of different conjectures to be correlated, this experiment is a bit like a single draw from a distribution over annual resolution probabilities rather than many draws from such a distribution (if you can forgive a little frequentism).
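For reference, the Laplace rule prediction mentioned above is just (s + 1) / (n + 2); a quick sketch:

```python
# Laplace's rule of succession: after observing s "successes" in n trials,
# predict probability (s + 1) / (n + 2) for the next trial.
from fractions import Fraction

def laplace(successes, trials):
    return Fraction(successes + 1, trials + 2)

# e.g. a conjecture that has gone unresolved for 50 years gets a
# predicted per-year resolution probability of 1/52.
print(laplace(0, 50))  # 1/52
```

The correlation point above is why comparing the observed resolution rate to this formula is weaker evidence than it looks: the "trials" aren't independent draws.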
I think to properly model Ord’s risk estimates, you have to account for the fact that they incorporate uncertainty over the transition rate. Otherwise I think you’ll overestimate the rate at which risk compounds over time, conditional on no catastrophe so far.
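A toy illustration of that point (the mixture of hazard rates is entirely made up): with a single fixed rate, conditional risk is constant, but averaging over uncertain rates makes the conditional risk fall over time, because surviving observers are concentrated in low-risk worlds:

```python
# Sketch: a 50/50 mixture over a low-risk and a high-risk world.
# Conditional per-period risk declines as survival evidence accumulates.
# Rates and weights are hypothetical.

rates = [0.01, 0.5]    # per-period catastrophe probability in each world
weights = [0.5, 0.5]   # prior over worlds

def survival(t):
    return sum(w * (1 - r) ** t for w, r in zip(weights, rates))

def conditional_risk(t):
    # catastrophe probability in period t+1, given survival through period t
    return 1 - survival(t + 1) / survival(t)

print(conditional_risk(0))  # 0.255, close to the prior mixture mean
print(conditional_risk(5))  # much lower (~0.026)
```

So naively compounding a point-estimate rate overstates how fast risk accumulates conditional on no catastrophe so far.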
I think Gary Marcus seems to play the role of an “anti-AI-doom” figurehead much more than Timnit Gebru. I don’t even know what his views on doom are, but he has established himself as a prominent critic of “AI is improving fast” views and seemingly gets lots of engagement from the safety community.
I also think Marcus’ criticisms aren’t very compelling, and so the discourse they generate isn’t terribly valuable. I think similarly of Gebru’s criticism (I think it’s worse than Marcus’, actually), but I just don’t think it has as much impact on the safety community.
Some quick thoughts: A crude version of the vulnerable world hypothesis is “developing new technology is existentially dangerous, full stop”, in which case advanced AI that increases the rate of new technology development is existentially dangerous, full stop.
One of Bostrom’s solutions is totalitarianism. This seems to imply something like “new technology is dangerous, but this might be offset by reducing freedom proportionally”. Accepting this hypothesis seems to say that either advanced AI is existentially dangerous, or it accelerates a political transition to totalitarianism, which seems to be its own kind of risk.
What sort of substantial value would you expect to be added? It sounds like we either have a different belief about the value-add, or a different belief about the costs.
I'd be very surprised if the actual amount of big-picture strategic thinking at either organisation was "very little". I'd be less surprised if they didn't have a consensus view about big-picture strategy, or a clearly written document spelling it out. If I'm right, I think the current content is misleading-ish. If I'm wrong and actually little thinking has been done - there's some chance they say "we're focused on identifying and tackling near-term problems", which would be interesting to me given what I currently believe. If I'm wrong and something clear has been written, then making this visible (or pointing out its existence) would also be a useful update for me.
Polished vs sloppy
Here are some dimensions I think of as distinguishing sloppy from polished:
- Vague hunches <-> precise theories
- First impressions <-> thorough search for evidence/prior work
- Hard <-> easy to understand
- Vulgar <-> polite
- Unclear <-> clear account of robustness, pitfalls and so forth
All else equal, I don't think the left side is epistemically superior. It can be faster, and that might be worth it, but there are obvious epistemic costs to relying on vague hunches, first impressions, failures of communication and overlooked pitfalls (politeness is perhaps neutral here). I think these costs are particularly high in, as you say, domains that are uncertain and disagreement-heavy.
I think it is sloppy to stay too close to the left if you think the issue is important and you have time to address it properly. You have to manage your time, but I don't think there are additional reasons to promote sloppy work.
You say that there are epistemic advantages to exposing thought processes, and you give the example of dialogues. I agree there are pedagogical advantages to exposing thought processes, but exposing thoughts clearly also requires polish, and I don't think pedagogy is a high priority most of the time. I'd be way more excited to see more theory from MIRI than more dialogues.
If my reasoning process is actually flawed, then I want other EAs to be aware of that, so they can have an accurate model of how much weight to put on my views.
I don't think it's realistic to expect Lightcone forums to do serious reviews of difficult work. That takes a lot of individual time and dedication; maybe you occasionally get lucky, but you should mostly expect not to.
I agree that I'm not a paradigmatic example of the EAs who most need to hear this lesson [of exposing the thought process]; but I think non-established EAs heavily follow the example set by established EAs, so I want to set an example that's closer to what I actually want to see more of
Maybe I'll get into this more deeply one day, but I just don't think sharing your thoughts freely is a particularly effective way to encourage other people to share theirs. I think you've been pretty successful at getting the "don't worry about being polite to OpenAI" message across, less so the higher level stuff.
I don’t think this makes sense. Your group, in the EA community, regarding AI safety, gets taken seriously whatever you write. This is not the paradigmatic example of someone who feels worried about making public mistakes. A community that gives you even more leeway to do sloppy work is not one that encourages more people to share their independent thoughts about the problem. In fact, I think the reverse is true: when your criticisms carry a lot of weight even when they’re flawed, this has a stifling effect on people in more marginal positions who disagree with you.
If you want to promote more open discussion, your time would be far better spent seeking out flawed but promising work by lesser known individuals and pointing out what you think is valuable in it.
Am I correct in my belief that you are paid to do this work? If this is so, then I think the fact that you are both highly regarded and compensated for your time means your output should meet higher standards than a typical community post. Contacting the relevant labs is a step that wouldn’t take you much time, can’t be done by the vast majority of readers, and has a decent chance of adding substantial value. I think you should have done it.
We might just be talking past each other - I’m not saying this is a reason to be confident explosive growth won’t happen and I agree it looks like growth could go much faster before hitting any limits like this. I just meant to say “here’s a speculative mechanism that could break some of the explosive growth models”
I don’t think your summary is wrong as such, but it’s not how I think about it.
Suppose we’ve got great AI that, in practice, we still use with a wide variety of control inputs (“make better batteries”, “create software that does X”). Then it could be the case - if AI enables explosive growth in other domains - that “production of control inputs” becomes the main production bottleneck.
Alternatively, suppose there’s a “make me a lot of money” AI and money making is basically about making stuff that people want to buy. You can sell more stuff that people are already known to want - but that runs into the limit that people only want a finite amount of stuff. You could alternatively sell new stuff that people want but don’t know it yet. This is still limited by the number of people in the world, how often each of them wants to consider adopting a new technology, what things someone with life history X is actually likely to adopt, and how long it takes them to make this decision. These things seem unlikely to scale indefinitely with AI capability.
This could be defeated by either money not being about making stuff people want - which seems fairly likely, but in this case I don’t really know what to think - or AI capability leading to (explosive?) human population expansion.
In defence of this not being completely wild speculation: advertising already comprises a nontrivial fraction of economic activity and seems to be growing faster than other sectors https://www.statista.com/statistics/272443/growth-of-advertising-spending-worldwide/
(Although only a small fraction of advertising is promoting the adoption of new tech)
One objection to the “more AI -> more growth” story is that it’s quite plausible that people still participate in an AI driven economy to the extent that they decide what they want, and this could be a substantial bottleneck to growth rates. Speeds of technological adoption do seem to have increased (https://www.visualcapitalist.com/rising-speed-technological-adoption/), but that doesn’t necessarily mean they can indefinitely keep pace with AI driven innovation.
I haven’t looked in detail at how GiveWell evaluates evidence, so maybe you’re no worse here, but I don’t think a “weighted average of published evidence” is appropriate when one has concerns about the quality of published evidence. Furthermore, I think some level of concern about the quality of published evidence should be one’s baseline position - i.e. a weighted average is only appropriate when there are unusually strong reasons to think the published evidence is good.
I’m broadly supportive of the project of evaluating impacts on happiness.
Eliezer’s threat model is “a single superintelligent algorithm with at least a little bit of ability to influence the world”. In this sentence, the word “superintelligent” cannot mean intelligence in the sense of definition 2, or else it is nonsense - definition 2 precludes “small or no ability to influence the world”.
Furthermore, in recent writing Eliezer has emphasised threat models that mostly leverage cognitive abilities (“intelligence 1”), such as a superintelligence that manipulates someone into building a nanofactory using existing technology. Such scenarios illustrate that intelligence 2 is not necessary for AI to be risky, and I think Eliezer deliberately chose these scenarios to make just that point.
One slightly awkward way to square this with the second definition you link is to say that Yudkowsky uses definition 2 to measure intelligence, but is also very confident that high cognitive abilities are sufficient for high intelligence and therefore doesn’t always see a need to draw a clear distinction between the two.
I want to add: I've had a few similar experiences of being rudely dismissed where the person doing the rude dismissing was just wrong about the issue at hand. I mean, you, dear reader, obviously don't know whether they were wrong or I was wrong, but that's the conclusion I drew.
Furthermore, I think Gell-Mann amnesia is relevant here: the reason I'm so confident that my counterpart was wrong in these instances is because I happened to have a better understanding of the particular issues - but for most issues I don't have a better understanding than most other people. So this might be more common than my couple of experiences suggest.
I've had a roughly equal number of good experiences working with EAs, and overwhelmingly good experiences at conferences (EAGx Australia only).
As a brief addendum, I imagine in the non fraudulent world, Sam’s net worth is substantially smaller. So maybe the extremely fast growth of his wealth should itself be regarded with suspicion?
One counterfactual I think is worth considering: had Sam never loaned customer deposits to Alameda, how do you think everyone should have acted?
Had the loans never happened, FTX would still have been engaged in some fairly disreputable business, Sam would still have a wildly high appetite for risk, and just about all of the "red flags" people bring up would still have been there. However, even if this was all common knowledge, my best guess is that most people would've readily endorsed continuing to work with FTX and would not have endorsed making bureaucratic requirements too onerous for FTX funded projects. I think, even in this counterfactual, it might still have made sense to insist on FTX improving their governance before they further scale up their engagement with EA (and perhaps a few other things too).
I suspect that factually, whatever people reasonably could have known was most likely limited to "disreputable business and red flags", not that the loans to Alameda had happened. Furthermore, I doubt anyone even had particularly good reason to think FTX might be engaged in outright fraud on this scale - I think crypto exchanges go bust for non-fraudulent reasons much more often than for fraudulent ones. For these reasons, I suspect that while there are improvements to be made, they probably won't amount to drastic changes. I also suspect that, despite numerous negative signs about FTX, even insiders would have been justified in placing relatively little credence in things playing out the way they have.
Why call it "buying time" instead of "persuading AI researchers"? That seems to be the direct target of efforts here, and the primary benefit seems better conceptualised as "AI researchers act in a way more aligned with what AI safety people think is appropriate" rather than "buying time" which is just one of the possible consequences.
Thanks for your thoughts. I agree that corporations and governments are pretty different, and their “motivations” are one major way in which they differ. I think you could dive deeply into these differences and how they affect the analogy between large human organisations and super intelligent machines, but I think that leads to a much longer piece. My aim was just to say that, if you’re trying to learn from this analogy, you should consider both governments and corporations.
I don’t know if this helps to explain my thinking but imagine you made contact with a sister Earth where there were no organisations larger than family groups. Some people asked you about forming larger organisations - they expected productivity benefits, but some were worried about global catastrophic risks that large human organisations might pose. I’m saying it would be a mistake to advise these people based on our experience with corporations alone, and we should also tell them about our experiences with governments.
(The example is a bit silly, obviously, but I hope it illustrates the kind of question I’m addressing)
I don’t endorse judging AI safety organisations by less wrong consensus alone - I think you should at least read the posts!
In this case, either the price finalises before the scan and no collapse happens, or it finalises after the scan and so the information from the scan is incorporated into the price at the time that it informs the decision. So as long as you aren’t jumping the gun and making decisions based on the non-final price, I don’t think this fails in a straightforward way.
But I’m really not sure whether or not it fails in a complicated way. Suppose that if the market is below 50%, the coin is still flipped but tails pays out instead (I think this is closer to the standard scheme). Suppose both heads and tails are priced at 99c before the scan. After a scan that shows “heads”, there’s not much point in buying more heads. However, if you shorted tails and you’re able to push the price of heads very low, you’re in a great spot. The market ends up settling on tails, and you profit from selling all those worthless tails contracts at 99c (even if you pay, say, 60c for some of them in order to keep the price above heads). In fact, if you’re sure the market will exploit this opportunity in the end, there is expected value in shorting both contracts before the scan - and this is true at any price! Obviously we shouldn’t be 100% confident it will be exploited. However, if both heads and tails trade for 99c prior to the scan, then you lose essentially nothing by shorting both, and you therefore might expect many other people to also want to be short both, and so the chance of manipulation might be high.
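The manipulation arithmetic above can be sketched numerically. This is a minimal sketch under my own assumptions about the contract structure (each contract pays $1 if its side pays out and $0 otherwise, and only the side the market settles on pays out at all); the function name and the quantities of contracts are illustrative, not from the original.

```python
# Sketch of the manipulation arithmetic described above, using the
# illustrative 99c / 60c prices from the comment.

def manipulation_profit(n_sold, price_sold, n_bought, price_bought):
    """Profit from shorting tails at a high price, then buying some tails
    back at a lower price to keep tails priced above heads.

    Assumes the manipulation succeeds: the market settles on tails, but the
    coin (as the scan predicted) comes up heads, so every tails contract
    expires worthless. Shorted contracts are pure profit at their sale
    price; contracts bought for price support are a pure loss."""
    return n_sold * price_sold - n_bought * price_bought

# Sell 1000 tails at 99c before the scan, then buy 500 back at 60c to
# keep the tails price above heads after the scan reveals "heads".
profit = manipulation_profit(1000, 0.99, 500, 0.60)
print(profit)
# positive whenever n_sold / n_bought > price_bought / price_sold
```

The manipulation is profitable even at a substantial price-support cost, which is why the pre-scan short looks nearly free when both sides trade at 99c.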
A wild guess: I think both prices close to $1 might be a strong enough signal of the failure of a manipulation attempt to outweigh the incentive to try.
Yeah, I’m also not sure. The main issue I see is whether we can be confident that the loser is really worse without randomising (I don’t expect the price of the loser to accurately tell us how much worse it is).
Edit: turns out that this question has been partially addressed. They sort of say “no”, but I’m not convinced. In their derivation of incompatible incentives, they condition on the final price, but traders are actually going to be calculating an expectation over final prices. They discuss an example where, if the losing action is priced too low, there’s an incentive to manipulate the market to make that action win. However, the risk of such manipulation is also an incentive to correctly price the loser, even if you’re not planning on manipulation.
I think it definitely breaks if the consequences depend on the price and the choice (in which case, I think what goes wrong is that you can’t expect the market to converge to the right probability).
E.g. there is one box, and the market can open it (action a) or not (action b). The choice is 75% determined by the market prices and 25% determined by a coin flip. A “powerful computer” (jokes) has specified that the box will be filled with $1m if the market price favours b, and nothing otherwise.
So, whenever the market price favours b, a contracts are conditionally worth $1m (or whatever). However, b contracts are always seemingly worthless, and as soon as a contracts are worth more than b contracts they’re also worthless. There might be an equilibrium where b gets bid up to $250k and a to just under $250k, but this doesn’t reflect the conditional probability of outcomes, and in fact a is the better outcome in spite of its lower price.
I’m playing a bit loose with the payouts here, but I don’t think it matters.
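The box example can be checked by simulation. This is a sketch under my formalisation of the setup (a = “open”, b = “don’t open”; the box holds $1m iff the closing price favours b; the choice follows the price favourite with probability 0.75 and a fair coin otherwise; a contract on an action pays the realised consequence when that action is chosen and is voided otherwise):

```python
import random

# Monte Carlo check of the box example: estimate the expected payout of
# each action, conditional on that action being chosen, when the closing
# price favours b (so the box is full).

def simulate(price_favours_b, n=100_000, seed=0):
    rng = random.Random(seed)
    payouts = {"a": [], "b": []}
    for _ in range(n):
        favourite = "b" if price_favours_b else "a"
        if rng.random() < 0.75:
            choice = favourite          # 75%: follow the market favourite
        else:
            choice = rng.choice(["a", "b"])  # 25%: fair coin flip
        box_full = price_favours_b
        # Opening a full box yields $1m; everything else yields nothing.
        consequence = 1_000_000 if (choice == "a" and box_full) else 0
        payouts[choice].append(consequence)
    return {k: sum(v) / len(v) for k, v in payouts.items()}

print(simulate(price_favours_b=True))
# conditional on a being chosen the payout is $1m; conditional on b it is 0,
# so a is the better action even though b carries the higher price
```

The point the calculation illustrates: when the consequence depends on the price itself, the conditional values of the actions come apart from any price ordering the market can settle into.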
I believe we agree on the following: we evaluate the desirability of each available option by appealing to some map F from options to distributions over consequences of interest.
We also both suggest that maybe F should be equal to the map a ↦ Q_a, where Q_a is the closing price of the decision market conditional on a.
You say the price map is equal to the map a ↦ E[Y_a], I say it is equal to a ↦ E[Y | A = a], where the expectation is with respect to some predictive subjective probability.
The reason why I make this claim is due to work like Chen 2009 that finds, under certain conditions, that prediction market prices reflect predictive subjective probabilities, and so I identify the prices with predictive subjective probabilities. I don’t think any similar work exists for potential outcomes.
The main question is: is the price map really the right function F? This is a famously controversial question, and causal decision theorists say: you shouldn’t always use subjective conditional probabilities to decide what to do (see Newcomb etc.). On the basis of results like Chen’s, I surmise that causal decision theorists at least don’t necessarily agree that the closing price of the decision market defines the right kind of function, because it is a subjective conditional probability (but the devil might be in the details).
Now, let’s try to solve the problem with potential outcomes. Potential outcomes have two faces. On the one hand, Y_a is a random variable equal to Y in the event A = a (this is called consistency). But there are many such variables - notably, Y itself. The other face of potential outcomes is that Y_a should be interpreted as representing a counterfactual variable in the event A ≠ a. What potential outcomes don’t come with is a precise theory of counterfactual variables. This is the reason for my “I know it when I see it” comment.
Here’s how you could argue that F(a) = P(Y | A = a): first, suppose it’s a decision market with randomisation, so the choice A is jointly determined by the price and some physical random signal R. Assume Y_a is independent of R given Q - this is our “theory of counterfactual variables”. By determinism, we also have A = f(Q, R), where Q is the closing price of the pair of markets. By contraction, Y_a is independent of A given Q, and the result follows from consistency (apologies if this is overly brief). Then we also say F is the function a ↦ P(Y_a) and we conclude that indeed F(a) = P(Y | A = a).
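For readers who prefer it spelled out, here is one way to write the steps, with my notation and my guess at the exact conditioning sets (A is the chosen action, Q the closing prices, R the randomisation signal, Y_a the potential outcome under action a):

```latex
\begin{align*}
 &\text{Randomisation assumption:} && Y_a \perp R \mid Q\\
 &\text{Determinism:} && A = f(Q, R)\\
 &\text{Hence (contraction):} && Y_a \perp A \mid Q\\
 &\text{Consistency:} && P(Y_a \mid Q) = P(Y_a \mid A = a, Q) = P(Y \mid A = a, Q)
\end{align*}
```

The load-bearing step is the randomisation assumption: it is exactly the claim that the choice depends on nothing but the closing price and noise.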
This is nicer than I expected, but I figure you could go through basically the same reasoning, but with F directly: make the analogous independence assumptions for a (and similarly for b), and by similar reasoning you get the same conclusion for F without ever invoking potential outcomes.
I’ll get back to you
The definition of potential outcomes you refer to does not allow us to answer the question of whether they are estimated by the market in question.
The essence of all the decision theoretic paradoxes is that everyone agrees that we need some function options -> distributions over consequences to make decisions, and no one knows how exactly to explain what that function is.
Phrasing it in terms of potential outcomes could definitely help the understanding of people who use that approach to talk about causal questions (which is a lot of people!). I’m not sure it helps anyone else, though. Under the standard account, the price of a prediction market is a probability estimate, modulo the assumption that utility = money (which is independent of the present concerns). So we’d need to offer an argument that conditional probability = marginal probability of potential outcomes.
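The “price of a prediction market is a probability estimate, modulo utility = money” point can be sketched numerically. This is a minimal illustration, not the standard account itself; the function name is mine, and it assumes a risk-neutral trader and a contract paying $1 if the event happens:

```python
# A risk-neutral trader with subjective belief p in event E values a
# contract paying $1 iff E at p dollars, so the expected profit of buying
# at market price q is p - q per contract.

def expected_profit(belief, price, n_contracts=1):
    """Expected profit from buying contracts that pay $1 iff the event
    happens, under the trader's subjective belief."""
    return n_contracts * (belief - price)

belief = 0.7
for price in (0.60, 0.70, 0.80):
    print(price, round(expected_profit(belief, price), 2))
# buying is +EV below the trader's belief, zero at it, and -EV above it,
# so trading pushes the price toward the marginal trader's subjective
# probability (given utility = money)
```

This is why arguing that the decision-market price equals the marginal probability of a potential outcome requires an extra step beyond the standard account.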
Potential outcomes are IMO in the same boat as decision theories - their interpretation depends on a vague “I know it when I see it” type of notion. However we deal with that, I expect the story ends up sounding quite similar to my original comment - the critical step is that the choice does not depend on anything but the closing price.
a and b definitely are events, though! We could create a separate market on how the decision market resolves, and it will resolve unambiguously.