Posts
Comments
I feel confused about how dangerous/costly it is to use LLMs for private documents or thoughts to assist longtermist research, in a way that may wind up in the training data for future iterations of LLMs. Some sample use cases that I'd be worried about:
- Summarizing private AI evals docs about plans to evaluate future models
- Rewrite emails on high-stakes AI gov conversations
- Generate lists of ideas for biosecurity interventions that can be helped/harmed by AI
- Scrub potentially risky/infohazard-y information from a planned public forecasting questions
- Summarize/rewrite speculations of potential near-future AI capabilities gains.
I'm worried about using LLMs for the following reasons:
- Standard privacy concerns/leakage to dangerous (human) actors
- If it's possible to back out your biosecurity plans from the models, this might give terrorists/gov'ts ideas.
- your infohazards might leak
- People might (probabilistically) back out private sensitive communication, which could be embarassing
- I wouldn't be surprised if care for consumer privacy at AGI labs for chatbot consumers is much lower than say for emails hosted by large tech companies
- I've heard rumors to this effect, also see
- I wouldn't be surprised if care for consumer privacy at AGI labs for chatbot consumers is much lower than say for emails hosted by large tech companies
- (unlikely) your capabilities insights might actually be useful for near-future AI developers.
- Training models in an undesirable direction:
- Give pre-superintelligent AIs more-realistic-than-usual ideas/plans for takeover
- Subtly bias the motivations of future AIs in dangerous ways.
- Perhaps leak capabilities gains ideas that allows for greater potential for self-improvement.
I'm confused whether these are actually significant concerns, vs pretty minor in the grand scheme of things. Advice/guidance/more considerations highly appreciated!
FWIW the former CEO of FTX US also claimed this:
In early April 2022, my eleventh month, I made one last try. I made a written formal complaint about what I saw to be the largest organizational problems inhibiting FTX’s future success. I wrote that I would resign if the problems weren’t addressed.
29/49 In response, I was threatened on Sam’s behalf that I would be fired and that Sam would destroy my professional reputation. I was instructed to formally retract what I’d written and to deliver an apology to Sam that had been drafted for me.
The threat model is still unclear, but this is at least somewhat corroborating evidence that Sam is not above using threats in such situations.
Thanks, appreciate the explanation!
Can you say which norms the current comment breaks? I think it was not clear to me upon reading both the comment, and looking at the forum norms again.
If almost all current leaders would be better than any plausible replacement, even after a significant hit to long-term effectiveness, then I think that says something about the leadership development pipeline that is worth observing.
I think it's relatively obvious that there's a dearth of competent leadership/management in EA. I think this is even more extreme for EA qua EA, since the personal costs : altruistic rewards tradeoff for EA qua EA work is arguably worse than e.g. setting up an AI governance initiative or leading a biosecurity project.
MacAskill thinks that (2) provides evidence that our time may be the most influential, but this evidence isn't strong enough to overcome the stronger arguments against this hypothesis.
I'm confused. Isn't the US Treasury covering this? Or are you suggesting that there might be a liquidity problem while things are getting sorted?
Minor compared to much more important points other people can be making, but highlighting this line:
At dinner the man bragged that Yudkowsky had modeled a core HPMOR professor on him.
Wow, this is an interesting framing on Yudkowsky writing him in as literal Voldemort.
Maybe there's a lesson about trustworthiness and interpersonal dynamics here somewhere.
I downvoted this post originally because it originally appeared to be about not criticizing people who are working on AI capabilities at large labs. Now that it's edited to be about not offering unsolicited criticism for people working on AI safety at large labs (with arguments about why we should avoid unsolicited criticism in general), I still disagree, but I've removed my downvote.
tbc I don't know any more than you here, and I only have the text of the comment to go off of. I just interpreted "You really don't need a [blip] from Russia to lead you into a discussion about some next shit that's about to blow in Silicon Valley . I'm pretty sure you can do it :)
Don't ask me, I'm an immigrant here." as referring to themselves. I found the rest of the comment kind of hard to understand so it's definitely possible I also misunderstood things here.
Feels kinda mean to tell a non-native speaker off for using a slur about their own group.
I dunno, a fairly central example in my mind is if an employee or ex-employee says mean and (by your perspective) wrong things about you online. Seems like if it wasn't for discomfort or awkwardness, replying to said employee would otherwise be a pretty obvious tool in the arsenal. Whereas you can't fire ex-employees and firing current employees is a) just generally a bad move and b) will make you look worse.
I mean in at least in global health and animal welfare, most of the time we don't evaluate charities for being net-negative, we only look at "other people's charities" that are already above a certain bar. I would be opposed to spending considerable resources looking at net negative charities in normal domains, most of your time is much better spent trying to triage resources to send to great projects and away from mediocre ones.
In longtermism or x-risk or meta, everything is really confusing so looking at net-positive vs net-negative becomes more compelling.
For what it's worth, it's very common at LTFF and other grantmakers to consider whether grants are net negative.
Also to be clear, you don't consider OpenAI to be EA-adjacent right? Because I feel like there are many discussions about OpenAI's sign over the years.
Standard management advice is that managers fire employees too slowly rather than too quickly.
I don't know if demanding answers makes sense, but I do think it's a pretty hard call whether Anthropic is net positive or net negative for AI safety; I'm surprised at the degree to which some people seem to think this question is obvious; I'm annoyed at the EA memeplex for not making this uncertainty more transparent to newcomers/outsiders; I hope not too many people join Anthropic for bad reasons.
A month has since passed, and tbh while my emotions has cooled down a bunch, I see no updates on a quick skim either here or Google. I'm sorry that the FLI team was put under such crossfires and extended EAF vitirol. However, I find myself confused about the grantmaking process that led to such a grant being almost approved. I think I still am more than a bit worried about the generative process that led to this situation, and can only hope that either a) there are exculpating circumstances at FLI that either can't be shared or FLI deprioritized sharing or b) FLI has quietly made changes to increase their grantmaking quality in the future, or decreased their willingness to give out grants until such changes have been made.
I think we're maybe talking past each other. E.g. I would not classify Thiel's political views as libertarian (I think he might have been at one point, but certainly not in the last 10+ years), and I'll be surprised if the median American or libertarian would. Some specific points:
Example, there are many EAs working in cryptocurrency and they tend to be libertarian
To be clear, the problem with SBF is that he stole billions of dollars. Theft is no less of a problem if it was in the traditional financial system.[1]
I do believe SBF donated large sums to Republicans.
Notably, not to the Libertarian Party!
Cryptocurrency, Race-IQ differences, and polyamory tend to be libertarian dominated areas of fascination.
Seems pretty unfalsifiable to me. Also kinda irrelevant.
But I don't really want to be speculating on these specific individuals political views, but make the broader point that those areas of itnerest are assosciated with libertarians.
Seems like an unusual framing of "to-date, all the major EA scandals have been caused by libertarians." Usually when I think (paraphrased)"X group of people caused Y" I don't think "X group of people have areas of interests in the vicinity of Y."
- ^
If anything, non-consensual redistribution is much more of a leftist thing than that of any other modern political strand?
I quite appreciate this comment, thank you!
I will point out that to-date, all the major EA scandals have been caused by libertarians (cryptocurrency, race science, sexual abuse in polyarmorous community).
Hmm this seems patently false to me?[1]. Am I misunderstanding something? If not, I'd appreciate it if people don't assert false things on the forum.
- ^
SBF was a major Democratic donor with parents who are Democratic donors. I doubt he ever identified as libertarian. Among the biggest critiques of Bostrom's academic views is that he seems too open to authoritarian survelliance (cf Vulnerable World Hypothesis), hardly a libertarian position. I don't know which incidences of "sexual abuse in polyarmorous community" you're referring to, but I suspect you're wrong there too.
Not eating bugs is a win! People who already aren't going to do this are not a group we need to reach.
Charity program will no longer be cancelled, according to Manifold Twitter.
Thanks, appreciate the update! <3
Thanks, appreciate the feedback. I didn't mean my comment as sarcastic and have retracted the comment. I had an even less charitable comment prepared but realized that "non-native speaker misunderstood what I said" is also a pretty plausible explanation given the international nature of this forum.
I might've been overly sensitive here, because the degree of misunderstanding and the sensitive nature of the topic feels reminiscent of patterns I've observed before on other platforms. This is one of the reasons why I no longer have a public Twitter.
I definitely agree that there might be other incidents that come to light. I still disagree that the presence of at least 5 incidents is much of an update that Time is underselling things.
Is English your native language? If not, I sometimes have trouble reading Mandarin texts and I found Google Translate to be okay. There might be better AI translation in the coming years as well.
Let me be more explicit:
Upon reading the Time article, I immediately assumed that whoever the article was talking about did other creepy things. Assuming the Time article did not misrepresent things hugely, the idea that the person (who we now know is Owen) has not done any other creepy things did not even cross my mind. I feel like this is an extremely normal, even boring, within-context reading.
On the other hand, when you said that "Context that makes Owen look worse" includes "Owen self-admittedly went on to make other inappropriate comments to people on 4 other occasions" this implies to me that your prior belief before reading Owen's statement was that whoever the Time article was referring to did not do other bad things, or at least did less bad things than say 4 other inappropriate comments of similar magnitude.
Because your reading appears to have differed so much from my own, I'm remarking on how this seems like a pretty odd prior to have, from my perspective.
???? I don't understand what your comment is trying to imply.
Owen himself claims that the culture of EA contributed to his sexual misconduct.
Regardless of my own views about which are the largest cultural problems in EA, what's your prior that people who do wrongdoing are accurate in their public assessment of factors that diminish their moral responsibility and/or make themselves look better? Your italicized bolding implies that you think this is an unusually reliable source of truth, whereas I pretty straightforwardly think it's unusually bad evidence.
Yeah I've heard elsewhere that NYT is pretty unusual here, would trust them less than other media.
I agree with that. But also, I don't think you necessarily need a model of bias or malfeasance by anybody else. If I was reading a statement/apology by someone who has zero power remaining in this community, I still would have significant doubts about its accuracy.
Owen self-admittedly went on to make other innapropriate comments to people on 4 other occasions (although they were self-judged to be less egregious).
Sorry, what was your prior belief here? Upon reading that section in the Time article, I definitely did not interpret (paraphrased) "telling a job interviewee staying at your house about your masturbation habits" as a one-off incident by someone who never otherwise does creepy things, and I doubt the average Time reader did.
EDIT: I'm confused about the disagree-votes. Did other people reading the Time article assume that it was a one-off incident before Owen's apology?
EDIT2: Fwiw I thought the rest of the comment that I replied to was a good contribution to the discourse, and I upvoted it before my comment.
There may yet be further events that haven't yet been reported to, or disclosed by Owen, and indeed, on the outside view, most events would not be suchly reported.
I want to highlight this. The more general thing to flag is that this is only Cotton-Barratt's side of the story, albeit apparently checked by several people. The prior is that at least some of this presentation to be slanted in his favor, subconsciously or otherwise.
I don't think it's reasonable to take either the facts or (especially) the framing of this story at face value without entertaining at least significant doubts, and I'm surprised at the number of commentators who appear to be doing this.
This seems good, both as reparation and as a reward for speaking up.
One example is how the New York Times decided that they wouldn't cover tech positively: https://twitter.com/KelseyTuoc/status/1588231892792328192
My understanding from those links is that NYT's actions here is a significant outlier in journalistic/editorial ethics, enough that both Kelsey and Matt thought it was relevant to comment on in those terms.
Kelsey:
I'd never heard anything like it[...]
For the record, Vox has never told me that my coverage of something must be 'hard-hitting' or must be critical or must be positive, and if they did, I would quit. Internal culture can happen in more subtle ways but the thing the NYT did is not normal.
Matt:
But what happened is that a few years ago the New York Times made a weird editorial decision with its tech coverage.
The literature on differential privacy might be helpful here. I think I may know a few people in the field, although none of them are close.
I think they're currently not planning to, see here.
For what it's worth, my current vote is for immediate suspension in situations if there is credible allegations for anyone in a grantmaking etc capacity where they used such powers in a retaliatory action for rejected romantic or sexual advances. In addition to being illegal, such actions are just so obviously evidence of bad judgement and/or poor self-control that I'd hesitate to consider anyone who acted in such ways a good fit for any significant positions of power. I have not thought the specific question much, but it's very hard for me to imagine any realistic situation where someone with such traits is a good fit for grantmaking.
I think posting under pseudonyms makes sense for EAs who are young[1], who are unsure what they want to do with their lives, and/or people who have a decent chance of wanting jobs that require discretion in the future, e.g. jobs in politics or government.
I know at least some governance people who likely regret being openly tied with Future Fund and adjacent entities after the recent debacle.
Also in general I'm confused about how the tides of what's "permissible internet dirt to dig up on people" will change in the future. Things might get either better or much worse, and in the worse worlds there's some option value left in making sure our movement doesn't unintentionally taint the futures of some extremely smart, well-meaning, and agentic young people.
That said, I personally prefer pseudonymous account names with a continued history like Larks or AppliedDivinityStudies[2], rather than anonymous accounts that draw attention to their anonymity, like whistleblower1984.
- ^
<22? Maybe <25, I'm not sure. One important factor to keep track of is how likely you are to dramatically change your mind in politically relevant ways, e.g. I think if you're currently a firm Communist it's bad to not tell your voters about it, but plenty of open-minded young people quickly go through phases of Communism then anarcho-capitalism then anarcho-socialism, etc, and depending on how the tides change maybe you don't want your blog musings while 19 to become too tied in your public identity.
- ^
A consideration against, and maybe a strong enough consideration to make my whole point moot, is the live possibility of much much better AI stylometry in the next decade.
I find myself pretty confused here, and can easily imagine that I screwed up in this assessment. I think the two main things that I find confusing is a) what standards should I expect to have for critics who're probably younger than 20[1], where I do in fact consider these norm violations to be quite bad if they came from people who are say older than 22, and b) how to relate to the very real possibility that I or people I know are doing bad things. Like humans have all sorts of biases to protect their in-group etc, and I can easily imagine both undercorrecting and overcorrecting here.
- ^
Which I'm not very calibrated about. You're much more calibrated than I am here, though for this specific question there are obvious reasons I shouldn't defer to you.
Hi, I think on balance I appreciate this post. This is a hard thing for me to say, as the post has likely caused nontrivial costs to some people rather close to me, and has broken some norms that I view as both subtle and important. But on balance I think our movement will do better with more critical thinkers, and more people with critical pushback when there is apparent divergence between stated memes and revealed goals.
I think this is better both culturally, and also is directly necessary to combat actual harm if there is also actual large-scale wrongdoing that agreeable people have been acculturated to not point out. I think it will be bad for the composition and future of our movement if we push away young people who are idealistic and disagreeable, which I think is the default outcome if posts like this only receive critical pushback.
So thank you for this post. I hope you stay and continue being critical.
I previously addressed this here.
I directionally agree with you. However, they do have a few other levers. For example, local EA groups can ban people based on information from CH. Grantmakers can also ask CH for consultation about people they hear concerning grapevine rumors about and outsource this side of investigations to them.
Some of this refers to what I refer to as "mandate" in my earlier shortform that I linked.
I agree that they can't make many decisions about private events, take legal action, or fire people they do not directly employ.
To give a concrete example, my (non-EA) ex was from Europe, and she had a relative who both didn't like that she had two partners, and that I was non-white. My understanding was that the "poly" dimension was seen as substantially worse than the racial dimension. The relative's attitude didn't particularly affect our relationship much (we both thought it was kind of funny). But at least in Western countries, I think your bar on outing poly people who don't want to be outed should be at least as high as your bar for outing interracial couples who don't want to be outed, given the relative levels of antipathy people in Western countries have between the two.
(I may want to delete this comment later).
Morality is hard in the best of times, and now is not the best of times. The movement may or may not be a good fit for you. I'm glad you're still invested in doing good regardless of perceived or actual wrongdoing of other members of the movement to date, and I hope I and others will do the same.
I guess I'm imagining from either Open Phil's perspective, or that of other large funders, the risk of value misalignment or incompetence of Open Phil staff is already priced in, and they've already paid the cost of evaluating Claire.
It's hard to imagine that (purely from the perspective of reducing costs of auditing)Holden or Cari or Dustin preferring an unknown quantity to Claire. There might be other good reasons to prefer having a more decentralized board[1], but this particular reason seems wrong.
Likewise, from the perspective of future employees or donors to EVF, the risk of value misalignment or incompetence of EVF's largest donor is already a cost they necessarily have to pay if they want to work for or fund EVF. So adding a board member (and another source of COI) that's not associated with Open Phil can only increase the number of COIs, not decrease it.
- ^
for example, a) you want a diversity of perspectives, b) you want to reduce the risks of being beholden to specific entities c) you want to increase the number of potential whistleblowers
Your argument here cuts against your prior comment.
Why was this comment downvoted?
Funnily enough, the "pigeon flu" example may cease to become a hypothetical. Pretty soon, we may need to look at the track record of various agencies and individuals to assess their predictions on H5N1.
I removed my upvote for the same reason.
Imagine a forecaster that you haven't previously heard of told you that there's a high probability of a new novel pandemic ("pigeon flu") next month, and their technical arguments are too complicated for you to follow.[1]
Suppose you want to figure out how much you want to defer to them, and you dug through to find out the following facts:
a) The forecaster previously made consistently and egregiously bad forecasts about monkeypox, covid-19, Ebola, SARS, and 2009 H1N1.
b) The forecaster made several elementary mistakes in a theoretical paper on Bayesian statistics
c) The forecaster has a really bad record at videogames, like bronze tier at League of Legends.
I claim that the general competency argument technically goes through for a), b), and c). However, for a practical answer on deference, a) is much more damning than b) or especially c), as you might expect domain-specific ability on predicting pandemics to be much stronger evidence for whether the prediction of pigeon flu is reasonable than general competence as revealed by mathematical ability/conscientiousness or videogame ability.
With a quote like
Hardly anyone associated with Future Fund saw the existential risk to… Future Fund, even though they were as close to it as one could possibly be.
I am thus skeptical about their ability to predict existential risk more generally, and for systems that are far more complex and also far more distant.
The natural interpretation to me is that Cowen (and by quoting him, by extension the authors of the post) is trying to say that FF not predicting the FTX fraud and thus "existential risk to FF" is akin to a). That is, a dispositive domain-specific bad forecast that should be indicative of their abilities to predict existential risk more generally. This is akin to how much you should trust someone predicting pigeon flu when they've been wrong on past pandemics and pandemic scares.
To me, however, this failure, while significant as evidence of general competency, is more similar to b). It's embarrassing and evidence of poor competence to make elementary errors in math. Similarly, it's embarrassing and evidence of poor competence to not successfully consider all the risks to your organization. But using the phrase "existential risk" is just a semantics game tying them together (in the same way that "why would I trust the Bayesian updates in your pigeon flu forecasting when you've made elementary math errors in a Bayesian statistics paper" is a bit of a semantics game).
EAs do not to my knowledge claim to be experts on all existential risks, broadly and colloquially defined. Some subset of EAs do claim to be experts on global-scale existential risks like dangerous AI or engineered pandemics, which is a very different proposition.
[1] Or, alternatively, you think their arguments are inside-view correct but you don't have a good sense of the selection biases involved.