How would a language model become goal-directed? 2022-07-16T14:50:15.588Z
What is meant by 'infrastructure' in EA? 2022-05-14T14:54:14.960Z
Short term feedback, long term consequences? 2022-03-25T18:57:55.702Z


Comment by David Mears on Software engineering help needed! (Effective Animal Advocacy web product) · 2022-09-22T10:28:46.946Z · EA · GW

Also, here’s another one, which also has the feature that you can search volunteer profiles there:

Comment by David Mears on Software engineering help needed! (Effective Animal Advocacy web product) · 2022-09-22T10:24:33.393Z · EA · GW

Not sure if you’ve already submitted this to the volunteering opportunities board:

That page has a link to an Airtable form where you can submit this opportunity.

Comment by David Mears on Software Engineer: what to do with 3 days of volunteering? · 2022-09-22T10:15:35.027Z · EA · GW

In general, you can become aware of these projects by joining the relevant Facebook and Discord groups. Please DM me for links.

Comment by David Mears on Software Engineer: what to do with 3 days of volunteering? · 2022-09-22T10:14:31.910Z · EA · GW

Just spotted this thing:

Comment by David Mears on Who's hiring? (May-September 2022) · 2022-09-14T16:23:03.631Z · EA · GW

Cross-posting a top-level post: AGI Safety Fundamentals programme is contracting a low-code engineer

TL;DR: Help the AGI Safety Fundamentals, Alternative Protein Fundamentals, and other programmes by automating our manual work, so we can support larger cohorts of course participants more frequently.

Register interest here [5 mins, CV not required if you don’t have one].

Comment by David Mears on "Agency" needs nuance · 2022-09-13T13:20:46.474Z · EA · GW

If I were to guess what the 'disagreement' downvotes were picking up on, it would be this:

I see that as a definition driven by self-interest

Whereas to me, all of the adjectives 'proactive, ambitious, deliberate, goal-directed' are goal-agnostic, such that whether they end up being selfish or selfless depends entirely on what goal 'cartridge' you load into the slot (if you'll forgive the overly florid metaphor).

Comment by David Mears on "Agency" needs nuance · 2022-09-13T13:16:34.624Z · EA · GW

When I read the original post that this post is a response to, I am "reading in" some context or subtext based on the fact that I know the author/blogger is an EA; something like "when giving life advice, I'm doing it to help you with your altruistic goals". As a result of that assumption, I take writing that looks like 'tips on how to get more of what you want' to be mainly justified by being about altruistic things you want.

Comment by David Mears on "Agency" needs nuance · 2022-09-12T23:20:13.865Z · EA · GW

As NinaR said, 'round these parts the word "agentic" doesn't imply self-interest. My own gloss of it would be "doesn't assume someone else is going to take responsibility for a problem, and therefore is more likely to do something about it". For example, if the kitchen at your workplace has no bin ('trashcan'), an agentic person might ask the office manager to get one, or even just order in a cheap one themselves. Or if you see that the world is neglecting to consider the problem of insect welfare, instead of passively hoping that 'society will get its act together', you might think about what kinds of actions individuals would need to take for society to get its act together, and consider taking some of those actions.

Comment by David Mears on Changes to our top charity criteria, and a new giving option · 2022-08-17T23:16:07.847Z · EA · GW

Thanks for all you do.

I feel that changing the nature of the Maximum Impact Fund in this way should come with a renaming of the fund, since it is no longer going all-out on expected value: whereas before it was "maximizing" expected "impact", it is no longer doing that. And many donors have come to expect that the MIF is the go-to for high-EV donations, and will not notice this change.

Something like the 'Top Charities Fund' or 'High Impact Fund' flags the fundamental change, and is a bit less misleading.

Comment by David Mears on Longtermists Should Work on AI - There is No "AI Neutral" Scenario · 2022-08-12T12:21:50.570Z · EA · GW

You’re really sure that developing AGI is impossible

I don’t need to think this in order to think AI is not the top priority. I just need to think it’s hard enough that other risks dominate it. Eg I might think biorisk has a 10% chance of ending everything each century, and that risks from AI are at 5% this century and 10% every century after that. Then if all else is equal, such as tractability, I should work on biorisk.
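The arithmetic behind this comparison can be sketched as follows (the risk numbers are the illustrative ones from the sentence above, not real estimates, and tractability is assumed equal):

```python
def cumulative_risk(per_century):
    """P(catastrophe at some point), treating centuries as independent."""
    p_survive = 1.0
    for p in per_century:
        p_survive *= 1 - p
    return 1 - p_survive

# Illustrative numbers only: biorisk at 10% every century;
# AI risk at 5% this century, then 10% every century after.
bio = cumulative_risk([0.10] * 10)
ai = cumulative_risk([0.05] + [0.10] * 9)
print(f"10-century biorisk: {bio:.3f}")
print(f"10-century AI risk: {ai:.3f}")
```

On these (made-up) numbers, biorisk edges out AI risk this century and cumulatively, so, all else equal, it would be the thing to work on, without ever assuming AGI is impossible.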

Comment by David Mears on The Charlemagne Effect: The Longtermist Case For Neartermism · 2022-07-25T18:01:39.622Z · EA · GW

I think it’s worth mentioning that what you’ve said is not in conflict with a much-reduced-but-still-astronomically-large Charlemagne Effect: you’ve set an upper bound for the longterm effects of nearterm lives saved at <<<2 billion years, but that still leaves a lot of room for nearterm interventions to have very large long term effects by increasing future population size.

That argument refers to the exponential version of the Charlemagne Effect; but the logistic one survives the physical bounds argument. OP writes that they don’t consider the logistic calculation of their Charlemagne Effect totally damning, particularly if it takes a long time for population to stabilise:

However, note that even in the models where the Charlemagne Effect weakens, it is not necessarily completely irrelevant. In the logistic model I created, the Charlemagne Effect is about 10,000 times less strong — but on the humongous scales of future people, this wouldn’t necessarily disqualify TNIs from competition with TLIs. Under future population scenarios with a carrying capacity, the Charlemagne Effect will be disproportionately weaker the sooner we either reach or start oscillating around a carrying capacity.

If that happens in the very early days of humanity’s future (e.g. 10% or less of the way through), then the Charlemagne Effect will be much less important. But if it happens later, then the Charlemagne Effect will have mattered for a large chunk of our future and thus be important for our future as a whole.
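To make the exponential-vs-logistic contrast concrete, here is a toy simulation; the growth rate, horizon, and carrying capacity are hypothetical values chosen for illustration, not the OP's parameters:

```python
def total_person_periods(n0, r, steps, K=None):
    """Sum of population over time under exponential (K=None) or logistic growth."""
    n, total = float(n0), 0.0
    for _ in range(steps):
        total += n
        # Logistic growth damps toward the carrying capacity K.
        n += r * n if K is None else r * n * (1 - n / K)
    return total

r, steps, base = 0.02, 500, 8e9

# Extra person-periods attributable to one life saved at t=0:
exp_effect = total_person_periods(base + 1, r, steps) - total_person_periods(base, r, steps)
log_effect = (total_person_periods(base + 1, r, steps, K=1e10)
              - total_person_periods(base, r, steps, K=1e10))
print(exp_effect, log_effect)
```

Under the exponential model the marginal person compounds indefinitely, while under the logistic model the perturbation decays once the population nears its carrying capacity. The effect is orders of magnitude weaker but still positive, which is the OP's point: weakened, not necessarily irrelevant, and less weakened the later stabilisation happens.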

Comment by David Mears on The Charlemagne Effect: The Longtermist Case For Neartermism · 2022-07-25T17:29:49.201Z · EA · GW

We don’t need to be certain the UN projections are wrong to have some credence that world population will continue to grow. If that credence is big enough, the Charlemagne Effect can still win.

Comment by David Mears on How would a language model become goal-directed? · 2022-07-18T20:57:46.107Z · EA · GW

Thanks. I didn't understand all of this. Long reply with my reactions incoming, in the spirit of Socratic Grilling.

  1. They may imitate the behavior of a consequentialist.

This implies a jump by the language model from outputting text to having behavior. (A jump from imitating verbal behavior to imitating other behavior.) It's that very jump that I'm trying to pin down and understand.

2. They may be used to predict which actions would have given consequences, decision-transformer style ("At 8 pm X happened, because at 7 pm ____").

I can see that this could produce an oracle for an actor in the world (such as a company or person), but not how this would become such an actor. Still, having an oracle would be dangerous, even if not as dangerous as having an oracle that itself takes actions. (Ah - but this makes sense in conjunction with number 5, the 'outer loop'.)

3. A sufficiently powerful language model is expected to engage in some consequentialist cognition in order to make better predictions, and this may generalize in unpredictable ways.

'reasoning about how one's actions affect future world states' - is that an OK gloss of 'consequentialist cognition'? See comments from others attempting to decipher quite what this phrase means.

Interesting to posit a link from CC => making better predictions. I can see how that's one step closer to optimizing over future world states. The other steps seem missing - I take it they are meant to be covered by 'generalizing in unpredictable ways'?

Or did you mean something stronger by CC: goal-directed behaviour? In other words, that a very, very powerful language model would have learned from its training to take real-world actions in service of the goal of next-token prediction? This makes sense to me (though as you say it's speculative).

4. You can fine-tune language models with RL to accomplish a goal, which may end up selecting and emphasizing one of the behaviors above (e.g. the consequentialism of the model is redirected from next-word prediction to reward maximization; or the model shifts into a mode of imitating a consequentialist who would get a particularly high reward). It could also create consequentialist behavior from scratch.

I'd probably need more background knowledge to understand this - namely, some examples of LMs being fine-tuned to act in service of goals. That sounds like it would cut the Gordian knot of my question by simply providing an existence proof, rather than answering the question with arguments.

5. An outer loop could use language models to predict the consequences of many different actions and then select actions based on their consequences.

This one is easy to understand :)

(And indeed we see plenty of examples.)

Where should I look for these?

Comment by David Mears on How would a language model become goal-directed? · 2022-07-18T18:06:55.569Z · EA · GW

Changed post to use 'goal-directed' instead of 'goal-seeking'.

Comment by David Mears on How would a language model become goal-directed? · 2022-07-16T21:28:05.501Z · EA · GW

I have read The Alignment Problem and the first few chapters of Superintelligence, and seen one or two Rob Miles videos. My question is more the second one; I agree that technically GPT-3 already has a goal / utility function (to find the most highly predicted token, roughly), but it’s not an ‘interesting’ goal in that it doesn’t imply doing anything in the world.

Comment by David Mears on Co-Creation of the Library of Effective Altruism [Information Design] (1/2) · 2022-07-15T09:02:26.823Z · EA · GW

I would lean away from encouraging people to all read the same books, for intellectual diversity reasons. I think there's great value in having different people read many different books, and then bringing a fresh perspective. Things like the scratch-off poster idea go too far in the direction of creating a canon, where each book is not particularly thoroughly vetted in any case, and lean towards promoting existing bestsellers rather than hidden gems.

How do 'book recommendations' fit into this 'diversity' stance? My feeling is that book recommendations should arise organically, rather than be centrally organised, since the former is firstly more adaptable (evolvable), and secondly allows for much more thorough vetting (people generally have to read the book before recommending it to a friend).

Comment by David Mears on Kurzgesagt - The Last Human (Longtermist video) · 2022-07-01T19:48:53.268Z · EA · GW

Great point. I wonder which role behind a video has the most influence on its success - the scriptwriter? The visual designer? Or someone a bit more zoomed out?

Comment by David Mears on EA London Tech Meetup · 2022-07-01T13:51:33.061Z · EA · GW

Facebook event:

Comment by David Mears on The biggest risk of free-spending EA is not optics or motivated cognition, but grift · 2022-05-29T10:17:34.352Z · EA · GW

Good post. Interested in ideas for how to guard against this. I notice that some orgs have a strong filter for 'value alignment' when hiring. I guess anti-grift detection should form part of this, but don't know what that looks like.

Comment by David Mears on The Inner Ring [Crosspost] · 2022-05-15T14:49:22.237Z · EA · GW

It’s a great speech - when you’ve time it’s maybe worth adding a summary so people know if they want to read it.

Tl;dr: Social status is very attractive to some people, maybe most, but you should know that it's not ultimately as rewarding as it appears, due to a kind of hedonic treadmill. Relevance to EA: notice when you are making decisions based on whether they will get you access to The Cool People rather than on what you value.

Comment by David Mears on EA and the current funding situation · 2022-05-14T14:55:06.624Z · EA · GW

I turned this into a question. Maybe someone will answer:

Comment by David Mears on What is meant by 'infrastructure' in EA? · 2022-05-14T14:54:43.467Z · EA · GW

I have trouble parsing what people mean when they say this word, and so do others.

Comment by David Mears on Deferring · 2022-05-14T11:24:15.931Z · EA · GW

I’m not sure what you mean by ‘bandwidth’, each time you use it.

Comment by David Mears on Companies with the most EAs and those with the biggest potential for new Workplace Groups · 2022-05-05T14:33:53.658Z · EA · GW

You might also consider using Swapcard (the event app used for EA Globals) as a way to find companies with many EAs in them.

Comment by David Mears on There are currently more than 100 open EA-aligned tech jobs · 2022-04-28T14:12:37.367Z · EA · GW

This post feels relevant: Dust off your CVs for the Big EA Hiring Round

Comment by David Mears on Future Matters #0: Space governance, future-proof ethics, and the launch of the Future Fund · 2022-04-01T10:40:29.331Z · EA · GW

I couldn't find the podcast on CastBox (which I use). I see that it's the 15th most popular medium for listening to podcasts in the US in 2019-20.

According to the data I was using, it seems Pandora and Audible are objectively the best platforms to target next with the podcast, but I have a vested interest in you allowing me to listen to it on the app I use.

Comment by David Mears on Leftism virtue cafe's Shortform · 2022-03-27T00:12:57.299Z · EA · GW

I've a truffle-cist, me

Comment by David Mears on Short term feedback, long term consequences? · 2022-03-26T00:58:51.350Z · EA · GW

It's a broad question, but I think feedback that isn't tied to 'real world' events is out of scope. I'm thinking of how for example a startup might make a product, put it in front of users, and then find out some need they hadn't thought of in advance.

Comment by David Mears on I’m Offering Free Coaching for Software Developers in the EA community · 2022-02-01T22:22:59.381Z · EA · GW

Talking with Yonatan has been extremely helpful to me. We've mainly communicated by Telegram voice notes and messages. He guided me through a jobhunt period, and helped me refine my plans, partly by giving feedback, partly by letting me ramble into a voice note until I had rubber-ducked myself into progress, and partly by introducing new frames for thinking about decisions I was making. It was useful to have someone to talk to who understood my motivations quite well (EA), but was at an objective distance (3000 miles), and who had relevant expertise and good thoughts. 

Some things I appreciate about Yonatan:

  • He asks good questions. Including tough questions.
  • Yonatan takes care to elicit my thoughts about things before offering thoughts of his own. This is good firstly because the exercise of advising myself turns up some good ideas and Yonatan is not omniscient, and secondly because copy-pasting other people's viewpoints without knowing how they got there is less useful (less generalizable and less debuggable) than imparting generative mental 'frameworks' or 'tools'.
    • One 'frame' that has stayed with me is the idea that in some situations, even if there is a small chance of someone being willing to grant your request, it might still be worth it to ask, because they might say yes, and if you don't ask, they definitely won't say yes.
    • Another example of a 'frame'. Yonatan caused me to pay much more attention to how I was feeling about things (strategies, decisions), in a quasi-therapeutic way, because he believes that my feelings carry useful information, which seems true.
  • We have an informal dialogue more than a didactic or one-sided dynamic.
  • He is keen to be told how he is doing, and how his thoughts are being received, in order to incorporate the feedback. He is keen to tailor the relationship to my needs and focus on the topics that are most important, even if they are not necessarily about technology.
  • I felt able to be unusually honest and open about my thoughts/motivations/fears/insecurities/shortcomings.

As a result of our conversations, I feel I approached my jobhunt-related decisions in a 10x more systematic way than I otherwise would have, and I have more mental models to make future decisions with.

In summary, I highly recommend talking to Yonatan, in case he can help you.

Background about me: I am a developer with 2 years experience, now in my second job.

Comment by David Mears on Lightning Talks: Life Hacks · 2022-01-08T21:10:25.631Z · EA · GW

The facebook and meetup events have been moved back by a week, but this event has not been changed. I am presuming that the fb and meetup events are correct?

Comment by David Mears on [deleted post] 2021-03-13T09:36:00.725Z

The problem of how to create feedback loops is much more difficult for organisations who focus on far-future (or even medium-future) outcomes. It's still worth trying to create some loops, as far as possible, and to tighten them.

Comment by David Mears on Careers Questions Open Thread · 2021-01-01T17:16:30.340Z · EA · GW

I'm yet another person who pivoted from a linguistics degree to software development as a job - a relatively common path. (In between, I tried to be a musician.) The transition was relatively easy: I did a 4-month bootcamp (Makers, London) in 2019. I think it's much easier to go the bootcamp route than the self-teaching route (assuming the bootcamp is good quality), because it's full-time, focuses on practical skills, and is verifiable by employers. (Also, they had a careers coach, and a money-back-if-you-don't-get-a-job guarantee, both of which helped.) It was much easier to be accepted onto a bootcamp than I originally assumed (I thought I'd have to spend months to years preparing for it, but that was totally wrong - I just had to complete an online course).

Comment by David Mears on Contact us · 2021-01-01T17:14:43.019Z · EA · GW

Where would you like bug reports to go?

Comment by David Mears on Should EAs participate in the Double Up Drive? · 2019-12-24T19:46:06.807Z · EA · GW

This question has been answered to some extent by Aaron Gertler, regarding the 2018 version of Double Up Drive, here.

This year, it seems like the Drive turned out to be counterfactual for all money raised after $2.4 million, but not necessarily before (we don’t actually know).

Elsewhere on this forum, Aaron also said:

[...] finding out whether a match is actually counterfactual can be a really big deal for the community; I wish I'd worked harder to confirm with the Double Up Drive team whether their match was counterfactual (I think the answer turned out to be "yes", in which case I should have done more promotion, but I'm not actually sure).

I also asked an organiser (Dan Smith) about it directly, on Twitter. He said:

I do the best I can to make Double up Drive a true match. I personally only donate my funds to these causes when the drive fills out. I also think for donors that getting to choose which fund gets matched is impactful relative to other ways to donate.

Comment by David Mears on Are we living at the most influential time in history? · 2019-10-14T17:28:10.437Z · EA · GW

Would someone be willing to translate these sentences from philosophy/maths into English? Or let me know how I can work it out for myself?

That is: P(cards not shuffled)P(cards in perfect order | cards not shuffled) >> P(cards shuffled)P(cards in perfect order | cards shuffled), even if my prior credence was that P(cards shuffled) > P(cards not shuffled), so I should update towards the cards having not been shuffled.
Similarly, if it seems to me that I’m living in the most influential time ever, this gives me good reason to suspect that the reasoning process that led me to this conclusion is flawed in some way, because P(I’m reasoning poorly)P(seems like I’m living at the hinge of history | I’m reasoning poorly) >> P(I’m reasoning correctly)P(seems like I’m living at the hinge of history | I’m reasoning correctly).

I think this type of writing puts a very high accessibility bar on these sentences. I fall into the class of people who might be expected to understand these formalisms (I work in programming, a supposedly mathsy job).

Comment by David Mears on Why & How to Make Progress on Diversity & Inclusion in EA · 2018-05-17T17:00:12.789Z · EA · GW

I'm not really talking about showing how friendly you are

It looks like we were talking at cross purposes. I was picking up on the admittedly months-old conversation about "signalling collaborativeness" and [anti-]"combaticism", which is a separate conversation to the one on value signals. (Value signals are probably a means of signalling collaborativeness though.)

you should probably signal however friendly you are actually feeling

I think politeness serves a useful function (within moderation, of course). 'Forcing' people to behave in a friendlier way than they feel saves time and energy.

I think EA has a problem with undervaluing social skills such as basic friendliness. If a community such as EA wants to keep people coming back and contributing their insights, the personal benefits of taking part need to outweigh the personal costs.

Comment by David Mears on Sexual Violence Risk Reduction - Let's Do Tracking! · 2018-05-14T21:19:28.611Z · EA · GW

Yes. What I'm asking about is coordinating methodology. I think Kathy had been a point of contact for both things.

Comment by David Mears on Why & How to Make Progress on Diversity & Inclusion in EA · 2018-05-14T16:01:15.306Z · EA · GW

But I can control whether I am priming people to get accustomed to over-interpreting.

That sounds potentially important. Could you give an example of a failure mode?

Because my approach is not merely about how to behave as a listener. It's about speaking without throwing in unnecessary disclaimers.

Consider how my question "Could you give an example...?" reads if I didn't precede it with the following signal of collaborativeness: "That sounds potentially important." At least to me (YMMV), I would be like 15% less likely to feel defensive in the case where I precede it with such a signal, instead of leaping into the question -- which I would be likely (on a System 1y day) to read as "Oh yeah? Give me ONE example." Same applies to the phrase "At least to me (YMMV)": I'm chucking in a signal that I'm willing to listen to your point of view.

Those are examples of disclaimers. I argue these kinds of signals are helpful for promoting a productive atmosphere; do they fall into the category you're calling "unnecessary disclaimers"? Or is it only something more overt that you'd find counterproductive?

I take the point that different people have different needs with regards to this concern. I hope we can both steer clear of typical-minding everyone else. I think I might be particularly oversensitive to anything resembling conflict, and you are over on the other side of the bell curve in that respect.

Comment by David Mears on Sexual Violence Risk Reduction - Let's Do Tracking! · 2018-05-14T14:54:25.536Z · EA · GW

Since Kathy is sadly gone*, is there a potential new coordination point for coordinating our tracking methods? If you think it's best to coordinate privately, you can find me on Facebook (David Mears).


Comment by David Mears on Why & How to Make Progress on Diversity & Inclusion in EA · 2018-05-13T16:35:30.534Z · EA · GW

You only have control over your own actions: you can't control whether your interlocutor over-interprets you or not.

Your "right approach", which is about how to behave as a listener, is compatible with Michael_PJ's, which is about how to behave as a speaker: I don't see why we can't do both.