Posts

Reading the ethicists 2: Hunting for AI alignment papers 2022-06-06T15:53:06.779Z
[Link] Reading the ethicists: A review of articles on AI in the journal Science and Engineering Ethics 2022-05-18T21:06:58.519Z
Thoughts on AI Safety Camp 2022-05-13T07:47:54.252Z
(xPost LW) How to turn money into AI safety? 2021-08-27T12:47:41.069Z

Comments

Comment by Charlie Steiner on Reflective Equilibria and the Hunt for a Formalized Pragmatism · 2022-11-07T04:11:06.673Z · EA · GW

I think the intersection with recommender algorithms - both in terms of making them, and in terms of efforts to empower people in the face of them - is interesting.

Suppose you have an interface that interacts with a human user by recommending actions (often with a moral component) in reaction to prompting (voice input seems emotionally powerful here), and that builds up a model of the user over time (or even by collecting data about the user, much like every other app). How do you build this to empower the user rather than just reinforcing their most predictable tendencies? How do you avoid top-down bias being pushed onto the user by the company / org making the app?
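Here's a minimal sketch of the kind of interface I mean, with hypothetical names throughout. The recommend() step just maximizes predicted acceptance under the user model, which is exactly the "reinforce the user's most predictable tendencies" failure mode I'm worried about:

```python
from collections import defaultdict

class UserModel:
    def __init__(self):
        self.accepts = defaultdict(int)
        self.shows = defaultdict(int)

    def update(self, action, accepted):
        self.shows[action] += 1
        self.accepts[action] += int(accepted)

    def predicted_acceptance(self, action):
        # Laplace-smoothed acceptance rate from past interactions.
        return (self.accepts[action] + 1) / (self.shows[action] + 2)

def recommend(model, candidate_actions):
    # Pure exploitation of the user model: no notion of empowerment yet.
    return max(candidate_actions, key=model.predicted_acceptance)

model = UserModel()
actions = ["call your mother", "donate 1% of income", "skip the gym"]
for _ in range(20):
    choice = recommend(model, actions)
    accepted = (choice == "skip the gym")  # stand-in for the user's existing habits
    model.update(choice, accepted)
print(recommend(model, actions))  # settles on the user's most predictable tendency
```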

Comment by Charlie Steiner on Aligning AI with Humans by Leveraging Legal Informatics · 2022-09-19T23:07:25.865Z · EA · GW

Thanks for your thorough response, and yeah, I'm broadly on board with all that. I think learning from detailed text behind decisions, not just the single-bit decision itself, is a great idea that can leverage a lot of recent work.

I don't think that using modern ML to create a model of legal text is directly promising from an alignment standpoint, but by holding out some of your dataset (e.g. a random sample, or all decisions about a specific topic, or all decisions later than 2021), you can test the generalization properties of the model, and more importantly test interventions intended to improve those properties.
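Concretely, the holdouts I have in mind look something like the sketch below, assuming a hypothetical corpus of decisions stored as dicts with text / topic / year fields:

```python
# A sketch of topic- and date-based holdouts for testing generalization.
def split_corpus(corpus, holdout_topic=None, holdout_after_year=None):
    train, heldout = [], []
    for doc in corpus:
        is_heldout = (
            (holdout_topic is not None and doc["topic"] == holdout_topic)
            or (holdout_after_year is not None and doc["year"] > holdout_after_year)
        )
        (heldout if is_heldout else train).append(doc)
    return train, heldout

corpus = [
    {"text": "...", "topic": "drones", "year": 2019},
    {"text": "...", "topic": "privacy", "year": 2022},
    {"text": "...", "topic": "privacy", "year": 2020},
]
# Hold out everything about a specific technology, plus all later decisions.
train, heldout = split_corpus(corpus, holdout_topic="drones", holdout_after_year=2021)
# Train on `train`, then compare the model's loss on `heldout` before and after
# whatever intervention is supposed to improve generalization.
```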

I don't think we have that great a grasp right now on how to use human feedback to get models to generalize to situations the humans themselves can't navigate. This is actually a good situation for sandwiching: suppose most text about a specific topic (e.g. use of a specific technology) is held back from the training set, and the model starts out bad at predicting that text. Could we leverage human feedback from non-experts in those cases (potentially even humans who start out basically ignorant about the topic) to help the model generalize better than those humans could alone? This is an intermediate goal that it would be great to advance towards.
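As a sketch of the shape of that experiment (everything here is a hypothetical stub - in practice the pretraining, feedback collection, and fine-tuning steps would be real ML):

```python
# A skeleton of the sandwiching loop described above. The point is the shape
# of the experiment, not any particular training method.
def pretrain(train_docs):
    return {"seen": len(train_docs)}          # stand-in for a language model

def predictive_loss(model, docs):
    return 1.0 / (1 + model["seen"])          # stand-in for held-out loss

def collect_nonexpert_feedback(model, docs):
    return ["this answer ignores the statute", "cites the wrong section"]

def update_from_feedback(model, feedback):
    model["seen"] += len(feedback)            # stand-in for fine-tuning on feedback
    return model

def sandwiching_run(train_docs, heldout_topic_docs, rounds=3):
    model = pretrain(train_docs)              # model starts out bad on the held-out topic
    baseline = predictive_loss(model, heldout_topic_docs)
    for _ in range(rounds):
        feedback = collect_nonexpert_feedback(model, heldout_topic_docs)
        model = update_from_feedback(model, feedback)
    # Did non-expert feedback close any of the gap on the held-out topic?
    return baseline, predictive_loss(model, heldout_topic_docs)
```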

Comment by Charlie Steiner on Aligning AI with Humans by Leveraging Legal Informatics · 2022-09-19T00:58:32.492Z · EA · GW

Presumably you're aware of various Dylan Hadfield-Menell papers, e.g. https://dl.acm.org/doi/10.1145/3514094.3534130 and https://dl.acm.org/doi/10.1145/3306618.3314258

And of course Xuan's talk ( https://www.lesswrong.com/posts/Cty2rSMut483QgBQ2/what-should-ai-owe-to-us-accountable-and-aligned-ai-systems )

But, to be perfectly honest... I think there's part of this proposal that has merit, and part of this proposal that might sound good to many people but is actually bad.

First, the bad: The notion that "Law is a computational engine that converts human values into legible directives" is wrong. Legibility is not an inherent property of the directives. It is a property of the directives with respect to the one interpreting them, which in the case of law is humans. If you build an AI that doesn't try to follow the spirit of the law in a human-recognizable way, the law will not be legible in the way you want.

The notion that it would be good to build AI that humans direct through the same process by which we currently create laws is also wrong. That process works for laws, specifically laws for humans, but it is tailored, in ways large and small, to the way we currently apply it, and it has numerous flaws even for that purpose (as you mention, about expressions of power).

Then, the good: Law offers a lot of training data that directly bears on what humans value, what vague statements of standards mean in practice, and what humans think good reasoning looks like. The "legible" law can't be used directly, but it can be used as a yardstick against which to learn the illegible spirit of the law. This research direction does not look like a Bold New Way to do AI alignment; instead it looks like a Somewhat Bold New Way to apply AI alignment work that is fully contiguous with other alignment research (e.g. attempts to learn human preferences by actively asking humans).

Comment by Charlie Steiner on Impact Markets: The Annoying Details · 2022-07-15T08:38:16.668Z · EA · GW

One thing that confused me was the assumption at various points that the oracle is going to pay out the entire surplus generated. That'll get the most projects done, but it will have bad results because you'll have spent the entire surplus on charity yachts.

The oracle should be paying out what it takes to get projects done. Not in a labor-theory-of-value sense; I mean that if you're having trouble attracting projects, payouts should go up, and if you have lots of competition for funding, payouts should go down.

This is actually a lot like a monopsony situation, where you can have unemployment (analogous to net-positive projects that don't get done) because the monopsonistic employer has a hard time paying those last few people what they want without having to raise wages for everyone else, eating into their surplus.
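A toy version with made-up numbers, where each project has a value to the oracle and a reservation payout below which it won't happen:

```python
# With a single uniform payout, raising it enough to attract the marginal
# project also overpays everyone else, so the surplus-maximizing payout can
# leave net-positive projects undone.
projects = [
    {"value": 100, "reservation": 10},
    {"value": 80,  "reservation": 30},
    {"value": 60,  "reservation": 55},   # net-positive, but expensive to attract
]

def oracle_surplus(uniform_payout):
    done = [p for p in projects if p["reservation"] <= uniform_payout]
    return sum(p["value"] for p in done) - uniform_payout * len(done)

for payout in (10, 30, 55):
    print(payout, oracle_surplus(payout))
# payout 10 -> 90, payout 30 -> 120, payout 55 -> 75:
# the oracle prefers a payout of 30, and the third project never gets funded.
```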

Comment by Charlie Steiner on Do you want to work in the new Boston EA office at Harvard Square? · 2022-05-13T07:53:21.429Z · EA · GW

I was already strongly considering moving to Boston, so this makes me feel lucky :)

Comment by Charlie Steiner on FLI launches Worldbuilding Contest with $100,000 in prizes · 2022-01-21T22:10:56.100Z · EA · GW

Neat! Sadly I can't interact with the grants.futureoflife.org webpage yet because my "join the community" application is still sitting around.

Comment by Charlie Steiner on Enabling more feedback · 2021-12-11T16:25:13.695Z · EA · GW

I think moderated video calls are my favorite format, as boring as that is. I.e. you have a speaker and also a moderator who picks people to ask questions, cuts people off or prompts them to keep talking depending on their judgment, etc.

Another thing I like, if it seems like people are interested in talking about multiple different things after the main talk / QA / discussion, is splitting up the discussion into multiple rooms by topic. I think Discord is a good application for this. Zoom is pretty bad at this but can be cajoled into having the right functionality if you make everyone a co-host, I think Microsoft Teams is fine but other people have problems, and other people think GatherTown is fine but I have problems.

Comment by Charlie Steiner on Minimalist axiologies and positive lives · 2021-11-21T21:07:41.368Z · EA · GW

I'm curious about your takes on the value-inverted versions of the repugnant and very-repugnant conclusions. It's easy to "make sense" of a preference (e.g. for positive experiences) by deciding not to care about it after all, but doing that doesn't actually resolve the weirdness in our feelings about aggregation.

Once you let go of trying to reduce people to a 1-dimensional value first and then aggregate them second, as you seem to be advocating here in ss. 3/4, I don't see why we should try to hold onto simple rules like "minimize this one simple thing." If the possibilities we're allowed to have preferences about are not 1-dimensional aggregations, but are instead the entire self-interacting florescence of life's future, then our preferences can get correspondingly more interesting. It's like replacing preferences over the center of mass of a sculpture with preferences about its pose or theme or ornamentation.

Comment by Charlie Steiner on How to get more academics enthusiastic about doing AI Safety research? · 2021-09-05T03:49:11.420Z · EA · GW

Academics choose to work on things when they're doable, important, interesting, publishable, and fundable. Importance and interestingness seem to be the least bottlenecked parts of that list.

The root of the problem is the difficulty of evaluating the quality of work. There's no public benchmark for AI safety that people really believe in (nor do I think there can be yet - AI safety is still a pre-paradigmatic problem), so evaluating the quality of work actually requires trusted experts sitting down and thinking hard about a paper - much harder than just checking whether it beat the state of the art. This difficulty restricts doability, publishability, and fundability. It also makes un-vetted research even less useful to you than it is in other fields.

Perhaps the solution is the production of a lot more experts, but becoming an expert on this "weird" problem takes work - work that is not particularly important or publishable, and so working academics aren't going to take a year or two off to do it. At best we could sponsor outreach events/conferences/symposia aimed at giving academics some information and context to make somewhat better evaluations of the quality of AI safety work.

Thus I think we're stuck with growing the ranks of experts not slowly per se (we could certainly be growing faster), but at least gradually, and then we have to leverage that network of trust both to evaluate academic AI safety work for fundability / publishability, and also to inform it to improve doability.

Comment by Charlie Steiner on Forecasting Transformative AI: Are we "trending toward" transformative AI? (How would we know?) · 2021-08-27T15:48:10.651Z · EA · GW

That's a good point. I'm a little worried that coarse-grained metrics like "% unemployment" or "average productivity of labor vs. capital" could fail to track AI progress if AI increases the productivity of labor. But we could pick specific tasks like making a pencil, etc. and ask "how many hours of human labor did it take to make a pencil this year?" This might be hard for diverse task categories like writing a new piece of software though.
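As a toy version of that metric (made-up numbers; the interesting signal is the rate of decline, not the level):

```python
# Hypothetical labor-hours of human work per pencil, by year.
labor_hours_per_pencil = {2020: 0.010, 2021: 0.009, 2022: 0.007}
years = sorted(labor_hours_per_pencil)
for prev, cur in zip(years, years[1:]):
    drop = 1 - labor_hours_per_pencil[cur] / labor_hours_per_pencil[prev]
    print(f"{cur}: {drop:.0%} less human labor per pencil than {prev}")
```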

Comment by Charlie Steiner on Forecasting Transformative AI: Are we "trending toward" transformative AI? (How would we know?) · 2021-08-27T13:41:17.731Z · EA · GW

What would a plausible capabilities timeline look like, such that we could mark off progress against it?

Rather than replacing jobs in order of the IQ of humans that typically end up doing them (the naive anthropocentric view of "robots getting smarter"), what actually seems to be happening is that AI and robotics develop capabilities for only part of a job at a time, but they do it cheap and fast, and so there's an incentive for companies/professions to restructure to take advantage of AI. The progression of jobs eliminated is therefore going to be weird and sometimes ill-defined. So it's probably better to try to make a timeline of capabilities, rather than a timeline of doable jobs.

Actually, this probably requires brainstorming from people more in touch with machine learning than me. But for starters, human-level performance on all current quantifiable benchmarks (from the Allen Institute's benchmark of primary-school test questions [easy?] to MineRL BASALT [hard?]) would be very impressive.

Comment by Charlie Steiner on What are examples of technologies which would be a big deal if they scaled but never ended up scaling? · 2021-08-27T12:47:37.413Z · EA · GW

Scalability, or cost?

When I think of failure to scale, I don't just think of something with high cost (e.g. transmutation of lead to gold), but something that resists economies of scale.

Level 1 resistance is cost-disease-prone activities that haven't increased in efficiency in step with most of our economy, education being a great example. Individual tutors would greatly improve results for students, but we can't do it. We can't do it because it's too expensive. And it's too expensive because there's no economy of scale for tutors - they're not like solar panels, where increasing production volume lets you make them more cheaply.

Level 2 resistance is adverse network effects - the thing actually becomes harder as you try to add more people. Direct democracy, perhaps? Or maintaining a large computer program? It's not totally clear what the world would have to be like for these things to be solvable, but it would be pretty wild; imagine if the difficulty of maintaining code scaled sublinearly with size!

Level 3 resistance is when something depends on a limited resource, and if you haven't got it, you're out of luck. Stradivarius violins, perhaps. Or the element europium, used in the red-emitting phosphors of CRTs. Solutions to these, when possible, probably just look like better technology allowing a workaround.