Ontology Identification and Utility Functions


Date
Nov 2, 2023 12:00 PM — 12:00 PM
Location
450 Jane Stanford Way, Stanford, CA 94305

Abstract:

For STS 10SI – Intro to AI Alignment in Winter 2023, I talked about eliciting latent knowledge (ELK), cooperative inverse reinforcement learning, and shard theory. First, I introduce the ELK problem as formulated by Paul Christiano and go through the main proposals and counterexamples from the ELK document. Then I introduce inverse reinforcement learning, cooperative inverse reinforcement learning, and their relationship to the alignment problem. We proceed with a discussion of whether humans can be thought of as having utility functions, leading into a conversation about shard theory. Lastly, we talk about similarities between the limbic-cortex relationship and the alignment problem, and frame utility functions as a story the brain tells about itself. Slides here.

Scott Viteri
PhD Candidate in Computer Science

I am a PhD candidate at the Center for Automated Reasoning at Stanford University, under the guidance of Prof. Clark Barrett. My research focuses on improving the prosocial tendencies of language models (LMs) through a series of developmental approaches. These include introducing communication channels during autoregressive training (akin to a kindergarten setting), allowing a parent LM to guide a child LM by curating its training data, and enhancing human feedback on LMs via a combined embedding of EEG data and speech.