For STS 10SI – Intro to AI Alignment in Winter 2023, I talked about eliciting latent knowledge, cooperative inverse reinforcement learning, and shard theory. First, I introduce the ELK problem as formulated by Paul Christiano and go through the main proposals and counterexamples from the ELK document. Then we introduce inverse reinforcement learning, cooperative inverse reinforcement learning, and their relationship to the alignment problem. We proceed to a discussion of whether humans can be thought of as having utility functions, leading into a conversation about shard theory. Lastly, we discuss similarities between the limbic-cortex relationship and the alignment problem, and frame utility functions as a story the brain tells about itself. Slides here.