I am a CS PhD candidate at the Center for AI Safety at Stanford University, advised by Prof. Clark Barrett. I am interested in training AIs to be human-like, both in the concepts they contain and in the values they hold. I think the key to this lies in training language models to produce their own memories during training and to influence their future training data. In this vein, I have been optimizing pre-trained models to produce chain-of-thought reasoning that is informative both to the model itself and to humans. Going forward, I am interested in applying a similar blend of unsupervised learning and reinforcement learning to active learning, in the hope that this sets the preconditions for raising an AI like a child, and for that to result in something child-like.
During my PhD, my focus has evolved from formal verification and programming languages to AI alignment. Before Stanford, I majored in computer science and electrical engineering at MIT, contributing to AI and robotics research. After MIT, I explored interactive theorem proving at CMU with Simon DeDeo, publishing research on abduction in mathematics in Cognition.
My primary character trait is curiosity, and I really love math.
Ph.D. in Computer Science (emphasis in Artificial Intelligence), in progress
Stanford University
B.S. in Computer Science and Electrical Engineering, 2018
Massachusetts Institute of Technology
Course Organizer
Course Organizer
Discussion Facilitator
Discussion Facilitator
For STS 10SI – Intro to AI Alignment in Winter 2023, I taught sessions on eliciting latent knowledge (ELK), cooperative inverse reinforcement learning (CIRL), and shard theory. First I introduce the ELK problem as formulated by Paul Christiano and go through the main proposals and counterexamples from the ELK report. Then I introduce inverse reinforcement learning, cooperative inverse reinforcement learning, and their relationship to the alignment problem. We proceed to a discussion of whether humans can be thought of as having utility functions, leading into a conversation about shard theory. Lastly, we discuss similarities between the limbic-system–cortex relationship and the alignment problem, and frame utility functions as a story the brain tells about itself. Slides here.
A presentation on ontology maps as a framework for AI alignment, exploring how mappings between agent representations can be used to ensure AI systems retain human-compatible concepts and values.
How can we create intelligent systems that retain and expand what we find valuable in the universe? During this talk I will present my own thoughts based on ontology mapping, and explain why I believe that mathematicians who think about systemic interactions are especially well placed to answer this important question. I will start with a framework in which read-eval-print loops (REPLs) form a basis for reasoning about agents and about computation in general. Then I will build ontology maps as an alignment framework on top of REPLs and discuss its implications. Lastly, I will invite discussion on what we might want out of the future, and share my thoughts on the central role of communication and its relation to both REPLs and ontology maps.
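To make the REPL framing concrete, here is a minimal sketch of my own (not code from the talk; the names `Agent`, `read`, `eval_step`, and `emit` are hypothetical, chosen only for illustration). It models an agent as a loop that reads observations, evaluates them against internal state, and prints actions back to the environment:

```python
# A minimal sketch: an agent modeled as a read-eval-print loop.
# All names here are hypothetical illustrations of the framing,
# not an implementation from the talk.

from dataclasses import dataclass, field


@dataclass
class Agent:
    state: dict = field(default_factory=dict)

    def read(self, observation: str) -> str:
        # "read": take in raw input from the environment.
        return observation.strip().lower()

    def eval_step(self, parsed: str) -> str:
        # "eval": update internal state and decide on a response.
        count = self.state.get(parsed, 0) + 1
        self.state[parsed] = count
        return f"seen '{parsed}' {count} time(s)"

    def emit(self, result: str) -> None:
        # "print": act on the environment (here, just printing).
        print(result)


def repl(agent: Agent, observations: list[str]) -> None:
    # The loop itself: read -> eval -> print, repeated.
    for obs in observations:
        agent.emit(agent.eval_step(agent.read(obs)))


if __name__ == "__main__":
    repl(Agent(), ["Hello", "hello", "world"])
```

The appeal of this shape is that anything exhibiting it, whether a language interpreter or an agent in conversation, exposes the same read/eval/print interface, which is what ontology maps between agents can then be defined over.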