Dylan Hadfield-Menell is an Assistant Professor on the faculty of Artificial Intelligence and Decision-Making in the Computer Science and Artificial Intelligence Laboratory (CSAIL) at the Massachusetts Institute of Technology (MIT). Dylan's research focuses on the problem of agent alignment: the challenge of identifying behaviors that are consistent with the goals of another actor or group of actors. Dylan runs the Algorithmic Alignment Group, which works to identify algorithmic solutions to alignment problems that arise in groups of AI systems, principal-agent pairs (e.g., human-robot teams), and societal oversight of ML systems.
Through this fellowship, Dylan will build AI systems that can manage uncertainty about rewards and adapt the support of the reward distribution in coordination with the system's ability to influence the state of the world. Beyond the technical innovation this line of inquiry requires, the work is particularly important to Hard Problem 3 (Alignment) because it raises interesting normative questions, such as: what happens when AI agents evolve their judgment over time, just as concepts of "good" and "evil" evolve for individuals as they grow up, or for our species as a whole as it evolves?