Speaker: Dimitri Bertsekas, Ph.D., Fulton Professor of Computational Decision Making, Arizona State University, School of Computing & Augmented Intelligence
Faculty Host: P.R. Kumar, ECEN, Texas A&M University
Abstract: We focus on a new conceptual framework for approximate Dynamic Programming (DP) and Reinforcement Learning (RL). It revolves around two algorithms that operate in synergy through the powerful mechanism of Newton’s method, applied to Bellman’s equation for solving the underlying DP problem. These are the off-line training and the on-line play algorithms, and they are exemplified by the architectures of the AlphaZero program (which plays chess), the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon), and the model predictive control (MPC) architecture (which is central in control system design). We will aim to explain the beneficial effects of this synergism, and to provide a conceptual bridge between the sequential decision making cultures of artificial intelligence/RL, control theory/MPC, and discrete optimization/integer programming. The lecture is based on the 2022 book “Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control” (.pdf freely available on-line), and is supported by analysis from the earlier 2020 book “Rollout, Policy Iteration, and Distributed Reinforcement Learning.”
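As a concrete illustration of the off-line/on-line synergy the abstract describes, the sketch below (not from the lecture; the 3-state MDP and all names are invented for this example) evaluates a fixed base policy, standing in for an off-line trained cost approximation, and then applies one-step lookahead as the on-line play step. This lookahead step is what the framework interprets as an iteration of Newton's method on Bellman's equation, and the classical policy improvement property guarantees the lookahead policy is no worse than the base policy.

```python
# Toy illustration (not from the lecture): one-step lookahead ("on-line play")
# built on top of an evaluated base policy (a stand-in for "off-line training").
# The 3-state, 2-action cost-minimization MDP below is invented for this sketch.

import numpy as np

gamma = 0.9  # discount factor
# P[a][s, s'] = transition probability under action a; g[a][s] = expected stage cost.
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]]),
     np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])]
g = [np.array([2.0, 1.0, 3.0]),
     np.array([1.0, 2.0, 1.0])]

def policy_cost(policy):
    """Exact cost J_mu of a stationary policy mu: solve (I - gamma*P_mu) J = g_mu."""
    n = 3
    P_mu = np.array([P[policy[s]][s] for s in range(n)])
    g_mu = np.array([g[policy[s]][s] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_mu, g_mu)

def one_step_lookahead(J):
    """On-line play: at each state, minimize g(s,a) + gamma * E[J(next state)]."""
    Q = np.array([g[a] + gamma * P[a] @ J for a in range(2)])  # shape (2, 3)
    return Q.argmin(axis=0)

# "Off-line training" stand-in: evaluate a fixed base policy exactly.
base = [0, 0, 0]
J_base = policy_cost(base)

# One Newton-like step: the lookahead (rollout) policy built on J_base.
improved = one_step_lookahead(J_base)
J_improved = policy_cost(improved)

# Policy improvement: the lookahead policy is no worse at every state.
assert np.all(J_improved <= J_base + 1e-9)
```

In the lecture's terms, replacing the exact evaluation of the base policy with a learned approximation (as AlphaZero's value network does) leaves the structure unchanged: the on-line lookahead step still acts as an approximate Newton iteration on Bellman's equation.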
Biography: Dr. Dimitri Bertsekas is currently Fulton Professor of Computational Decision Making at the School of Computing and Augmented Intelligence at Arizona State University, Tempe, AZ. Previously, he held faculty positions with the Engineering Economic Systems Dept., Stanford University (1971-1974) and the Electrical Engineering Dept. of the University of Illinois, Urbana (1974-1979). From 1979 to 2019 he was with the Electrical Engineering and Computer Science Department of M.I.T., where he served as McAfee Professor of Engineering. His research spans several fields, including optimization, control, and large-scale computation, and is closely tied to his teaching and book authoring activities. He has written numerous research papers and nineteen books and research monographs, several of which are used as textbooks in MIT classes. Dr. Bertsekas was awarded the INFORMS 1997 Prize for Research Excellence in the Interface Between Operations Research and Computer Science, the 2001 ACC John R. Ragazzini Education Award, the 2009 INFORMS Expository Writing Award, the 2014 ACC Richard E. Bellman Control Heritage Award, the 2014 Khachiyan Prize for Life-Time Accomplishments in Optimization, the SIAM/MOS 2015 George B. Dantzig Prize, and the 2022 IEEE Control Systems Award. Together with his coauthor John Tsitsiklis, he was awarded the 2018 INFORMS John von Neumann Theory Prize for the contributions of the research monographs “Parallel and Distributed Computation” and “Neuro-Dynamic Programming.” In 2001, he was elected to the National Academy of Engineering.
For more information about TAMIDS tutorial series, please contact Ms. Jennifer South at email@example.com