Speaker: R. Srikant, Ph.D. Fredric G. and Elizabeth H. Nearing Endowed Professor, Dept of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign
Faculty Host: Srinivas Shakkottai, ECEN
Abstract: When the state and action spaces are large, solving MDPs can be computationally prohibitive even if the probability transition matrix is known. In practice, a number of techniques are therefore used to solve the dynamic programming problem approximately, including lookahead, approximate policy evaluation using an m-step return, and function approximation. In a recent paper, Efroni et al. (2019) studied the impact of lookahead on the convergence rate of approximate dynamic programming. In this talk, the speaker will show that these convergence results change dramatically when function approximation is used in conjunction with lookahead and approximate policy evaluation using an m-step return. Specifically, the speaker shows that when linear function approximation is used to represent the value function, a certain minimum amount of lookahead and multi-step return is needed for the algorithm to even converge. When this condition is met, the speaker and his co-authors characterize the finite-time performance of policies obtained using such approximate policy iteration. Their results are presented for two different procedures to compute the function approximation: linear least-squares regression and gradient descent. Joint work with Anna Winnicki, Michael Livesay, and Joseph Lubars.
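As a rough illustration of the ingredients named in the abstract, the following Python sketch combines H-step lookahead, an m-step return, and least-squares linear function approximation on a small, randomly generated MDP. It is not the speaker's algorithm or code: the function names (`random_mdp`, `bellman_optimality`, `approx_policy_iteration`), the toy MDP, and the choice to fit the regression on every state are assumptions made purely for illustration.

```python
import numpy as np


def random_mdp(n_states, n_actions, seed=0):
    """Hypothetical toy MDP: P[a, s, s'] transition probabilities, r[a, s] rewards."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_actions, n_states, n_states))
    P /= P.sum(axis=2, keepdims=True)  # normalize each row into a distribution
    r = rng.random((n_actions, n_states))
    return P, r


def bellman_optimality(P, r, gamma, V):
    """Apply the Bellman optimality operator once; return (T V, greedy policy)."""
    Q = r + gamma * (P @ V)  # Q[a, s] = r[a, s] + gamma * sum_s' P[a, s, s'] V[s']
    return Q.max(axis=0), Q.argmax(axis=0)


def approx_policy_iteration(P, r, Phi, gamma, H, m, iters):
    """Sketch: lookahead + m-step return + least-squares fit of V ~ Phi @ theta."""
    n_states = P.shape[1]
    states = np.arange(n_states)
    theta = np.zeros(Phi.shape[1])
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        V = Phi @ theta
        # H-step lookahead: apply T (H - 1) times, then act greedily.
        for _step in range(H - 1):
            V, _greedy = bellman_optimality(P, r, gamma, V)
        _tv, policy = bellman_optimality(P, r, gamma, V)
        # m-step return: apply the policy's own Bellman operator m times.
        P_mu = P[policy, states]  # P_mu[s, s'] = P[policy[s], s, s']
        r_mu = r[policy, states]
        J = V
        for _step in range(m):
            J = r_mu + gamma * (P_mu @ J)
        # Linear least-squares regression of the m-step returns on the features
        # (fitted on all states here, which is a simplifying assumption).
        theta, *_rest = np.linalg.lstsq(Phi, J, rcond=None)
    return theta, policy


if __name__ == "__main__":
    P, r = random_mdp(n_states=6, n_actions=3, seed=0)
    Phi = np.random.default_rng(1).random((6, 2))  # 2 arbitrary features per state
    theta, policy = approx_policy_iteration(P, r, Phi, gamma=0.9, H=3, m=3, iters=50)
    print("theta:", theta)
    print("policy:", policy)
```

With tabular features (`Phi = np.eye(n_states)`) the regression is exact and the loop reduces to a lookahead variant of modified policy iteration. With fewer features than states, whether the iterates stay bounded depends on `H` and `m`, which is the question the talk addresses.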
Biography: Dr. R. Srikant is the Fredric G. and Elizabeth H. Nearing Endowed Professor of ECE and the Coordinated Science Lab at UIUC. He is also one of two Co-Directors of the C3.ai Digital Transformation Institute, jointly headquartered at UIUC and Berkeley, a consortium of universities (Stanford, MIT, CMU, UChicago, Princeton, KTH, Berkeley and UIUC) and industry partners (C3.ai and Microsoft) that promotes research on AI, ML, IoT and cloud computing for the betterment of society. Dr. Srikant’s research interests are in machine learning, communication networks and applied probability. He is the winner of the 2019 IEEE Koji Kobayashi Computers and Communications Award and the 2015 IEEE INFOCOM Achievement Award. He has won several best paper awards, including the 2017 Applied Probability Society’s Best Publication Award, the 2015 INFOCOM Best Paper Award and the 2015 WiOpt Best Paper Award. He also won the Distinguished Alumnus Award from the Indian Institute of Technology, Madras in 2015.
For more information about the TAMIDS seminar series, please contact Ms. Jennifer South at firstname.lastname@example.org