EVENT DETAILS
Abstract: Policy simulation -- i.e., producing a single state trajectory of a dynamical system controlled by some state-dependent policy -- is often the core computational bottleneck in modern Reinforcement Learning algorithms. The multiple, inherently serial, policy evaluations that must be performed in one such simulation constitute the bulk of this bottleneck. As a concrete example, simulating a fulfillment optimization policy on a month's worth of demand at a moderately large retailer can take several hours.
We present a class of iterative algorithms we dub Picard iteration. Our scheme carefully allocates policy evaluation tasks across independent (GPU) threads. Within each iteration, a single thread evaluates the policy only on its assigned tasks while assuming a certain 'cached' evaluation for all other tasks. This cache is updated at the end of the iteration. A single iteration is ideally suited for the type of parallelism offered by a GPU (essentially the batched application of a neural network), yielding 1000x+ speedups. Unfortunately, for general problems the number of iterations required for convergence will scale with the horizon of the simulation.
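The caching idea described above can be illustrated with a toy sketch. This is not the speakers' implementation: the inventory dynamics (next state = state + order - demand), the order-up-to policy, and the names policy, rollout, simulate_sequential and simulate_picard are all assumptions made for illustration, and a vectorized NumPy call on the CPU stands in for batched policy evaluation on a GPU.

```python
import numpy as np

def policy(state, base_stock=20.0):
    # Hypothetical order-up-to policy; works elementwise, so it can be
    # applied to one state or to a whole batch of states at once.
    return np.maximum(base_stock - state, 0.0)

def rollout(s0, actions, demand):
    # States implied by a fixed action sequence: a prefix sum, no policy calls.
    return s0 + np.concatenate(([0.0], np.cumsum(actions - demand)))

def simulate_sequential(s0, demand):
    # Baseline: one policy evaluation per period, strictly in order.
    T = len(demand)
    states = np.empty(T + 1)
    actions = np.empty(T)
    states[0] = s0
    for t in range(T):
        actions[t] = policy(states[t])
        states[t + 1] = states[t] + actions[t] - demand[t]
    return states, actions

def simulate_picard(s0, demand, max_iters=None):
    # Picard-style sweep: roll out states under the cached actions,
    # re-evaluate the policy on every period in one batch, then
    # refresh the cache. Stop when the cache no longer changes.
    T = len(demand)
    max_iters = T + 1 if max_iters is None else max_iters
    actions = np.zeros(T)  # initial cache (assumed: order nothing)
    for k in range(1, max_iters + 1):
        states = rollout(s0, actions, demand)
        new_actions = policy(states[:-1])  # batched evaluation
        if np.array_equal(new_actions, actions):
            return states, actions, k
        actions = new_actions
    # Iteration budget exhausted: return states consistent with the cache.
    return rollout(s0, actions, demand), actions, max_iters

if __name__ == "__main__":
    demand = np.array([3.0, 5.0, 2.0, 7.0, 4.0, 6.0])
    states, actions, iters = simulate_picard(10.0, demand)
    print("iterations:", iters)
    print("states:", states)
```

In this sketch each sweep makes at least one more leading action exact, so the loop stops after at most T + 1 sweeps for a horizon of T periods. That is the horizon-dependent worst case the abstract mentions for general problems.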
We prove, and demonstrate empirically, that the structure afforded by inventory management problems allows Picard iteration to converge in a small number of iterations independent of the horizon. As one practical consequence, we demonstrate a 500x speedup in policy simulation for large-scale fulfillment optimization problems. Our scheme offers a blueprint for similar speedups in related policy simulation and sequential inference tasks.
Joint work with Joren Gijsbrechts, Aryan Khojandi, Tianyi Peng and Andy Zheng.
Bio: Vivek is interested in the development of new methodologies and applications for large-scale dynamic optimization. He received his Ph.D. in Electrical Engineering from Stanford University in 2007 and is the Patrick J. McGovern (1959) Professor at MIT. Vivek is a recipient of an INFORMS MSOM Student Paper Prize (2006), an INFORMS JFIG paper prize (2009, 2011), the NSF CAREER award (2011), MIT Sloan's Outstanding Teacher award (2013), the INFORMS Simulation Society Best Publication Award (2014), the INFORMS Pricing and Revenue Management Best Publication Award (2015), the INFORMS MSOM Best Publication award in Management Science (2016), the MSOM Young Scholar Prize and the MIT-wide Jamieson Prize for Excellence in Teaching (2020). His practice-based work has won the Wagner prize (2022) and has been judged a finalist for the Pierskalla Award (2011), the Gary L. Lilien ISMS-MSI Marketing Practice Prize (2016) and the Wagner Prize (2018). Vivek's doctoral advisees have on various occasions won the Nicholson, MSOM, APS and RMP student paper prizes. Outside of academia, Vivek was co-founder/CTO at Celect (2014-19; acquired by Nike) and was a corresponding author of the Nature Communications paper underlying the technology at Seer (IPO in 2020). He serves on several technology startup advisory boards and has worked in various capacities in quantitative finance and private equity.
TIME Tuesday April 23, 2024 at 11:00 AM - 12:00 PM
LOCATION ITW 1.350, Ford Motor Company Engineering Design Center
CONTACT Kendall Minta kendall.minta@gmail.com
CALENDAR Department of Industrial Engineering and Management Sciences (IEMS)