EECS 395, 495: Deep Reinforcement Learning from Scratch

Quarter Offered

Spring: W 5-8; Borhani, Watt


PREREQUISITES: Prior deep learning experience (e.g., EECS 395/495 Deep Learning Foundations from Scratch) and strong familiarity with the Python programming language. Python will be used for all coding assignments; no other language may be used to complete programming assignments.


In recent years, major improvements to deep networks, massive increases in compute power, and ready access to data and simulation tools have helped make Deep Reinforcement Learning one of the most powerful tools for dealing with control-driven dynamic systems today. From the design of automatic control functionality for robotics and self-driving vehicles to the development of sophisticated game AI, reinforcement learning has been used to build a variety of bleeding-edge technologies of both practical and theoretical interest. In this course we introduce the fundamentals of Deep Reinforcement Learning from scratch, starting from its roots in dynamic programming and optimal control and ending with some of the most popular applications in practice today. Through exercises and a final course project, students will gain significant hands-on experience coding up and testing reinforcement learning systems on a variety of interesting problems taken from optimal control and video game AI.

INSTRUCTORS: Dr. Reza Borhani and Dr. Jeremy Watt


  1. Optimal Control and Dynamic Programming
  • Dynamic systems, optimal control,  and Markov Decision Processes
  • Modeling, simulation, and system identification
  • Formal solution methods and limitations
  • Dynamic programming
  • Limitations of model-based control
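To make the dynamic programming thread concrete, the Bellman optimality backup can be sketched as value iteration on a tiny Markov Decision Process. The two-state environment below (transition tensor `P`, reward table `R`) is a made-up illustration, not an example from the course materials:

```python
import numpy as np

# A hypothetical two-state, two-action MDP.
# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions from state 0 under actions 0, 1
    [[0.0, 1.0], [0.5, 0.5]],   # transitions from state 1 under actions 0, 1
])
R = np.array([
    [1.0, 0.0],                 # rewards in state 0 for actions 0, 1
    [0.0, 2.0],                 # rewards in state 1 for actions 0, 1
])
gamma = 0.9                     # discount factor

V = np.zeros(2)
for _ in range(500):
    # Bellman backup: Q(s, a) = R(s, a) + gamma * sum_s' P(s, a, s') V(s')
    Q = R + gamma * (P @ V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once values have converged
        break
    V = V_new

policy = Q.argmax(axis=1)       # greedy policy w.r.t. the converged values
```

Because the dynamics `P` and rewards `R` must be known in advance, this kind of model-based computation runs into exactly the limitations the unit closes with.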
  2. Introduction to Reinforcement Learning
  • The basic Q-Learning algorithm
  • Exploration-exploitation trade-off; short-term vs. long-term reward
  • Generalizability of reinforcement systems
  • Challenges in scaling Q-Learning to large state spaces
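The basic tabular Q-learning algorithm with epsilon-greedy exploration fits in a few lines of Python. The two-state chain environment below is a made-up toy, not a course assignment; learning rate, discount, and exploration settings are illustrative:

```python
import random

random.seed(0)
N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    # Toy dynamics: action 1 moves to state 1 (which pays reward 1),
    # action 0 moves back to state 0 (which pays nothing).
    next_state = 1 if action == 1 else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

state = 0
for _ in range(5000):
    # Epsilon-greedy: explore with probability epsilon, else act greedily.
    if random.random() < epsilon:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: move Q(s, a) toward the bootstrapped target.
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state
```

The table `Q` has one entry per state-action pair, which is precisely why this scheme breaks down in the large state spaces discussed at the end of the unit.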
  3. Deep Feedforward Networks
  • Linear and nonlinear supervised learning recap
  • Simple recipes for building deep feedforward networks 
  • First order optimization 
  • Useful optimization tricks for deep networks
  • Convolutional networks
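As a warm-up for this unit, a small feedforward network can be built and trained with first-order optimization in plain NumPy. The example below fits XOR with one hidden layer; the architecture sizes, learning rate, and iteration count are illustrative choices, not specifications from the course:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(0.0, 1.0, (2, 8)); b1 = np.zeros(8)   # hidden layer
W2 = rng.normal(0.0, 1.0, (8, 1)); b2 = np.zeros(1)   # output layer
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    h = np.tanh(X @ W1 + b1)          # hidden activations
    return h, sigmoid(h @ W2 + b2)    # predicted probabilities

_, p0 = forward(X)
initial_loss = float(np.mean((p0 - y) ** 2))

for _ in range(2000):
    h, p = forward(X)
    # Backpropagation for a sigmoid output with cross-entropy loss.
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2; db2 = dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
    dW1 = X.T @ dz1; db1 = dz1.sum(axis=0)
    # Plain gradient descent step (first-order optimization).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

_, p = forward(X)
final_loss = float(np.mean((p - y) ** 2))   # squared error, for monitoring
```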
  4. Reinforcement Learning in Large State Spaces
  • Extending Q-Learning to deal with large state spaces
  • The limits of discretization
  • Q-Learning with function approximators
  • Deep Q-Learning
  • Performance engineering
  • Memory replay and optimization
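The memory replay idea behind Deep Q-Learning can be sketched as a simple buffer of past transitions from which minibatches are drawn at random; the class name and interface below are illustrative, not a fixed course API:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (s, a, r, s', done) transitions."""

    def __init__(self, capacity):
        # deque with maxlen evicts the oldest transitions automatically.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # A uniform random minibatch breaks the correlation between
        # consecutive steps, stabilizing Q-network training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```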
  5. Policy Gradient Methods
  • The Q function, basic geometry, and multiclass classification
  • The policy gradient algorithm
  • The probabilistic perspective
  • Performance engineering
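The core policy gradient update can be illustrated on the simplest possible problem: a two-armed bandit with a softmax policy. The setup below is a hypothetical sketch of the REINFORCE-style score-function update, not the course's exact formulation:

```python
import math
import random

random.seed(0)
theta = [0.0, 0.0]            # one logit per action
alpha = 0.1                   # policy learning rate
mean_reward = [0.0, 1.0]      # arm 1 pays more on average (made-up numbers)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

for _ in range(2000):
    probs = softmax(theta)
    # Sample an action from the current stochastic policy.
    a = 0 if random.random() < probs[0] else 1
    r = mean_reward[a] + random.gauss(0, 0.1)
    # Score-function update: grad_theta log pi(a) = one_hot(a) - probs.
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += alpha * r * grad
```

Actions that earn above-average reward have their probability pushed up, which is the probabilistic perspective this unit develops in full.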
  6. Reinforcement in Continuous Action Spaces
  • Actor-Critic methods
  • Approximate Q-Learning and the wire-fitting algorithm
  • Other popular extensions of Q-Learning
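A flavor of actor-critic learning with a continuous action can be given with a one-dimensional toy: a Gaussian actor learns the mean of its action distribution while a scalar critic supplies a baseline. The task (hit a hidden target value) and all constants below are made up for illustration:

```python
import random

random.seed(0)
target = 2.0                  # hidden optimum the actor must find
mu, sigma = 0.0, 0.5          # actor: Gaussian policy, mean learned, std fixed
V = 0.0                       # critic: scalar estimate of expected reward
actor_lr, critic_lr = 0.05, 0.1

for _ in range(3000):
    a = random.gauss(mu, sigma)           # sample a continuous action
    r = -(a - target) ** 2                # reward peaks at a == target
    advantage = r - V                     # critic provides the baseline
    # Score function of a Gaussian: grad_mu log N(a | mu, sigma) = (a - mu) / sigma^2
    mu += actor_lr * advantage * (a - mu) / sigma ** 2
    V += critic_lr * (r - V)              # critic tracks expected reward
```

Because the actor outputs a distribution over a continuous action rather than a table of Q-values, no discretization of the action space is needed.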

COURSE HAND-OUTS: This course has no required textbook. Handouts authored by the instructors will be made freely available to students as course notes. In addition, a small number of relevant sections from the instructors' textbook Machine Learning Refined (Cambridge University Press) will be made freely available to students.

PROBLEM SETS: 4-5 problem sets will be assigned and graded.  

COURSE PROJECT: One course project involving major themes from the course will be required of each student.

COURSE GRADE: Final grades for the course are based on homework (75%) and the final exam (25%).