- Optimize for the curious. For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
- Practicality first. Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that lets you “feel” it on a practical problem.
- Git-course. Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! Pull-request it!
Chat room for YSDA & HSE students is here
Grading rules for YSDA & HSE students are here
Anonymous feedback form.
Virtual course environment:
The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.
- Lecture: RL problems around us. Decision processes. Stochastic optimization, cross-entropy method. Parameter space search vs action space search.
- Seminar: Welcome to OpenAI Gym. Tabular CEM for Taxi-v0, deep CEM for box2d environments.
- Homework description - see week1/README.md.
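To give a feel for the cross-entropy method as a stochastic optimizer in parameter space, here is a minimal sketch. It is not the seminar code: the function name `cross_entropy_maximize`, the Gaussian sampling distribution, and the toy objective are all assumptions made for illustration.

```python
import random
import statistics


def cross_entropy_maximize(score, mu=0.0, sigma=2.0, n_samples=50,
                           n_elite=10, n_iters=30, seed=0):
    """Maximize `score` over one real parameter with a Gaussian CEM loop."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        # 1. Sample candidate parameters from the current distribution.
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # 2. Keep the best-scoring candidates (the "elite" set).
        elite = sorted(samples, key=score, reverse=True)[:n_elite]
        # 3. Refit the distribution to the elite set.
        mu = statistics.mean(elite)
        # A small floor keeps sigma from collapsing to exactly zero.
        sigma = max(statistics.pstdev(elite), 1e-3)
    return mu


# Toy objective (hypothetical): a parabola with its maximum at x = 3.
best_x = cross_entropy_maximize(lambda x: -(x - 3.0) ** 2)
print(best_x)
```

The same sample, select, refit loop carries over to RL if each sample is a policy (or a session played by it) and `score` is the total reward.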
week02_value_based Value-based methods
- Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
- Seminar: Value iteration.
- Homework description - see week2/README.md.
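As a rough illustration of value iteration, the sketch below repeatedly applies the Bellman optimality update until the values stop changing. The dictionary layout of `toy_mdp` and the two-state example are assumptions for this sketch, not the format used in the seminar.

```python
def value_iteration(mdp, gamma=0.5, tol=1e-10):
    """Return optimal state values for a small tabular MDP.

    `mdp[state][action]` is a list of (probability, next_state, reward)
    tuples; this layout is an assumption made for the example.
    """
    values = {state: 0.0 for state in mdp}
    while True:
        # Bellman optimality update: best expected one-step return per state.
        new_values = {
            state: max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            for state, actions in mdp.items()
        }
        delta = max(abs(new_values[s] - values[s]) for s in mdp)
        values = new_values
        if delta < tol:
            return values


# Hypothetical two-state MDP with deterministic transitions.
toy_mdp = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
optimal_values = value_iteration(toy_mdp, gamma=0.5)
print(optimal_values)
```

With `gamma=0.5`, staying in state 1 forever is worth 2 / (1 - 0.5) = 4, and going from state 0 to state 1 is worth 1 + 0.5 * 4 = 3, which is what the loop converges to.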
week03_model_free Model-free reinforcement learning
- Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(Lambda).
- Seminar: Q-learning vs SARSA vs Expected Value SARSA
- Homework description - see week3/README.md.
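The three algorithms compared in the seminar differ mainly in the target they bootstrap from. The sketch below shows one way to write those targets for a tabular Q-function; the nested-dict layout of `q`, the function names, and the epsilon-greedy policy assumed for Expected Value SARSA are illustrative choices, not the course's reference code.

```python
def q_learning_target(q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action in the next state.
    return reward + gamma * max(q[next_state].values())


def sarsa_target(q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the agent actually takes next.
    return reward + gamma * q[next_state][next_action]


def expected_sarsa_target(q, reward, next_state, gamma, epsilon):
    # Bootstrap from the expected value under an epsilon-greedy policy
    # (the policy choice is an assumption of this sketch).
    action_values = q[next_state]
    n_actions = len(action_values)
    best_action = max(action_values, key=action_values.get)
    expected_value = sum(
        ((1 - epsilon) * (action == best_action) + epsilon / n_actions) * value
        for action, value in action_values.items()
    )
    return reward + gamma * expected_value


# Hypothetical Q-table with a single next state and two actions.
q_table = {"s1": {"left": 1.0, "right": 3.0}}
ql = q_learning_target(q_table, reward=1.0, next_state="s1", gamma=0.9)
sa = sarsa_target(q_table, reward=1.0, next_state="s1",
                  next_action="left", gamma=0.9)
es = expected_sarsa_target(q_table, reward=1.0, next_state="s1",
                           gamma=0.9, epsilon=0.2)
print(ql, sa, es)
```

In all three cases the update itself has the same shape: move `q[state][action]` a small step toward the target.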
recap_deep_learning - deep learning recap
- Lecture: Deep learning 101
- Seminar: Intro to PyTorch/TensorFlow, simple image classification with convnets
week04_approx_rl Approximate (deep) RL
- Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
- Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
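Experience replay boils down to storing transitions in a bounded buffer and training on random minibatches drawn from it. Here is a minimal sketch using only the standard library; the class name `ReplayBuffer` and its method names are assumptions for illustration and may differ from what the seminar notebooks use.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest transition when full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._storage)

    def add(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation
        # between consecutive transitions.
        batch = self._rng.sample(list(self._storage), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones


# Usage with made-up integer states: only the 3 newest transitions survive.
buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(state=step, action=0, reward=1.0,
               next_state=step + 1, done=False)
states, actions, rewards, next_states, dones = buffer.sample(batch_size=2)
print(len(buffer), states)
```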
- Lecture: Contextual bandits. Thompson Sampling, UCB, Bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
- Seminar: Bayesian exploration for contextual bandits. UCB for MCTS.
week06_policy_based Policy Gradient methods
- Lecture: Motivation for policy-based, policy gradient, log-derivative trick, REINFORCE/cross-entropy method, variance reduction (baseline), advantage actor-critic (incl. GAE)
- Seminar: REINFORCE, advantage actor-critic
week07_seq2seq Reinforcement Learning for Sequence Models
- Lecture: Problems with sequential data. Recurrent neural networks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping
- Seminar: character-level RNN language model
week08_pomdp Partially Observed MDP
- Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
- Seminar: Deep kung-fu & doom with recurrent A3C and DRQN
week09_policy_II Advanced policy-based methods
- Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG
- Seminar: Approximate TRPO for simple robot control.
week10_planning Model-based RL & Co
- Lecture: Model-Based RL, Planning in General, Imitation Learning and Inverse Reinforcement Learning
- Seminar: MCTS for toy tasks
yet_another_week Inverse RL and Imitation Learning
- All that cool RL stuff that you won't learn from this course :)
Course materials and teaching by: [unordered]
- Pavel Shvechikov - lectures, seminars, hw checkups, reading group
- Nikita Putintsev - seminars, hw checkups, organizing our hot mess
- Alexander Fritsler - lectures, seminars, hw checkups
- Oleg Vasilev - seminars, hw checkups, technical support
- Dmitry Nikulin - tons of fixes, far and wide
- Mikhail Konobeev - seminars, hw checkups
- Ivan Kharitonov - seminars, hw checkups
- Ravil Khisamov - seminars, hw checkups
- Anna Klepova - hw checkups
- Fedor Ratnikov - admin stuff