Lecture Notes

LEC # | TOPICS | LECTURE NOTES
1 | Markov Decision Processes; Finite-Horizon Problems: Backwards Induction; Discounted-Cost Problems: Cost-to-Go Function, Bellman's Equation | (PDF)
2 | Value Iteration; Existence and Uniqueness of Bellman's Equation Solution; Gauss-Seidel Value Iteration | (PDF)
3 | Optimality of Policies Derived from the Cost-to-Go Function; Policy Iteration; Asynchronous Policy Iteration | (PDF)
4 | Average-Cost Problems; Relationship with Discounted-Cost Problems; Bellman's Equation; Blackwell Optimality | (PDF)
5 | Average-Cost Problems; Computational Methods | (PDF)
6 | Application of Value Iteration to Optimization of Multiclass Queueing Networks; Introduction to Simulation-based Methods; Real-Time Value Iteration | (PDF)
7 | Q-Learning; Stochastic Approximations | (PDF)
8 | Stochastic Approximations: Lyapunov Function Analysis; The ODE Method; Convergence of Q-Learning | (PDF)
9 | Exploration versus Exploitation: The Complexity of Reinforcement Learning | (PDF)
10 | Introduction to Value Function Approximation; Curse of Dimensionality; Approximation Architectures | (PDF)
11 | Model Selection and Complexity | (PDF)
12 | Introduction to Value Function Approximation Algorithms; Performance Bounds | (PDF)
13 | Temporal-Difference Learning with Value Function Approximation | (PDF)
14 | Temporal-Difference Learning with Value Function Approximation (cont.) | (PDF)
15 | Temporal-Difference Learning with Value Function Approximation (cont.); Optimal Stopping Problems; General Control Problems | (PDF)
16 | Approximate Linear Programming | (PDF)
17 | Approximate Linear Programming (cont.) | (PDF)
18 | Efficient Solutions for Approximate Linear Programming | (PDF)
19 | Efficient Solutions for Approximate Linear Programming: Factored MDPs | (PDF)
20 | Policy Search Methods | (PDF)
21 | Policy Search Methods (cont.) | (PDF)
22 | Policy Search Methods for POMDPs; Application: Call Admission Control; Actor-Critic Methods |
23 | Approximate POMDP Compression |
24 | Policy Search Methods: PEGASUS; Application: Helicopter Control |
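
For orientation alongside the early lectures, the sketch below applies the basic discounted-cost value iteration update, J(x) <- min_a [ c_a(x) + alpha * sum_y P_a(x, y) J(y) ], until the cost-to-go function stops changing. It is a minimal illustration of the algorithm named in Lectures 1-2, not code from the course notes; the function name, tolerance, and toy two-state example are assumptions made here.

```python
import numpy as np

def value_iteration(P, c, alpha, tol=1e-8, max_iters=10_000):
    """Value iteration for a finite discounted-cost MDP (illustrative sketch).

    P[a]  -- |S| x |S| transition matrix under action a
    c[a]  -- length-|S| cost vector under action a
    alpha -- discount factor in (0, 1)
    Returns the cost-to-go vector J* and a greedy policy.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    J = np.zeros(n_states)
    for _ in range(max_iters):
        # Bellman operator: (TJ)(x) = min_a [ c_a(x) + alpha * sum_y P_a(x, y) J(y) ]
        Q = np.stack([c[a] + alpha * P[a] @ J for a in range(n_actions)])
        J_new = Q.min(axis=0)
        if np.max(np.abs(J_new - J)) < tol:
            J = J_new
            break
        J = J_new
    # Greedy policy with respect to the converged cost-to-go function.
    Q = np.stack([c[a] + alpha * P[a] @ J for a in range(n_actions)])
    return J, Q.argmin(axis=0)

# Hypothetical two-state, two-action example (numbers chosen only for illustration).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # transitions under action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]   # transitions under action 1
c = [np.array([1.0, 2.0]),                 # per-state cost of action 0
     np.array([1.5, 0.5])]                 # per-state cost of action 1
J_star, policy = value_iteration(P, c, alpha=0.95)
```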