Sutton & Barto, Chap. 10: On-policy Control with Approximation
Prediction problem: given a policy \(\pi\), estimate its state-value function.
Tabular solution: store one value per state, \(s_t \rightarrow V(s_t)\)
Approximate solution: learn a parameterized function \(s_t \rightarrow \hat V(s_t) \approx V(s_t)\), whose parameters generalize across states
Training samples: each visited state is paired with an update target, \(S_t \rightarrow U_t\)
- Monte Carlo: \(U_t = G_t\)
- TD(0): \(U_t = r_t + \gamma \hat V(s_{t+1})\)
- n-step TD: \(U_t = G_{t:t+n}\)
- TD(\(\lambda\)): \(U_t = G_t^\lambda\), the \(\lambda\)-return (a weighted average of the n-step returns)
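As a concrete instance of the TD(0) target above, here is a minimal sketch of semi-gradient TD(0) prediction with a linear approximator \(\hat V(s) = w^\top x(s)\). The environment (a 5-state random walk with reward 1 at the right terminal), the one-hot features, and all names (`n_states`, `alpha`, etc.) are illustrative assumptions, not from the notes; with one-hot features the method reduces to the tabular case.

```python
import numpy as np

n_states = 5   # non-terminal states 0..4; episodes start in the middle
gamma = 1.0
alpha = 0.05

def x(s):
    """One-hot feature vector for state s (assumed featurization)."""
    v = np.zeros(n_states)
    v[s] = 1.0
    return v

rng = np.random.default_rng(0)
w = np.zeros(n_states)

for episode in range(2000):
    s = n_states // 2
    while True:
        s_next = s + rng.choice([-1, 1])   # equiprobable random-walk policy
        if s_next < 0:                     # left terminal: reward 0
            r, done = 0.0, True
        elif s_next >= n_states:           # right terminal: reward 1
            r, done = 1.0, True
        else:
            r, done = 0.0, False
        v_next = 0.0 if done else w @ x(s_next)
        target = r + gamma * v_next        # TD(0) target U_t
        # semi-gradient update: w += alpha * (U_t - v_hat(s)) * grad v_hat(s)
        w += alpha * (target - w @ x(s)) * x(s)
        if done:
            break
        s = s_next

# For this chain the true values are [1/6, 2/6, 3/6, 4/6, 5/6]
print(np.round(w, 2))
```

Swapping the `target` line for the full return \(G_t\) (accumulated at episode end) would give the Monte Carlo version of the same update.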