Liu, D., Xue, S., Zhao, B., Luo, B., & Wei, Q. (2021). Adaptive dynamic programming for control: A survey and recent advances. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 51(1), 142-160.
A. ADP for Optimal State Regulation
The goal of optimal state regulation is to keep the state near the equilibrium while minimizing the value function of the system.
Nonlinear dynamic systems can generally be divided into affine and nonaffine systems, which are described by
$$\dot{x}(t) = f(x(t)) + g(x(t))u(t) \quad (1)$$
and
$$\dot{x}(t) = f(x(t), u(t)), \quad (2)$$
respectively, where $x(t)$ is the system state and $u(t)$ is the control input.
Problem 1: For affine/nonaffine systems and finite/infinite-horizon value functions, the optimal state regulation problem is to design a learning control structure and then gradually explore the optimal control policy that minimizes the value function and stabilizes the closed-loop system.
For the infinite-horizon case, the value function is defined as
$$V(x(t)) = \int_t^{\infty} U(x(\tau), u(\tau))\, d\tau,$$
where $U(x, u)$ is a positive-definite utility function, typically the quadratic form $U(x, u) = x^{\top}Qx + u^{\top}Ru$ with $Q \succeq 0$ and $R \succ 0$. The Hamiltonian of system (1) is designed as
$$H(x, u, \nabla V) = U(x, u) + (\nabla V(x))^{\top}\big(f(x) + g(x)u\big),$$
and the optimal value function $V^{*}(x)$ satisfies the Hamilton-Jacobi-Bellman (HJB) equation $0 = \min_u H(x, u, \nabla V^{*})$.
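For the affine system (1) with the quadratic utility, the minimizing control follows in closed form from the stationarity condition of the HJB minimization:
$$\frac{\partial H}{\partial u} = 2Ru + g^{\top}(x)\,\nabla V^{*}(x) = 0 \;\Longrightarrow\; u^{*}(x) = -\tfrac{1}{2}R^{-1}g^{\top}(x)\,\nabla V^{*}(x).$$
This expression is exactly what the policy-improvement step of the PI algorithm below evaluates at each iteration.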
Algorithm 1: PI for System (1)
Step 1 Initialization: Set $i = 0$ and select an initial admissible control policy $u^{(0)}(x)$.
Step 2 Evaluation: The value function $V^{(i)}(x)$ under the control $u^{(i)}(x)$ is obtained according to
$$0 = U\big(x, u^{(i)}(x)\big) + \big(\nabla V^{(i)}(x)\big)^{\top}\big(f(x) + g(x)u^{(i)}(x)\big).$$
Step 3 Improvement: The updated control policy $u^{(i+1)}(x)$ is obtained according to
$$u^{(i+1)}(x) = \arg\min_u H\big(x, u, \nabla V^{(i)}\big);$$
more specifically,
$$u^{(i+1)}(x) = -\tfrac{1}{2}R^{-1}g^{\top}(x)\,\nabla V^{(i)}(x).$$
Step 4 Judgment: If preset conditions for convergence are not met, set $i \leftarrow i + 1$ and go back to Step 2.
Step 5 Stop: Obtain the optimal control policy $u^{*}(x)$ and the optimal value function $V^{*}(x)$.
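To make Algorithm 1 concrete, the following minimal Python sketch runs it for the linear special case $f(x) = Ax$, $g(x) = B$, where $V^{(i)}(x) = x^{\top}P^{(i)}x$ and Step 2 reduces to a Lyapunov equation (Kleinman's iteration). All matrices are illustrative assumptions, not values taken from the survey.

import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Illustrative linear special case of system (1): dx/dt = A x + B u (assumed)
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                      # utility U(x,u) = x'Qx + u'Ru
R = np.eye(1)

K = np.zeros((1, 2))               # initial admissible policy u = -Kx (A is stable)
for i in range(100):
    # Step 2 Evaluation: solve (A-BK)'P + P(A-BK) + Q + K'RK = 0 for V(x) = x'Px
    Acl = A - B @ K
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    # Step 3 Improvement: u = -(1/2) R^{-1} g' dV/dx reduces to K = R^{-1} B' P
    K_new = np.linalg.solve(R, B.T @ P)
    # Step 4 Judgment: stop once the policy has converged
    if np.linalg.norm(K_new - K) < 1e-9:
        break
    K = K_new
print("Converged gain K* =", K)    # Step 5 Stop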
Integral Reinforcement Learning: Traditional policy evaluation requires knowledge of the system dynamics. When the internal dynamics are unknown, many algorithms cannot be applied directly. To deal with this problem, integral reinforcement learning (IRL) was developed.
For a time interval $T > 0$, the value function satisfies the IRL Bellman equation
$$V\big(x(t)\big) = \int_t^{t+T} U\big(x(\tau), u(\tau)\big)\, d\tau + V\big(x(t+T)\big).$$
Note that the mathematical system model is not explicitly included in this equation; it is implicit in the measured data.
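To make this model-free character concrete, the Python sketch below estimates the quadratic value kernel $P$ of a fixed policy purely from measured state trajectories and running-cost integrals, by solving the IRL Bellman equation in a least-squares sense. The closed-loop matrix A_cl appears only to generate the data and is never used by the estimator; all numerical values are illustrative assumptions.

import numpy as np
from scipy.integrate import solve_ivp

# Illustrative closed-loop linear system (assumed); the learner never uses A_cl
A_cl = np.array([[0.0, 1.0], [-1.0, -2.0]])          # generates the data only
Qbar = np.array([[2.0, 0.0], [0.0, 1.5]])            # running cost x'Qbar x
T = 0.5                                              # IRL interval length

phi = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2])   # basis of V(x) = x'Px

rows, targets = [], []
for x0 in np.random.uniform(-1, 1, size=(20, 2)):
    # integrate the state together with the running cost over [0, T]
    f = lambda t, z: np.concatenate((A_cl @ z[:2], [z[:2] @ Qbar @ z[:2]]))
    sol = solve_ivp(f, (0, T), np.concatenate((x0, [0.0])), rtol=1e-9)
    xT, cost = sol.y[:2, -1], sol.y[2, -1]
    # IRL Bellman equation: V(x(t)) - V(x(t+T)) = integral of the utility
    rows.append(phi(x0) - phi(xT))
    targets.append(cost)

p, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
P = np.array([[p[0], p[1] / 2], [p[1] / 2, p[2]]])
print("P estimated from measured data only:\n", P)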
Function Approximation Based on NNs: Neural networks (NNs) have been widely used for function approximation in the implementation of ADP algorithms.
Assumption: The continuous function $V^{*}(x)$ can be represented on a compact set by an NN as $V^{*}(x) = W_c^{\top}\phi(x) + \varepsilon(x)$, where $W_c$ is the ideal weight vector, $\phi(x)$ is the activation function vector, and $\varepsilon(x)$ is the bounded approximation error.
Employing a critic NN and an actor NN to approximate the optimal value function and the optimal control, we have
$$\hat{V}(x) = \hat{W}_c^{\top}\phi(x), \qquad \hat{u}(x) = -\tfrac{1}{2}R^{-1}g^{\top}(x)\big(\nabla\phi(x)\big)^{\top}\hat{W}_a.$$
The residual error obtained by substituting these approximations into the Hamiltonian,
$$e = U\big(x, \hat{u}(x)\big) + \hat{W}_c^{\top}\nabla\phi(x)\big(f(x) + g(x)\hat{u}(x)\big),$$
is driven toward zero by tuning the critic weights, typically via normalized gradient descent.
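As a rough illustration, the sketch below tunes a polynomial critic by normalized gradient descent on the Bellman residual for an assumed scalar affine system. For simplicity the actor reuses the critic weights ($\hat{W}_a = \hat{W}_c$), a common single-approximator simplification rather than the full two-network scheme above; all functions and constants are illustrative assumptions.

import numpy as np

# Illustrative scalar affine system (assumed): dx/dt = f(x) + g(x) u
f = lambda x: -x + 0.25 * x**3
g = lambda x: 1.0
Q, R = 1.0, 1.0                               # utility U = Q x^2 + R u^2

phi_grad = lambda x: np.array([2 * x, 4 * x**3])   # gradient of basis [x^2, x^4]
Wc = np.zeros(2)                              # critic weights (actor shares them)
lr = 0.1

for step in range(20000):
    x = np.random.uniform(-1.0, 1.0)          # sampled states provide excitation
    u = -0.5 / R * g(x) * (phi_grad(x) @ Wc)  # actor: u = -(1/2)R^{-1} g (dphi)'Wc
    s = phi_grad(x) * (f(x) + g(x) * u)       # regressor: dphi/dx * (f + g u)
    e = Q * x**2 + R * u**2 + s @ Wc          # Bellman residual error
    Wc -= lr * e * s / (1.0 + s @ s)**2       # normalized gradient descent
print("Tuned critic weights:", Wc)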
B. ADP for Optimal Output Regulation
Problem 2: The optimal output regulation problem is to design a control input such that the output approaches zero while minimizing the value function.
Optimal Output Regulation for Linear Systems: Take a linear continuous-time system as an example:
$$\dot{x}(t) = Ax(t) + Bu(t), \qquad y(t) = Cx(t),$$
with the value function
$$V\big(x(t)\big) = \int_t^{\infty}\big(y^{\top}(\tau)Qy(\tau) + u^{\top}(\tau)Ru(\tau)\big)\, d\tau.$$
Algorithm 2: LQR Based on IRL for Linear Systems
Step 1 Initialization: Set $i = 0$ and select an initial stabilizing gain $K^{(0)}$, i.e., $u^{(0)} = -K^{(0)}x$.
Step 2 Evaluation: The kernel matrix $P^{(i)}$ of the value function $V^{(i)}(x) = x^{\top}P^{(i)}x$ is obtained according to the IRL Bellman equation
$$x^{\top}(t)P^{(i)}x(t) = \int_t^{t+T} x^{\top}(\tau)\big(C^{\top}QC + (K^{(i)})^{\top}RK^{(i)}\big)x(\tau)\, d\tau + x^{\top}(t+T)P^{(i)}x(t+T).$$
Step 3 Improvement: The updated control policy is obtained according to
$$K^{(i+1)} = R^{-1}B^{\top}P^{(i)}.$$
Step 4 Judgment: If preset conditions for convergence are not met, set $i \leftarrow i + 1$ and go back to Step 2.
Step 5 Stop: Obtain the optimal control policy $u^{*} = -K^{*}x$ and the optimal value function $V^{*}(x) = x^{\top}P^{*}x$.
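Combining the data-based evaluation of Step 2 with the gain update of Step 3 gives the complete iteration. The Python sketch below is a minimal illustration under assumed system matrices; note that, as in standard IRL, the input matrix B is still needed for the improvement step, while A is used only to simulate measurements and never enters the learning equations. The result is checked against scipy's algebraic Riccati solver.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_are

# Illustrative system (assumed): dx/dt = Ax + Bu, y = Cx
A = np.array([[0.0, 1.0], [-0.5, -0.5]])
B = np.array([[0.0], [1.0]])
C = np.eye(2)
Q, R, T = np.eye(2), np.eye(1), 0.2

phi = lambda x: np.array([x[0]**2, x[0]*x[1], x[1]**2])   # basis of x'Px

K = np.zeros((1, 2))                          # Step 1: A is stable, so K = 0 works
for i in range(20):
    Qbar = C.T @ Q @ C + K.T @ R @ K
    Acl = A - B @ K                           # used only to generate trajectories
    rows, targets = [], []
    for x0 in np.random.uniform(-1, 1, size=(30, 2)):
        f = lambda t, z: np.concatenate((Acl @ z[:2], [z[:2] @ Qbar @ z[:2]]))
        sol = solve_ivp(f, (0, T), np.concatenate((x0, [0.0])), rtol=1e-9)
        rows.append(phi(x0) - phi(sol.y[:2, -1]))
        targets.append(sol.y[2, -1])
    p, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    P = np.array([[p[0], p[1] / 2], [p[1] / 2, p[2]]])    # Step 2: P^(i) from data
    K_new = np.linalg.solve(R, B.T @ P)       # Step 3: K^(i+1) = R^{-1} B' P^(i)
    if np.linalg.norm(K_new - K) < 1e-6:      # Step 4: convergence judgment
        break
    K = K_new

P_are = solve_continuous_are(A, B, C.T @ Q @ C, R)
print("IRL gain:", K)
print("ARE gain:", np.linalg.solve(R, B.T @ P_are))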
The method was applied to both the linear quadratic regulator (LQR) and linear quadratic tracking (LQT) by using a discounted value function of the form
$$V\big(x(t)\big) = \int_t^{\infty} e^{-\gamma(\tau - t)}\big((y - y_d)^{\top}Q(y - y_d) + u^{\top}Ru\big)\, d\tau,$$
where $\gamma > 0$ is the discount factor and $y_d$ is the desired output.
The optimal tracking problem has attracted increasing attention in the control field. By constructing an augmented system from the tracking error and the desired trajectory, the optimal tracking control problem is transformed into an optimal regulation problem.
Problem 3: The optimal tracking control problem is to design a control policy that makes the actual output of the system track the desired trajectory while minimizing the preset value function.
Consider the general value function defined on the augmented state $X = [x^{\top}, x_d^{\top}]^{\top}$:
$$V\big(X(t)\big) = \int_t^{\infty} e^{-\gamma(\tau - t)} U\big(X(\tau), u(\tau)\big)\, d\tau,$$
where the discount factor $\gamma > 0$ keeps the value finite when the desired trajectory does not vanish.
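A minimal sketch of the augmented-system construction for the discounted LQT problem: the plant state and the reference-generator state are stacked, tracking reduces to regulating $e = y - y_d$, and discounting by $\gamma$ is handled by shifting the augmented dynamics by $-\frac{\gamma}{2}I$ before calling a standard Riccati solver. All matrices (A, B, C, F, H) and weights are illustrative assumptions.

import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative LQT setup (assumed): plant dx/dt = Ax + Bu, y = Cx;
# reference generator dr/dt = F r with desired output y_d = H r
A = np.array([[0.0, 1.0], [-1.0, -1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[0.0, 1.0], [-1.0, 0.0]])        # sinusoidal reference
H = np.array([[1.0, 0.0]])
gamma, Qe, R = 0.1, 5.0, np.eye(1)

# Augmented state X = [x; r]: tracking becomes regulation of e = y - y_d
A1 = np.block([[A, np.zeros((2, 2))], [np.zeros((2, 2)), F]])
B1 = np.vstack([B, np.zeros((2, 1))])
C1 = np.hstack([C, -H])                        # tracking error e = C1 X
Q1 = Qe * C1.T @ C1

# Discounting by gamma is equivalent to shifting the dynamics by -(gamma/2) I
P = solve_continuous_are(A1 - 0.5 * gamma * np.eye(4), B1, Q1, R)
K1 = np.linalg.solve(R, B1.T @ P)              # u* = -K1 [x; r]
print("LQT gain on [x; r]:", K1)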