Adaptive dynamic programming-based stabilization of nonlinear systems with unknown actuator saturation
Zhao Zhao

Zhao, B., Jia, L., Xia, H., & Li, Y. (2018). Adaptive dynamic programming-based stabilization of nonlinear systems with unknown actuator saturation. Nonlinear Dynamics, 93(4), 2089-2103.

The considered nominal continuous-time nonlinear systems can be described as
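The system equation is not reproduced in these notes. As a point of reference only, ADP formulations of this kind usually assume an input-affine model; the symbols $x$, $u$, $f$ and $g$ below are assumptions taken from that common setting, not notation confirmed by the source.

```latex
% Assumed standard form: state x in R^n, nominal control u in R^m,
% drift dynamics f(x) with f(0) = 0, input gain matrix g(x).
\dot{x}(t) = f\bigl(x(t)\bigr) + g\bigl(x(t)\bigr)\,u(t), \qquad x(0) = x_0 .
```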

Online nominal optimal control

The objective of this optimal control problem is to find a stabilizing nominal control that minimizes the infinite-horizon cost function (2), whose weighting matrices are positive definite. If the cost function (2) is continuously differentiable, its infinitesimal version is the so-called nonlinear Lyapunov equation.
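For orientation, a quadratic infinite-horizon cost and its nonlinear Lyapunov equation are commonly written as follows. This is a sketch under assumptions: the input-affine dynamics $\dot{x} = f(x) + g(x)u$, the quadratic utility, and the symbols $Q$, $R$ and $V$ are the usual ADP conventions and may differ in detail from the paper's equations.

```latex
% Assumed cost function with positive definite weights Q and R
V(x_0) = \int_{0}^{\infty} \bigl( x^{\top} Q\, x + u^{\top} R\, u \bigr)\, dt ,
\qquad Q \succ 0,\; R \succ 0 .

% Differentiating V along the trajectories of \dot{x} = f(x) + g(x)u
% gives the infinitesimal version (nonlinear Lyapunov equation)
0 = x^{\top} Q\, x + u^{\top} R\, u
    + \bigl(\nabla V(x)\bigr)^{\top} \bigl( f(x) + g(x)\,u \bigr),
\qquad V(0) = 0 .
```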
In light of the nominal control policy and the cost function, define the Hamiltonian and the optimal cost function (5). The optimal cost function (5) can be derived from the solution of the HJB equation (6), where the gradient term denotes the partial gradient of the cost function in (5) with respect to the state. If the solution of (6) exists, the closed-loop expression for the optimal control can be obtained from it, and a simple transformation gives an equivalent form.

In the following, the value function is approximated by a critic NN with a single hidden layer, from which the partial gradient of the value function with respect to the state and the corresponding expression of the Hamiltonian follow. Because the ideal weight vector is unavailable, the critic NN is implemented with an estimated weight vector, which gives an approximate value function, an approximate partial gradient, and an approximate Hamiltonian.

To adjust the critic NN weight vector, the steepest descent algorithm is used to minimize the objective function. This yields adaptive update laws for the weight approximation error and for the estimated weight vector (how to calculate?). The ideal nominal control policy can therefore be expressed through the critic NN, and it is approximated by using the estimated weight vector.

Choose a Lyapunov function candidate and take its time derivative. Under the key assumptions, the derivative is negative as long as the weight approximation error lies outside a compact set. Therefore, according to Lyapunov's direct method, the approximation error of the weight vector is uniformly ultimately bounded (UUB).
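The critic tuning idea above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: a hypothetical scalar system x_dot = a*x + b*u under a fixed policy u = -k*x, a linear-in-parameters critic with basis [x^2, x^4] standing in for the single-hidden-layer NN, and a plain steepest-descent step on E = 0.5*e^2, where e is the approximate Hamiltonian. It is not the paper's exact update law.

```python
# Hypothetical scalar plant x_dot = A*x + B*u with a fixed stabilizing policy u = -K*x.
A, B = 1.0, 1.0
Q, R = 1.0, 1.0      # assumed positive weights of the quadratic utility
K = 3.0              # assumed stabilizing gain (B*K > A)


def basis_grad(x):
    """Gradient of the assumed critic basis sigma(x) = [x^2, x^4]."""
    return [2.0 * x, 4.0 * x ** 3]


def hamiltonian_residual(w, x):
    """Approximate Hamiltonian e and its gradient theta with respect to w."""
    u = -K * x
    x_dot = A * x + B * u
    theta = [g * x_dot for g in basis_grad(x)]   # d e / d w
    e = Q * x * x + R * u * u + sum(wi * ti for wi, ti in zip(w, theta))
    return e, theta


def train_critic(w0, samples, rate=5e-4, sweeps=20000):
    """Steepest descent on E = 0.5 * e^2, applied sample by sample."""
    w = list(w0)
    for _ in range(sweeps):
        for x in samples:
            e, theta = hamiltonian_residual(w, x)
            # dE/dw = e * theta, so step against it
            w = [wi - rate * e * ti for wi, ti in zip(w, theta)]
    return w


w_hat = train_critic([0.0, 0.0], samples=[0.5, 1.0, 1.5])
print(w_hat)
```

For this particular policy the Lyapunov equation has the closed-form solution V(x) = 2.5 x^2, so the estimated weights should approach [2.5, 0]. The fixed set of sample states stands in for the excitation that an online implementation would need from the trajectory itself.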

Neural network-based unknown saturation compensation

To handle the unknown actuator saturation, a vector called the saturation nonlinearity is introduced.

Note that without actuator saturation this vector remains zero, and the control law coincides with the ideal nominal control law, whereas the vector is nonzero in the presence of actuator saturation. The saturated nonlinear system can thus be transformed into a form in which the saturation nonlinearity appears as an unknown term. A backpropagation NN is introduced to approximate this unknown term, together with an update law for its weight vector and a definition of the overall NN approximation error.
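A small sketch may help to make the role of the saturation nonlinearity concrete. It assumes a symmetric bound `u_max` and the sign convention delta = sat(u) - u; both are illustrative assumptions, since the definition is not reproduced here, and in the paper's setting the bound is unknown, which is why an NN is used to approximate the term.

```python
def saturate(u, u_max):
    """Clip every control channel to the assumed symmetric range [-u_max, u_max]."""
    return [max(-u_max, min(u_max, ui)) for ui in u]


def saturation_nonlinearity(u, u_max):
    """Assumed definition: applied (saturated) input minus commanded input."""
    return [si - ui for si, ui in zip(saturate(u, u_max), u)]


# Commanded control inside the limits: the nonlinearity vanishes.
delta_inside = saturation_nonlinearity([0.3, -0.8], u_max=1.0)

# Commanded control beyond the limits: the nonlinearity is nonzero.
delta_outside = saturation_nonlinearity([1.5, -2.0], u_max=1.0)

print(delta_inside, delta_outside)
```

With this convention the saturated input equals the commanded input plus delta, so delta enters the dynamics as an additive term that a compensator can try to cancel.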

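The overall control law for the nonlinear system combines the approximate nominal optimal control with the NN-based feed-forward compensation of the saturation term.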

Algorithm 1 Feed-forward compensation-based online optimal stabilization algorithm

1: Select a set of small positive constants, the maximum number of iterations, the maximum number of run steps, and the initial values of the corresponding NN weight vectors. Initialize the iteration counter and the run-step counter, and begin with a given nominal control policy.

2: (Policy evaluation) Solve the nonlinear Lyapunov equation for the current control policy.

3: (Policy improvement) Update the control policy.

4: If the stopping criterion is met, stop and obtain the approximated optimal control; else, increment the iteration counter. If the maximum number of iterations has not been reached, return to Step 2; otherwise go to Step 5.

5: (Feed-forward compensation) Update the weight vector of the compensation NN and obtain the approximate unknown term.

6: (Overall control policy) Update the overall control policy.

7: If the maximum run step has not been reached, return to Step 2; else, stop running.
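Steps 2 to 4 of the algorithm (policy evaluation, policy improvement, stopping test) can be sketched on a case where every step has a closed form. The example below assumes a hypothetical scalar linear system x_dot = a*x + b*u with cost integrand q*x^2 + r*u^2, a policy u = -k*x, and a value function V(x) = p*x^2; the function names and the stopping rule on successive values of p are illustrative choices. Steps 5 to 7 (the NN-based compensation) are not covered.

```python
import math


def policy_iteration(a, b, q, r, k0, tol=1e-10, max_iter=50):
    """Policy iteration for x_dot = a*x + b*u with u = -k*x and V(x) = p*x^2."""
    if b * k0 - a <= 0:
        raise ValueError("the initial policy must be stabilizing (b*k0 > a)")
    k, p_prev = k0, None
    for i in range(1, max_iter + 1):
        # Policy evaluation: 0 = q + r*k^2 + 2*p*(a - b*k)
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: u = -(1/2) * (1/r) * b * dV/dx = -(b*p/r) * x
        k = b * p / r
        # Stopping test on the change of the value-function coefficient
        if p_prev is not None and abs(p - p_prev) < tol:
            break
        p_prev = p
    return p, k, i


def riccati_solution(a, b, q, r):
    """Positive root of the scalar Riccati equation 2*a*p - (b*p)^2/r + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


p, k, iters = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
print(p, riccati_solution(1.0, 1.0, 1.0, 1.0), iters)
```

In the linear-quadratic case the iteration converges to the Riccati solution, which gives a convenient check. For a general nonlinear system the policy evaluation step has no closed form, which is where the critic NN approximation comes in.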

Stability analysis

Under the stated assumption, choose a Lyapunov function candidate and take its time derivative. The second term of the derivative is bounded by using the locally Lipschitz property of the relevant function, which guarantees the existence of a positive Lipschitz constant, together with further suppositions on the remaining terms. It can then be concluded that the time derivative is negative when the state lies outside a compact set, provided that certain conditions hold. This implies that all signals of the closed-loop nonlinear system with unknown actuator saturation are guaranteed to be UUB.