Robust tracking control of uncertain nonlinear systems with adaptive dynamic programming
Zhao Zhao

Zhao, J., Na, J., & Gao, G. (2022). Robust tracking control of uncertain nonlinear systems with adaptive dynamic programming. Neurocomputing, 471, 21-30.

To address the robust control problem, an uncertain nonlinear system is considered, and its uncertainties are assumed to be bounded. The desired trajectories to be tracked are produced by a command generator. The purpose of this paper is to design a robust control such that the system state tracks the desired trajectory and the tracking error is minimized.
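A generic way to write such a setting is shown below. All symbols are illustrative assumptions, not the paper's notation: f and g are the known dynamics, Δ is the uncertainty with a known bound δ_M, h is the command-generator dynamics, and e is the tracking error.

```latex
\begin{aligned}
\dot{x} &= f(x) + g(x)\,u + \Delta(x), && \|\Delta(x)\| \le \delta_M(x),\\
\dot{x}_d &= h(x_d), && e = x - x_d.
\end{aligned}
```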

Combining the system with the command generator yields an augmented system. After defining the augmented state and the corresponding system functions, the augmented system can be written compactly, and the upper bound of its uncertain term follows directly from the bound on the original uncertainties. Now, we consider the nominal part of the augmented system. To formulate the optimal control of this nominal system, we seek a control that minimizes a cost function with a discount factor. The discount factor guarantees that the cost function remains bounded even when tracking a nonvanishing trajectory.
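To see why the discount factor matters, the sketch below numerically evaluates a cost of the common exponentially discounted form, the integral of exp(-γt)·r(t), for a stage cost r that never vanishes. Both the exponential form and the stage cost are assumptions for illustration. The stage cost here is a hypothetical control effort u_d(t) = sin t + 1.5 that would be needed to keep tracking a nonzero reference, penalized as u_d².

```python
import numpy as np

def discounted_cost(stage_cost, gamma, horizon, dt=1e-3):
    """Left Riemann-sum approximation of the integral of exp(-gamma t) * stage_cost(t) over [0, horizon]."""
    t = np.arange(0.0, horizon, dt)
    return float(np.sum(np.exp(-gamma * t) * stage_cost(t)) * dt)

# Hypothetical nonvanishing stage cost: effort u_d(t) = sin(t) + 1.5 needed to keep
# tracking a nonzero reference, penalized as u_d(t)^2 (bounded above by 2.5^2 = 6.25).
def effort_cost(t):
    return (np.sin(t) + 1.5) ** 2

for horizon in (10.0, 50.0, 200.0):
    undiscounted = discounted_cost(effort_cost, gamma=0.0, horizon=horizon)
    discounted = discounted_cost(effort_cost, gamma=0.5, horizon=horizon)
    print(f"T={horizon:>5}: gamma=0 -> {undiscounted:8.2f}, gamma=0.5 -> {discounted:6.3f}")
```

The undiscounted integral grows roughly linearly with the horizon. The discounted one settles at a finite value below max r / γ = 12.5.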

The remaining problem is thus to solve the above optimal control problem online. To accomplish the optimal control design, we take the derivative of the cost function (8) along the trajectories of the augmented system. Based on the optimal control principle, the Hamiltonian of the nominal system with this cost function can then be formed. It involves the derivative of V with respect to the augmented system state.
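The following block assumes the common exponentially discounted cost with a generic stage cost r(z,u) and nominal augmented dynamics ż = F(z) + G(z)u. These are assumptions here, and robust designs of this type often add a term built from the uncertainty bound to r. Under these assumptions, differentiating the cost along the trajectory gives the discounted Bellman relation and the Hamiltonian. The last line is the standard stationarity result for a quadratic control penalty.

```latex
\begin{aligned}
V(z(t)) &= \int_t^{\infty} e^{-\gamma(\tau - t)}\, r\big(z(\tau), u(\tau)\big)\, d\tau,\\
\dot V(z(t)) &= \gamma\, V(z(t)) - r\big(z(t), u(t)\big),\\
H(z, u, \nabla V) &= r(z,u) + \nabla V^{\top}\big(F(z) + G(z)u\big) - \gamma V(z) = 0,\\
u^{*} &= -\tfrac{1}{2} R^{-1} G(z)^{\top} \nabla V^{*}(z) \quad \text{if } r(z,u) = z^{\top} Q z + u^{\top} R u .
\end{aligned}
```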

We define the optimal cost function, from which the corresponding HJB equation follows. Hence, the optimal control action can be obtained from (10). When the optimal cost function and the control satisfy the HJB equation, the HJB equation can be rewritten in an equivalent form. Multiplying both sides of the resulting relation by a suitable factor then yields the relation used in the stability argument.

One can show that, under a suitable condition, the optimal tracking control makes the tracking error asymptotically stable. Note that the desired trajectories are not necessarily asymptotically stable, so the discount factor is needed to guarantee that the cost function is bounded.

In this case, the trajectory to be tracked is not required to vanish.
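The HJB machinery can be checked by hand on a hypothetical scalar nominal system ż = a z + b u with cost r = q z² + ρ u². This is not the paper's system. With V*(z) = p z², stationarity gives u* = -½ρ⁻¹ b ∇V* = -(b p / ρ) z, and the HJB reduces to q + 2ap - γp - b²p²/ρ = 0. The sketch solves this, checks the residual, and reports the closed-loop pole for a small and a large γ. It illustrates why stability results of this kind come with a condition on the discount factor.

```python
import math

def discounted_riccati_scalar(a, b, q, rho, gamma):
    """Positive root p of q + (2a - gamma) p - (b^2 / rho) p^2 = 0, so that V*(z) = p z^2."""
    c = b * b / rho
    s = 2.0 * a - gamma
    return (s + math.sqrt(s * s + 4.0 * c * q)) / (2.0 * c)

def hjb_residual(p, a, b, q, rho, gamma, z=1.0):
    """r(z, u*) + dV/dz (a z + b u*) - gamma V(z), evaluated at u* = -(1/2) rho^{-1} b dV/dz."""
    grad_v = 2.0 * p * z
    u_star = -0.5 * b * grad_v / rho
    return q * z**2 + rho * u_star**2 + grad_v * (a * z + b * u_star) - gamma * p * z**2

def closed_loop_pole(p, a, b, rho):
    """Pole of dz/dt = (a - b^2 p / rho) z under the optimal feedback."""
    return a - b * b * p / rho

# Open-loop unstable toy plant (a = 1): compare a small and a large discount factor.
for gamma in (0.1, 5.0):
    p = discounted_riccati_scalar(a=1.0, b=1.0, q=1.0, rho=1.0, gamma=gamma)
    print(gamma, p, hjb_residual(p, 1.0, 1.0, 1.0, 1.0, gamma), closed_loop_pole(p, 1.0, 1.0, 1.0))
```

With γ = 0.1 the pole is about -1.33. With γ = 5 it is about +0.70, so in this toy the discounted optimum no longer stabilizes the plant when γ is large.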

Critic NN and online adaptive learning

The key idea of the ADP method is to use a critic NN to estimate the optimal cost function. Since the optimal cost function can be considered a continuous function, it can be approximated by a critic NN, and its derivative with respect to the augmented state is obtained accordingly. A standing assumption on the critic NN is then introduced. In practice, the ideal NN weights are unknown, so the cost function is approximated using estimated weights, which gives an estimated solution of the HJB equation. Accordingly, there is an ideal optimal control, built from the ideal weights, and an actual control, built from the estimated weights. The HJB equation can then be rewritten in terms of the critic NN.

To develop an adaptive law that estimates the critic NN weights, the known terms in the rewritten HJB equation are collected into new variables. The HJB equation is then expressed through these known quantities and the unknown weights. Next, the filtered regressor matrices are defined. Their solutions can be derived explicitly and calculated online from the augmented system state.
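One plausible concrete form of these objects is written out below. It is modelled on filtered-regressor ADP designs and is not copied from the paper. The assumptions are as follows:

- φ is an N-dimensional basis with N×n Jacobian ∇φ.
- Ξ is the regressor and Θ = -r(z,u) is the known target.
- ε_H is a residual collecting the approximation errors and any mismatch from using the applied rather than the optimal control.
- ℓ > 0 is a filter gain, and the filters start from zero.

```latex
\begin{aligned}
V^{*}(z) &= W^{\top}\phi(z) + \varepsilon(z), &
\nabla V^{*}(z) &= \nabla\phi(z)^{\top} W + \nabla\varepsilon(z),\\
\hat V(z) &= \hat W^{\top}\phi(z), &
\hat u &= -\tfrac{1}{2} R^{-1} G(z)^{\top} \nabla\phi(z)^{\top} \hat W,\\
\Theta &= W^{\top}\Xi + \varepsilon_{H}, &
\Xi &= \nabla\phi(z)\big(F(z) + G(z)u\big) - \gamma\,\phi(z),\quad \Theta = -r(z,u),\\
\dot P &= -\ell P + \Xi\Xi^{\top}, &
P(t) &= \textstyle\int_0^t e^{-\ell(t-s)}\,\Xi(s)\Xi(s)^{\top}\,ds,\\
\dot Q &= -\ell Q + \Xi\,\Theta, &
Q(t) &= \textstyle\int_0^t e^{-\ell(t-s)}\,\Xi(s)\,\Theta(s)\,ds.
\end{aligned}
```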

An auxiliary vector is then defined from the filtered regressor matrices and the current weight estimate, and it can be expressed in terms of the weight estimation error. Based on this vector, the following adaptive law is designed to calculate the critic NN weight estimates online. Note that the proposed control scheme can be implemented without any offline learning process. To illustrate the implementation of the proposed control and learning algorithm, Algorithm 1 is given as follows:
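The sketch below illustrates the filtered-regressor idea behind the auxiliary vector and the adaptive law. It uses a static toy problem in which Θ = Wᵀ Ξ holds exactly for a known W. Several elements are assumptions modelled on estimation-error-driven adaptive laws rather than a transcription of the paper's law: the forms M = P Ŵ - Q and dŴ/dt = -Γ M, the scalar gain Γ, the filter gain ℓ, and the persistently exciting regressor. Because Q = P W here, M = P (Ŵ - W), so the weights are driven directly by their estimation error.

```python
import numpy as np

def estimate_weights(xi_fn, theta_fn, n, ell=0.1, gain=1.0, dt=1e-3, horizon=20.0):
    """Forward-Euler simulation of a filtered-regressor, estimation-error-driven adaptive law.

    P' = -ell*P + xi xi^T,  Q' = -ell*Q + xi*theta  (zero initial conditions)
    M  = P W_hat - Q                                  (auxiliary vector)
    W_hat' = -gain * M
    """
    P = np.zeros((n, n))
    Q = np.zeros(n)
    W_hat = np.zeros(n)
    for k in range(int(round(horizon / dt))):
        t = k * dt
        xi, theta = xi_fn(t), theta_fn(t)
        M = P @ W_hat - Q                    # equals P @ (W_hat - W) when theta = W @ xi exactly
        W_hat = W_hat - dt * gain * M        # adaptive law driven by the auxiliary vector
        P = P + dt * (-ell * P + np.outer(xi, xi))
        Q = Q + dt * (-ell * Q + xi * theta)
    return W_hat, P

# Toy problem with a known "ideal" weight vector and a persistently exciting regressor.
W_true = np.array([2.0, -1.0, 0.5])
xi_fn = lambda t: np.array([np.sin(t), np.cos(t), 1.0])
theta_fn = lambda t: W_true @ xi_fn(t)
W_hat, P = estimate_weights(xi_fn, theta_fn, n=3)
print(W_hat, np.linalg.eigvalsh(P).min())
```

Convergence in this toy relies on P becoming positive definite, which is an excitation condition. With a nonzero residual in Θ, M picks up an extra bounded term, and Ŵ only reaches a neighbourhood of W.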

Algorithm 1 (Adaptive Optimal Control Implementation for Solving Robust Tracking Control)

  1. (Initialization): Define the initial weights and gains;

  2. (Measurement): Measure the system state and construct the regressors;

  3. (Online learning): Calculate the required quantities and update the estimated weights to obtain the control;

  4. (Control): Apply the derived control to the system.
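The four steps of Algorithm 1 can be sketched as one forward-Euler sampling period. The function and argument names (F, G, phi, dphi, q_fn, Gamma, ell) are hypothetical, and the filter and update forms are the same assumed ones as in the critic sketches. The control is computed before the regressor because the regressor needs it. Here the regressor uses the model-predicted ż, which is an assumption; the paper computes its filtered regressors online from the measured augmented state, so its exact construction may differ. The plant update at the end stands in for the real system.

```python
import numpy as np

def algorithm1_step(z, W_hat, P, Q, F, G, phi, dphi, q_fn, R, gamma, ell, Gamma, dt):
    """One forward-Euler sampling period of a hypothetical implementation of Algorithm 1.

    z: augmented state (n,), W_hat: critic weights (N,), P: (N, N), Q: (N,).
    F(z) -> (n,), G(z) -> (n, m), phi(z) -> (N,), dphi(z) -> (N, n) Jacobian of phi.
    """
    # Control from the current critic: u = -1/2 R^{-1} G^T dphi^T W_hat.
    u = -0.5 * np.linalg.solve(R, G(z).T @ dphi(z).T @ W_hat)
    # Step 2 (regressors at the measured state).
    z_dot = F(z) + G(z) @ u                 # nominal-model prediction of dz/dt
    xi = dphi(z) @ z_dot - gamma * phi(z)   # regressor Xi
    theta = -(q_fn(z) + u @ R @ u)          # target Theta = -r(z, u)
    # Step 3 (online learning): auxiliary vector, weight update, filter update.
    M = P @ W_hat - Q
    W_hat = W_hat - dt * (Gamma @ M)
    P = P + dt * (-ell * P + np.outer(xi, xi))
    Q = Q + dt * (-ell * Q + xi * theta)
    # Step 4 (control): apply u; here the nominal plant is simply propagated one step.
    z = z + dt * z_dot
    return z, W_hat, P, Q, u

# Hypothetical scalar example: dz/dt = -z + u, phi(z) = z^2, r(z, u) = z^2 + u^2.
F = lambda z: -z
G = lambda z: np.array([[1.0]])
phi = lambda z: np.array([z[0] ** 2])
dphi = lambda z: np.array([[2.0 * z[0]]])
q_fn = lambda z: float(z[0] ** 2)

# Step 1 (initialization): zero weights and filter states.
z, W_hat, P, Q = np.array([1.0]), np.zeros(1), np.zeros((1, 1)), np.zeros(1)
for _ in range(2):
    z, W_hat, P, Q, u = algorithm1_step(z, W_hat, P, Q, F, G, phi, dphi, q_fn,
                                        R=np.eye(1), gamma=0.1, ell=1.0,
                                        Gamma=np.eye(1), dt=0.01)
```

In a real loop, `algorithm1_step` would be called once per sample, and the last line would be replaced by applying u to the plant and measuring the next state.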

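The closed-loop system is obtained by applying the derived control to the augmented system. To carry out the stability analysis, an additional assumption is introduced, and a Lyapunov function consisting of three terms is chosen. The time derivatives of the first and second terms are bounded first, and one term is removed at this step.

Further known bounds are then applied, after which the third term is treated in the same way.

Combining the three bounds, the design parameters are chosen so that the derivative of the Lyapunov function is bounded by a negative term plus a constant. Specifically, this constant is a residual term determined by the critic NN approximation error. Hence we can conclude that the derivative of the Lyapunov function is negative once the closed-loop error signals leave a bounded set, so these signals are uniformly ultimately bounded.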
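The last step is the usual ultimate-boundedness argument. For illustration, assume the three bounds combine into L̇ ≤ -κL + ρ, with κ > 0 and ρ ≥ 0 the residual collecting the critic NN errors. The paper's exact inequality may instead be stated in terms of norms of the error signals. Integrating e^{κt}L then gives:

```latex
\begin{aligned}
\dot L &\le -\kappa L + \rho
\;\Longrightarrow\;
L(t) \le e^{-\kappa t} L(0) + \frac{\rho}{\kappa}\big(1 - e^{-\kappa t}\big)
\;\Longrightarrow\;
\limsup_{t\to\infty} L(t) \le \frac{\rho}{\kappa}.
\end{aligned}
```

A smaller critic NN approximation error gives a smaller ρ and therefore a tighter ultimate bound.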