Inverse Matrix Update for Kernel Matrix
Zhao Zhao


Here, we summarize three different inverse-matrix update methods for kernel matrices. These methods avoid direct inversion when new data points are added, and each is suited to a different situation.

Symbol          Meaning
$a$             a scalar
$a_i$           a scalar at index $i$
$x_{i:j}$       a subset of the dataset between index $i$ and index $j$, including $i$ and $j$
$\mathbf{a}$    a column vector
$\mathbf{a}^T$  a row vector
$A$             a matrix
$A_{:,i}$       the $i$th column of matrix $A$
$A_{i,:}$       the $i$th row of matrix $A$

1. Kernel Matrix

Kernel methods are widely used in the machine learning field (Support Vector Machines, Gaussian Processes). The kernel matrix $K$ is generally formed from a dataset $X = \{x_1, x_2, \dots, x_n\}$. The entries of $K$ are determined by a kernel function $k(\cdot, \cdot)$:

$$K_{ij} = k(x_i, x_j).$$

Typical kernel functions are, for example, the Squared Exponential kernel function

$$k(x_i, x_j) = \sigma_f^2 \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\ell^2} \right)$$

and the Matern class of kernel functions

$$k(x_i, x_j) = \sigma_f^2 \, \frac{2^{1-\nu}}{\Gamma(\nu)} \left( \frac{\sqrt{2\nu}\,\lVert x_i - x_j \rVert}{\ell} \right)^{\!\nu} K_\nu\!\left( \frac{\sqrt{2\nu}\,\lVert x_i - x_j \rVert}{\ell} \right),$$

where $K_\nu$ is the modified Bessel function of the second kind. The kernel matrix then takes the form

$$K = \begin{bmatrix} k(x_1, x_1) & k(x_1, x_2) & \cdots & k(x_1, x_n) \\ k(x_2, x_1) & k(x_2, x_2) & \cdots & k(x_2, x_n) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_n, x_1) & k(x_n, x_2) & \cdots & k(x_n, x_n) \end{bmatrix}.$$

Usually, by properly choosing the kernel function, the kernel matrix is symmetric and positive definite.
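All three update routines below call a helper KernelVector(Xtr, xs, KernelParam, Sigma_n) whose implementation is not listed in this note. The following is a minimal sketch, assuming the Squared Exponential kernel and assuming the noise variance $\sigma_n^2$ is added to the self-covariance (the usual $K + \sigma_n^2 I$ convention in Gaussian Process regression):

function [k_xs, k_ss] = KernelVector(Xtr, xs, KernelParam, Sigma_n)
% k_xs(i) = k(Xtr(i,:), xs) --- cross-covariances with the new sample
% k_ss    = k(xs, xs) + Sigma_n^2 --- self-covariance plus noise variance
sqdist = sum((Xtr - xs).^2, 2) ;   % squared distances, implicit expansion
k_xs = KernelParam.SigmaF^2 * exp(-sqdist / (2*KernelParam.SigmaL^2)) ;
k_ss = KernelParam.SigmaF^2 + Sigma_n^2 ;
end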

2. Matrix Inversion

In some cases, the inverse of the kernel matrix is needed. The most common method uses the Cholesky decomposition. Every symmetric, positive definite matrix can be decomposed into a product of a unique lower triangular matrix $L$ and its transpose as

$$K = L L^T.$$

The entries of $L$ can be determined iteratively from $K$ as

$$L_{jj} = \sqrt{K_{jj} - \sum_{s=1}^{j-1} L_{js}^2}, \qquad L_{ij} = \frac{1}{L_{jj}} \left( K_{ij} - \sum_{s=1}^{j-1} L_{is} L_{js} \right), \quad i > j.$$

The inverse of the matrix can then be calculated as

$$K^{-1} = (L L^T)^{-1} = L^{-T} L^{-1}.$$

If we denote $M = L^{-1}$, we have $M L = I$, so by expanding the equation on the left for each row of $M$, we get expressions like

$$\sum_{k=j}^{i} M_{ik} L_{kj} = \delta_{ij}.$$

As the entries of $L$ are known, the first set of equations ($i = 1$) gives the first row of $M$, and this result can be used in the second set to obtain the second row ($i = 2$). Then we can calculate the value of $M$ iteratively by this method. Finally, since $K^{-1} = M^T M$ by the definition above, we can calculate $K^{-1}$.
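As a concrete illustration, the direct method that the updates below try to avoid can be written in a few lines of MATLAB using the built-in chol (a sketch; K is any symmetric positive definite kernel matrix):

% Direct inversion of K via Cholesky decomposition, O(n^3)
L = chol(K, 'lower') ;      % K = L*L'
M = L \ eye(size(K, 1)) ;   % forward substitution gives M = inv(L)
K_inv = M' * M ;            % inv(K) = inv(L)'*inv(L) = M'*M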

3. Augmented Inverse Update

When some samples have to be added to the dataset $X$ or deleted from $X$, the corresponding kernel matrix and its inverse should be recalculated. For the kernel matrix, we only have to change the value of one row and one column, so this update is efficient. However, when an element of $K$ is changed, the whole inverse matrix is completely changed. The direct inversion in the last chapter requires $O(n^3)$ complexity, which can be unacceptable in many practical cases. So, an alternative calculation that avoids direct inversion and has lower complexity is needed. In some special cases, such a cheaper update of the inverse kernel matrix exists.

The first method is called the Augmented Update. It corresponds to the case where we already know a dataset $X = \{x_1, \dots, x_n\}$ and its kernel matrix $K_n$, and $K_n^{-1}$ has been calculated by the method in Section 2 (Matrix Inversion). A new sample $x_{n+1}$ is added to the end of the dataset, giving $X' = \{x_1, \dots, x_n, x_{n+1}\}$, and we want to calculate the new kernel matrix and its inverse.

We have the following definitions:

  • The known matrix formed by the first $n$ samples, $K_n$, and its inverse $K_n^{-1}$;
  • The matrix formed by all $n+1$ samples, $K_{n+1}$, and its inverse $K_{n+1}^{-1}$.

$K_{n+1}$ can be calculated by adding a new row and column at the end of $K_n$:

$$K_{n+1} = \begin{bmatrix} K_n & \mathbf{b} \\ \mathbf{b}^T & d \end{bmatrix}.$$

This operation is very efficient; however, calculating $K_{n+1}^{-1}$ from scratch can be inefficient when $n$ is large.

In this compact block form, $\mathbf{b}$ is a sub-matrix of size $n \times 1$ whose entries are the kernel function values between each $x_i$ in the original dataset and the new sample, $b_i = k(x_i, x_{n+1})$, and $d = k(x_{n+1}, x_{n+1})$ is the kernel function value of the new sample with itself.

The updated inverse follows from the block matrix inversion formula. Defining

$$\mathbf{e} = K_n^{-1} \mathbf{b}, \qquad g = \frac{1}{d - \mathbf{b}^T \mathbf{e}},$$

the solution is

$$K_{n+1}^{-1} = \begin{bmatrix} K_n^{-1} + g\,\mathbf{e}\mathbf{e}^T & -g\,\mathbf{e} \\ -g\,\mathbf{e}^T & g \end{bmatrix}.$$

MATLAB code:

function [Knew_inv, Knew] = AugmUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n)
% Kold-------the old kernel matrix K_n
% Kold_inv---the old inverse matrix K_n^{-1}
% Xtr--------the training data
% xs---------the new sample
% KernelParam.SigmaL---length scale; KernelParam.SigmaF---signal std
% Sigma_n----noise std

[k_xs, k_ss] = KernelVector(Xtr, xs, KernelParam, Sigma_n) ;
b = k_xs ;   % kernel values between the old samples and xs
d = k_ss ;   % kernel value of xs with itself
Knew = [Kold, b ; b', d] ;

e = Kold_inv * b ;
g = 1/(d - b'*e) ;

Knew_inv = [Kold_inv + g*(e*e'), -g*e ; -g*e', g] ;
end

It can be proven that this update only takes $O(n^2)$ complexity (the cost is dominated by the matrix-vector product $K_n^{-1}\mathbf{b}$). This method does not lose any accuracy while being efficient, because the update is exact: the direct inversion simply performs a lot of redundant calculation.
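A quick sanity check against direct inversion might look as follows. The toy data, hyperparameter values, and the inline SE-kernel construction are illustrative assumptions, not part of the method:

% toy data and hyperparameters
Xtr = randn(100, 2) ;  xs = randn(1, 2) ;
KernelParam.SigmaL = 1.0 ;  KernelParam.SigmaF = 1.0 ;  Sigma_n = 0.1 ;

% build the old kernel matrix with the same SE kernel as KernelVector
sq = @(A, B) sum(A.^2, 2) + sum(B.^2, 2)' - 2*A*B' ;
Kold = KernelParam.SigmaF^2 * exp(-sq(Xtr, Xtr)/(2*KernelParam.SigmaL^2)) ...
       + Sigma_n^2 * eye(size(Xtr, 1)) ;
Kold_inv = inv(Kold) ;

[Knew_inv, Knew] = AugmUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n) ;
norm(Knew_inv*Knew - eye(size(Knew, 1)), 'fro')   % should be near machine precision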

4. Sliding Window Update

This section handles the case where a new sample $x_s$ is added at the end of $X$, but we want to keep the size of the dataset unchanged, so we eliminate the first element of the dataset. This operation is like a FIFO (first in, first out) system. As the kernel matrices of these two datasets are both of size $n \times n$, in this case we use $K_{old}$, $K_{old}^{-1}$ to denote the kernel matrix and its inverse before the update, and $K_{new}$, $K_{new}^{-1}$ for the updated dataset. Then we partition the old kernel matrix as

$$K_{old} = \begin{bmatrix} a & \mathbf{b}^T \\ \mathbf{b} & D \end{bmatrix},$$

where $a = k(x_1, x_1)$ is the first element of $K_{old}$, $\mathbf{b}$ is the sub-matrix in the first column of size $(n-1) \times 1$, and $D$ is the sub-matrix at the bottom-right corner of size $(n-1) \times (n-1)$.

Also, we know the inverse of the old kernel matrix, with a similar segmentation:

$$K_{old}^{-1} = \begin{bmatrix} e & \mathbf{f}^T \\ \mathbf{f} & H \end{bmatrix}.$$

The kernel matrix of the updated dataset is

$$K_{new} = \begin{bmatrix} D & \mathbf{h} \\ \mathbf{h}^T & g \end{bmatrix},$$

where $h_i = k(x_{i+1}, x_s)$ and $g = k(x_s, x_s)$. Then, with the help of $e$, $\mathbf{f}$, and $H$, we can calculate $K_{new}^{-1}$: the block inversion identity gives $D^{-1} = H - \mathbf{f}\mathbf{f}^T/e$, and applying the Augmented Update of the previous section to $D^{-1}$ yields

$$B = H - \frac{\mathbf{f}\mathbf{f}^T}{e}, \qquad s = \frac{1}{g - \mathbf{h}^T B \mathbf{h}},$$

$$K_{new}^{-1} = \begin{bmatrix} B + s\,(B\mathbf{h})(B\mathbf{h})^T & -s\,B\mathbf{h} \\ -s\,(B\mathbf{h})^T & s \end{bmatrix}.$$

MATLAB code:

function [Knew_inv, Knew] = SlidingWindowUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n)
% Kold-------the old kernel matrix
% Kold_inv---the old kernel inverse matrix
% Xtr--------the current window of data
% xs---------the newly arriving sample
% KernelParam, Sigma_n---kernel hyperparameters and noise std
% always removes the first (oldest) data point

% a = Kold(1, 1) ;        % top-left scalar of the partition (not needed)
% b = Kold(2:end, 1) ;    % first-column sub-vector (not needed)
D = Kold(2:end, 2:end) ;  % bottom-right block, kept in Knew

e = Kold_inv(1, 1) ;
f = Kold_inv(2:end, 1) ;
H = Kold_inv(2:end, 2:end) ;

[k_xs, k_ss] = KernelVector(Xtr(2:end, :), xs, KernelParam, Sigma_n) ;
h = k_xs ;
g = k_ss ;

Knew = [D, h ; h', g] ;

B = H - f*f'/e ;          % B = inv(D) via the block inversion identity
s = 1/(g - h'*B*h) ;

Knew_inv = [B + s*(B*h)*(B*h)', -s*(B*h) ; -s*(B*h)', s] ;
end

It can be proven that this update also only takes $O(n^2)$ complexity.
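In a streaming setting the routine would typically be called in a loop, as sketched below. Initialization of Kold, Kold_inv, and Xtr is as in the previous example; newSamples is an assumed matrix of incoming points, one per row:

for t = 1:size(newSamples, 1)
    xs = newSamples(t, :) ;
    [Kold_inv, Kold] = SlidingWindowUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n) ;
    Xtr = [Xtr(2:end, :) ; xs] ;   % drop the oldest sample, append the new one
end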

5. Sherman-Morrison Update

The situation here is more general: we have the original dataset $X = \{x_1, \dots, x_n\}$, and one of its elements $x_j$ is replaced by a new sample $x_s$. When the element at index $j$ is replaced, only the $j$th row and $j$th column of the kernel matrix are changed.

The updated kernel matrix can be expressed in the form

$$K_{new} = \begin{bmatrix} K_{1:j-1,\,1:j-1} & \mathbf{k}_{-} & K_{1:j-1,\,j+1:n} \\ \mathbf{k}_{-}^T & k_{ss} & \mathbf{k}_{+}^T \\ K_{j+1:n,\,1:j-1} & \mathbf{k}_{+} & K_{j+1:n,\,j+1:n} \end{bmatrix},$$

where $\mathbf{k}_{-}$ is a vector of size $(j-1) \times 1$ whose elements are the kernel function values between the $x_i$ in the range $1$ to $j-1$ and the new sample $x_s$. Similar explanations hold for the other blocks: $\mathbf{k}_{+}$ contains $k(x_i, x_s)$ for $i = j+1, \dots, n$, and $k_{ss} = k(x_s, x_s)$.

As we know which sample of $X$ will be replaced by $x_s$, the index $j$ is a determined value.

Then we define the temporary variable

$$\mathbf{r} = (K_{new})_{:,j} - (K_{old})_{:,j},$$

the change in the $j$th column. The symmetric change of the $j$th row and column is applied as two rank-one updates, each handled by the Sherman-Morrison formula. The first update accounts for the column change $\mathbf{r}\mathbf{e}_j^T$:

$$A = K_{old}^{-1} - \frac{K_{old}^{-1}\,\mathbf{r}\,(K_{old}^{-1})_{j,:}}{1 + \mathbf{r}^T (K_{old}^{-1})_{:,j}}.$$

The diagonal entry $(j, j)$ is already fully updated in this step, so before the second update we zero the $j$th element, $\tilde{\mathbf{r}} = \mathbf{r} - r_j \mathbf{e}_j$; otherwise the diagonal change would be counted twice. The second update accounts for the remaining row change $\mathbf{e}_j \tilde{\mathbf{r}}^T$:

$$K_{new}^{-1} = A - \frac{A_{:,j}\,\tilde{\mathbf{r}}^T A}{1 + \tilde{\mathbf{r}}^T A_{:,j}}.$$

This update also takes only $O(n^2)$ complexity. MATLAB code:

function [Knew_inv, Knew] = ShermanMorrisonUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n, j)
% Kold-------the old kernel matrix
% Kold_inv---the old kernel inverse matrix
% Xtr--------the current data
% xs---------the new sample
% KernelParam, Sigma_n---kernel hyperparameters and noise std
% j----------the index of the sample to be replaced

[k_before, k_ss] = KernelVector(Xtr(1:j-1, :), xs, KernelParam, Sigma_n) ;
[k_after, ~]     = KernelVector(Xtr(j+1:end, :), xs, KernelParam, Sigma_n) ;

Knew = [Kold(1:j-1, 1:j-1),   k_before, Kold(1:j-1, j+1:end) ; ...
        k_before',            k_ss,     k_after' ; ...
        Kold(j+1:end, 1:j-1), k_after,  Kold(j+1:end, j+1:end)] ;

r = Knew(:, j) - Kold(:, j) ;   % change in the j-th column

% first Sherman-Morrison update: apply the column change r*e_j'
As = Kold_inv - Kold_inv*r*Kold_inv(j, :) / (1 + r'*Kold_inv(:, j)) ;

% the (j,j) entry is already updated above, so zero it out before
% applying the row change e_j*r', otherwise it is counted twice
r(j) = 0 ;
Knew_inv = As - As(:, j)*r'*As / (1 + r'*As(:, j)) ;
end
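Again, a small sanity check against direct inversion, reusing the toy setup from the Augmented Update example (the replacement index is arbitrary):

j = 7 ;   % arbitrary index to replace
[Knew_inv, Knew] = ShermanMorrisonUpdate(Kold, Kold_inv, Xtr, xs, KernelParam, Sigma_n, j) ;
norm(Knew_inv*Knew - eye(size(Knew, 1)), 'fro')   % should be near machine precision
Xtr(j, :) = xs ;   % remember to update the stored dataset as well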