Here we summarize three different inverse-matrix update methods for kernel matrices. These methods avoid direct inversion when new samples are added to the dataset, and each is suited to a different situation.
Symbols : Meaning
$a$ : a scalar
$a_i$ : a scalar at index $i$
$x_{i:j}$ : a subset of the dataset between index $i$ and index $j$, including $i$ and $j$
$\mathbf{a}$ : a column vector
$\mathbf{a}^T$ : a row vector
$A$ : a matrix
$A_{:,i}$ : the $i$th column of matrix $A$
$A_{i,:}$ : the $i$th row of matrix $A$
1. Kernel Matrix
Kernel methods are widely used in machine learning (e.g. Support Vector Machines, Gaussian Processes). The kernel matrix $K$ is generally formed from a dataset $X = \{x_1, \dots, x_n\}$. The entries of $K$ are determined by a kernel function $k(\cdot, \cdot)$:
$$K_{ij} = k(x_i, x_j).$$
Typical kernel functions are, for example, the squared exponential kernel
$$k(x_i, x_j) = \sigma_f^2 \exp\!\left(-\frac{\|x_i - x_j\|^2}{2l^2}\right)$$
and the Matérn class of kernels
$$k(x_i, x_j) = \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,\|x_i - x_j\|}{l}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\,\|x_i - x_j\|}{l}\right),$$
where $K_\nu$ is a modified Bessel function. The kernel matrix then has the form
$$K = \begin{pmatrix} k(x_1, x_1) & \cdots & k(x_1, x_n) \\ \vdots & \ddots & \vdots \\ k(x_n, x_1) & \cdots & k(x_n, x_n) \end{pmatrix}.$$
Usually, by properly choosing the kernel function, the kernel matrix is symmetric and positive definite.
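As an illustration, forming a kernel matrix from the squared exponential kernel can be sketched in Python as follows (the function names and the hyperparameter defaults $\sigma_f = l = 1$ are illustrative):

```python
import numpy as np

def se_kernel(xi, xj, sigma_f=1.0, ell=1.0):
    """Squared exponential kernel; sigma_f and ell are illustrative defaults."""
    return sigma_f**2 * np.exp(-np.sum((xi - xj) ** 2) / (2 * ell**2))

def kernel_matrix(X, kernel=se_kernel):
    """K[i, j] = k(x_i, x_j) for every pair of samples in X."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

X = np.array([[0.0], [0.5], [1.0]])
K = kernel_matrix(X)
# K is symmetric and, for distinct samples, positive definite
```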
2. Matrix Inversion
In some cases, the inverse of the kernel matrix is needed. The most common method uses the Cholesky decomposition. Every symmetric, positive definite matrix $K$ can be decomposed into the product of a unique lower triangular matrix $L$ and its transpose:
$$K = LL^T.$$
The entries of $L$ can be determined iteratively from $K$ as
$$L_{jj} = \sqrt{K_{jj} - \sum_{k=1}^{j-1} L_{jk}^2}, \qquad L_{ij} = \frac{1}{L_{jj}}\left(K_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}\right) \quad (i > j).$$
The inverse of $K$ can then be calculated as
$$K^{-1} = (LL^T)^{-1} = (L^{-1})^T L^{-1}.$$
If we denote $M = L^{-1}$, we have $LM = I$, so by expanding this equation row by row we get expressions like
$$\sum_{k=1}^{i} L_{ik} M_{kj} = \delta_{ij}.$$
As the entries of $L$ are known, the first of these equations ($L_{11} M_{1j} = \delta_{1j}$) gives the first row of $M$, and this result can be used in the second equation to obtain the second row $M_{2,:}$. We can thus calculate the rows of $M$ iteratively by forward substitution. Finally, since $K^{-1} = M^T M$ as defined above, we can calculate $K^{-1}$.
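The procedure above can be sketched in Python (using NumPy's built-in Cholesky factorization for the first step, then solving $LM = I$ row by row):

```python
import numpy as np

def cholesky_inverse(K):
    """Invert a symmetric positive definite matrix via K = L L^T."""
    L = np.linalg.cholesky(K)          # lower triangular factor
    n = K.shape[0]
    M = np.zeros_like(L)               # will hold M = L^{-1}
    for i in range(n):                 # forward substitution, row by row
        for j in range(i + 1):
            rhs = (1.0 if i == j else 0.0) - L[i, :i] @ M[:i, j]
            M[i, j] = rhs / L[i, i]
    return M.T @ M                     # K^{-1} = (L^{-1})^T L^{-1}

K = np.array([[4.0, 2.0], [2.0, 3.0]])
K_inv = cholesky_inverse(K)
```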
3. Augmented Inverse Update
When samples have to be added to the dataset $X$ or deleted from it, the corresponding kernel matrix and its inverse must be recalculated. For the kernel matrix itself, we only have to change the value of one row and one column, so this update is efficient. However, when an element of $K$ is changed, the whole inverse matrix changes completely. It can be proven that the direct inversion of the previous section requires $O(n^3)$ complexity, which can be unacceptable in many practical cases. So alternative calculations with less complexity, avoiding direct inversion, are needed. In some special cases, such lower-complexity updates of the inverse kernel matrix exist.
The first method is called the Augmented Update. It corresponds to the case where we already know a dataset $X = \{x_1, \dots, x_n\}$ and its kernel matrix $K_n$. Also, $K_n^{-1}$ has been calculated by the matrix inversion method of Section 2. A new sample $x_{n+1}$ is added to the end of the dataset, giving $X' = \{x_1, \dots, x_n, x_{n+1}\}$, and we want to calculate the new kernel matrix and its inverse.
We have the following definitions: the known matrix $K_n$ of size $n \times n$ formed by the $n$ samples, and its inverse $K_n^{-1}$; the matrix $K_{n+1}$ of size $(n+1) \times (n+1)$ formed by the $n+1$ samples, and its inverse $K_{n+1}^{-1}$.
$K_{n+1}$ can be calculated by adding a new row and column at the end of $K_n$:
$$K_{n+1} = \begin{pmatrix} K_n & \mathbf{b} \\ \mathbf{b}^T & d \end{pmatrix}.$$
This operation is very efficient; however, calculating $K_{n+1}^{-1}$ from scratch becomes inefficient as $n$ grows large.
We can express the relation in a compact way:
$$K_{n+1}^{-1} = \begin{pmatrix} K_n^{-1} + s\,(K_n^{-1}\mathbf{b})(K_n^{-1}\mathbf{b})^T & -s\,K_n^{-1}\mathbf{b} \\ -s\,(K_n^{-1}\mathbf{b})^T & s \end{pmatrix}, \qquad s = \frac{1}{d - \mathbf{b}^T K_n^{-1}\mathbf{b}}.$$
Here $\mathbf{b}$ is a sub-matrix of size $n \times 1$ calculated as the kernel function values between each $x_i$ in the original dataset and the new sample, $b_i = k(x_i, x_{n+1})$, and $d = k(x_{n+1}, x_{n+1})$ is the kernel function value of $x_{n+1}$ with itself.
It can be proven that this update only takes $O(n^2)$ complexity. The method loses no accuracy while being efficient, because the direct inversion performs a lot of redundant calculation.
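A Python sketch of the augmented update, assuming the blockwise formula above (the function name and the toy squared exponential kernel check are illustrative):

```python
import numpy as np

def augmented_inverse_update(K_inv, b, d):
    """Given K_n^{-1}, the new kernel column b (b_i = k(x_i, x_new))
    and d = k(x_new, x_new), return K_{n+1}^{-1} without re-inverting."""
    Kb = K_inv @ b                      # K_n^{-1} b
    s = 1.0 / (d - b @ Kb)              # reciprocal of the Schur complement
    return np.block([[K_inv + s * np.outer(Kb, Kb), -s * Kb[:, None]],
                     [-s * Kb[None, :], np.array([[s]])]])

# toy check against direct inversion with a squared exponential kernel
def k(a, b):
    return np.exp(-(a - b) ** 2 / 2)

X = np.array([0.0, 0.5, 1.2])
x_new = 2.0
K = np.array([[k(a, c) for c in X] for a in X])
b = np.array([k(a, x_new) for a in X])
d = k(x_new, x_new)
K_aug_inv = augmented_inverse_update(np.linalg.inv(K), b, d)
```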
4. Sliding Window Update
This section addresses the case where a new sample $x_{n+1}$ is added at the end of $X$ while the size of the dataset must stay unchanged, so we eliminate the first element of the dataset. This operation is like a FIFO (first in, first out) system. As the kernel matrices of these two datasets are both of size $n \times n$, in this case we use $K$, $K^{-1}$ to denote the kernel matrix and its inverse before the update, and $\tilde{K}$, $\tilde{K}^{-1}$ for the updated dataset. Then we have the definitions:
$$K = \begin{pmatrix} a & \mathbf{g}^T \\ \mathbf{g} & A \end{pmatrix},$$
where $a = k(x_1, x_1)$ is the first element of $K$, $\mathbf{g}$ is the sub-matrix in the first column of size $(n-1) \times 1$, and $A$ is the sub-matrix at the bottom-right corner of size $(n-1) \times (n-1)$.
We also know the inverse of the old kernel matrix, with a similar segmentation:
$$K^{-1} = \begin{pmatrix} e & \mathbf{f}^T \\ \mathbf{f} & F \end{pmatrix}.$$
The kernel matrix of the updated dataset is
$$\tilde{K} = \begin{pmatrix} A & \mathbf{h} \\ \mathbf{h}^T & d \end{pmatrix}, \qquad h_i = k(x_{i+1}, x_{n+1}), \quad d = k(x_{n+1}, x_{n+1}).$$
Then, with the help of $e$, $\mathbf{f}$ and $F$, we can calculate $\tilde{K}^{-1}$ in two steps. First, remove the influence of the deleted sample to obtain the inverse of the retained block:
$$B = A^{-1} = F - \frac{1}{e}\,\mathbf{f}\mathbf{f}^T.$$
Second, apply the augmented update of the previous section with $B$, $\mathbf{h}$ and $d$:
$$s = \frac{1}{d - \mathbf{h}^T B \mathbf{h}}, \qquad \tilde{K}^{-1} = \begin{pmatrix} B + s\,(B\mathbf{h})(B\mathbf{h})^T & -s\,B\mathbf{h} \\ -s\,(B\mathbf{h})^T & s \end{pmatrix}.$$
MATLAB code:
B = F - f*f'/e;
s = 1/(d - h'*B*h);
Knew_inv = [B + (B*h)*(B*h)'*s, -(B*h)*s; -(B*h)'*s, s];
It can be proven that this update only takes $O(n^2)$ complexity.
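The same two-step update (downdate to drop the oldest sample, then augmented update to append the new one) can be sketched in Python; the function name and the toy kernel check are illustrative:

```python
import numpy as np

def sliding_window_update(K_inv, h, d):
    """Given K^{-1} for {x_1, ..., x_n}, the kernel values h between
    x_2..x_n and the new sample, and d = k(x_new, x_new), return the
    inverse kernel matrix of the shifted window {x_2, ..., x_n, x_new}."""
    e = K_inv[0, 0]
    f = K_inv[1:, 0]
    F = K_inv[1:, 1:]
    B = F - np.outer(f, f) / e          # B = A^{-1}: drop the oldest sample
    Bh = B @ h
    s = 1.0 / (d - h @ Bh)              # augmented update: add the new sample
    return np.block([[B + s * np.outer(Bh, Bh), -s * Bh[:, None]],
                     [-s * Bh[None, :], np.array([[s]])]])

# toy check with a squared exponential kernel
def k(a, b):
    return np.exp(-(a - b) ** 2 / 2)

X = np.array([0.0, 0.5, 1.2, 2.0])
x_new = 2.6
K = np.array([[k(a, c) for c in X] for a in X])
h = np.array([k(a, x_new) for a in X[1:]])
d = k(x_new, x_new)
K_win_inv = sliding_window_update(np.linalg.inv(K), h, d)
```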
5. Sherman-Morrison Update
The situation here is more general: we have the original dataset $X = \{x_1, \dots, x_n\}$, and one of its elements is replaced by a new sample:
$$X' = \{x_1, \dots, x_{m-1}, x', x_{m+1}, \dots, x_n\}.$$
When the element at index $m$ is replaced, only the $m$th row and the $m$th column of the kernel matrix are changed. The updated kernel matrix can be expressed in the rank-two form
$$\tilde{K} = K + \mathbf{e}_m \mathbf{r}^T + \mathbf{r}\,\mathbf{e}_m^T,$$
where $\mathbf{e}_m$ is the $m$th standard basis vector and $\mathbf{r}$ is a vector of size $n$ holding the changes of the $m$th column: its elements are the differences between the kernel function values of the samples $x_1, \dots, x_n$ with the new sample $x'$ and those with the old sample $x_m$, i.e. $r_i = k(x_i, x') - k(x_i, x_m)$ for $i \neq m$, with the diagonal element halved, $r_m = \big(k(x', x') - k(x_m, x_m)\big)/2$, since it appears in both rank-one terms.
As we know which sample of $X$ will be replaced by $x'$, the index $m$ is a determined value.
Then we define some temporary variables using the Sherman-Morrison formula
$$(K + \mathbf{u}\mathbf{v}^T)^{-1} = K^{-1} - \frac{K^{-1}\mathbf{u}\mathbf{v}^T K^{-1}}{1 + \mathbf{v}^T K^{-1}\mathbf{u}},$$
applied once per rank-one term:
$$K_1^{-1} = K^{-1} - \frac{K^{-1}\mathbf{e}_m \mathbf{r}^T K^{-1}}{1 + \mathbf{r}^T K^{-1}\mathbf{e}_m}.$$
Then $\tilde{K}^{-1}$ will be calculated as
$$\tilde{K}^{-1} = K_1^{-1} - \frac{K_1^{-1}\mathbf{r}\,\mathbf{e}_m^T K_1^{-1}}{1 + \mathbf{e}_m^T K_1^{-1}\mathbf{r}}.$$
MATLAB code:
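The double Sherman-Morrison update can also be sketched in Python (function names and the toy replacement check are illustrative; $\mathbf{r}$ is built as described above, with the diagonal change halved):

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Inverse of (A + u v^T) given A^{-1}, via the Sherman-Morrison formula."""
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

def replace_sample_inverse(K_inv, m, r):
    """Inverse of K + e_m r^T + r e_m^T: the kernel matrix after replacing
    sample m, where r holds the column changes (diagonal change halved)."""
    e_m = np.zeros(len(r))
    e_m[m] = 1.0
    K1_inv = sherman_morrison(K_inv, e_m, r)   # first rank-one correction
    return sherman_morrison(K1_inv, r, e_m)    # second rank-one correction

# toy check: replace x_1 = 0.5 by x' = 0.55 under a squared exponential kernel
def k(a, b):
    return np.exp(-(a - b) ** 2 / 2)

X = np.array([0.0, 0.5, 1.2, 2.0])
m, x_prime = 1, 0.55
X_new = X.copy()
X_new[m] = x_prime
K = np.array([[k(a, c) for c in X] for a in X])
K_new = np.array([[k(a, c) for c in X_new] for a in X_new])
r = K_new[:, m] - K[:, m]
r[m] /= 2.0                                    # diagonal appears in both terms
K_new_inv = replace_sample_inverse(np.linalg.inv(K), m, r)
```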