$p(x, y)$ is the joint distribution of the two random variables $X, Y$. The distributions $p(x)$ and $p(y)$ are the corresponding marginal distributions, and $p(y \mid x)$ is the conditional distribution of $y$ given $x$.
The sum rule (marginalization property):
$$p(x) = \begin{cases} \sum_{y \in \mathcal{Y}} p(x, y) & \text{if } y \text{ is discrete,} \\ \int_{\mathcal{Y}} p(x, y) \, dy & \text{if } y \text{ is continuous.} \end{cases}$$
The product rule relates the joint distribution to the conditional distribution via:
$$p(x, y) = p(y \mid x)\, p(x).$$
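As a concrete illustration (my own small example, not from the text), the sum and product rules can be checked on a discrete joint distribution given as a table.

```python
import numpy as np

# Hypothetical joint p(x, y) over 2 states of x and 3 states of y; entries sum to 1
p_xy = np.array([[0.10, 0.25, 0.15],
                 [0.20, 0.05, 0.25]])

p_x = p_xy.sum(axis=1)              # sum rule: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)              # sum rule: p(y) = sum_x p(x, y)
p_y_given_x = p_xy / p_x[:, None]   # conditional p(y | x)

# Product rule: p(x, y) = p(y | x) p(x)
print(np.allclose(p_xy, p_y_given_x * p_x[:, None]))  # True
```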
In machine learning and Bayesian statistics, we are often interested in making inferences about unobserved (latent) random variables given that we have observed other random variables. Let us assume we have some prior knowledge $p(x)$ about an unobserved random variable $x$ and some relationship $p(y \mid x)$ between $x$ and a second random variable $y$, which we can observe. If we observe $y$, we can use Bayes' theorem to draw some conclusions about $x$ given the observed values of $y$. Bayes' theorem (also Bayes' rule or Bayes' law) states that
$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.$$
According to the product rule, we have $p(x, y) = p(x \mid y)\, p(y)$ and $p(x, y) = p(y \mid x)\, p(x)$, so that
$$p(x \mid y)\, p(y) = p(y \mid x)\, p(x) \iff p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}.$$
The quantity
$$p(y) = \int p(y \mid x)\, p(x)\, dx = \mathbb{E}_X\big[p(y \mid x)\big]$$
is the marginal likelihood/evidence. Therefore, the marginal likelihood is independent of $x$, and it ensures that the posterior $p(x \mid y)$ is normalized.
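As a small illustration (my own sketch, not from the text), the following applies Bayes' theorem to a discrete latent variable; the prior, likelihood, and observation are made-up values chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical discrete example: latent x has 3 states, observed y has 2 states.
prior = np.array([0.5, 0.3, 0.2])            # p(x)
likelihood = np.array([[0.9, 0.1],           # p(y | x = 0)
                       [0.4, 0.6],           # p(y | x = 1)
                       [0.2, 0.8]])          # p(y | x = 2)

y_observed = 1                                # we observe y = 1
evidence = likelihood[:, y_observed] @ prior  # p(y) = sum_x p(y|x) p(x)
posterior = likelihood[:, y_observed] * prior / evidence  # Bayes' theorem

print(posterior, posterior.sum())  # posterior over x; sums to 1
```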
Means and Covariances
(Expected Value). The expected value of a function $g$ of a univariate continuous random variable $X \sim p(x)$ is given by
$$\mathbb{E}_X[g(x)] = \int_{\mathcal{X}} g(x)\, p(x)\, dx.$$
Correspondingly, the expected value of a function $g$ of a discrete random variable $X \sim p(x)$ is given by
$$\mathbb{E}_X[g(x)] = \sum_{x \in \mathcal{X}} g(x)\, p(x).$$
(Mean). The mean of a random variable $X$ with states $x \in \mathbb{R}^D$ is an average and is defined as
$$\mathbb{E}_X[x] = \begin{bmatrix} \mathbb{E}_{X_1}[x_1] \\ \vdots \\ \mathbb{E}_{X_D}[x_D] \end{bmatrix} \in \mathbb{R}^D,$$
where
$$\mathbb{E}_{X_d}[x_d] := \begin{cases} \int_{\mathcal{X}} x_d\, p(x_d)\, dx_d & \text{if } X \text{ is continuous,} \\ \sum_{x_i \in \mathcal{X}} x_i\, p(x_d = x_i) & \text{if } X \text{ is discrete,} \end{cases}$$
for $d = 1, \ldots, D$, where the subscript $d$ indicates the corresponding dimension of $x$. The integral and sum are over the states $\mathcal{X}$ of the target space of the random variable $X$.
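A minimal numerical sketch (my own illustration, with made-up values): the expected value of $g(x) = x^2$ under a small discrete distribution, computed directly from the sum above.

```python
import numpy as np

states = np.array([-1.0, 0.0, 2.0])      # states of a discrete random variable X
probs = np.array([0.2, 0.5, 0.3])        # p(x), sums to 1

g = lambda x: x ** 2                      # function of the random variable
expected_g = np.sum(g(states) * probs)    # E_X[g(x)] = sum_x g(x) p(x)
mean = np.sum(states * probs)             # E_X[x]
print(expected_g, mean)                   # 1.4 and 0.4
```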
(Covariance (Univariate)). The covariance between two univariate random variables $X, Y \in \mathbb{R}$ is given by the expected product of their deviations from their respective means, i.e.,
$$\mathrm{Cov}_{X,Y}[x, y] := \mathbb{E}_{X,Y}\big[(x - \mathbb{E}_X[x])(y - \mathbb{E}_Y[y])\big].$$
By using the linearity of expectations, the expression can be rewritten as the expected value of the product minus the product of the expected values, i.e.,
$$\mathrm{Cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y].$$
The covariance of a variable with itself, $\mathrm{Cov}[x, x]$, is called the variance and is denoted by $\mathbb{V}_X[x]$. The square root of the variance is called the standard deviation and is often denoted by $\sigma(x)$.
(Covariance (Multivariate)). If we consider two multivariate random variables $X$ and $Y$ with states $x \in \mathbb{R}^D$ and $y \in \mathbb{R}^E$, respectively, the covariance between $X$ and $Y$ is defined as
$$\mathrm{Cov}[x, y] = \mathbb{E}[x y^\top] - \mathbb{E}[x]\,\mathbb{E}[y]^\top = \mathrm{Cov}[y, x]^\top \in \mathbb{R}^{D \times E}.$$
(Variance). The variance of a random variable $X$ with states $x \in \mathbb{R}^D$ and a mean vector $\mu \in \mathbb{R}^D$ is defined as
$$\mathbb{V}_X[x] = \mathrm{Cov}_X[x, x] = \mathbb{E}_X\big[(x - \mu)(x - \mu)^\top\big] = \mathbb{E}_X[x x^\top] - \mu\mu^\top.$$
The normalized version of covariance is called the correlation.
(Correlation). The correlation between two random variables $X, Y$ is given by
$$\mathrm{corr}[x, y] = \frac{\mathrm{Cov}[x, y]}{\sqrt{\mathbb{V}[x]\,\mathbb{V}[y]}} \in [-1, 1].$$
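As an illustration (my own sketch with synthetic data), the two covariance expressions and the correlation can be checked numerically on samples; with finitely many samples the quantities agree only up to sampling noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.8 * x + 0.6 * rng.normal(size=100_000)   # correlated with x by construction

# Covariance as E[(x - E[x])(y - E[y])] and as E[xy] - E[x]E[y]
cov_centered = np.mean((x - x.mean()) * (y - y.mean()))
cov_raw = np.mean(x * y) - x.mean() * y.mean()

# Correlation: covariance normalized by the standard deviations
corr = cov_centered / np.sqrt(x.var() * y.var())
print(cov_centered, cov_raw, corr)   # the two covariances agree; corr is close to 0.8 here
```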
Empirical Means and Covariances
(Empirical Mean and Covariance). The empirical mean vector is the arithmetic average of the observations for each variable, and it is defined as
$$\bar{x} := \frac{1}{N} \sum_{n=1}^{N} x_n, \qquad x_n \in \mathbb{R}^D.$$
The empirical covariance matrix is a $D \times D$ matrix
$$\Sigma := \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^\top.$$
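A minimal sketch (synthetic data, my own example) of the empirical mean vector and empirical covariance matrix; note that the definition above divides by $N$, whereas `np.cov` defaults to $N - 1$, so `bias=True` is needed to match it.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))               # N = 500 observations of a D = 3 variable

mean = X.mean(axis=0)                       # empirical mean vector, shape (3,)
centered = X - mean
cov = centered.T @ centered / X.shape[0]    # empirical covariance, divides by N

# Cross-check against NumPy (bias=True also normalizes by N rather than N - 1)
assert np.allclose(cov, np.cov(X.T, bias=True))
print(mean, cov.shape)
```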
Other Expressions for the Variance
We now focus on a single random variable $X$. The standard definition of variance is the expectation of the squared deviation of a random variable $X$ from its expected value $\mu$, i.e.,
$$\mathbb{V}_X[x] := \mathbb{E}_X\big[(x - \mu)^2\big].$$
The above formula can be converted to the so-called raw-score formula for variance:
$$\mathbb{V}_X[x] = \mathbb{E}_X[x^2] - \big(\mathbb{E}_X[x]\big)^2.$$
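A quick numerical check (my own example) that the two variance expressions coincide on sample data; the raw-score form needs only the sums of $x$ and $x^2$, which makes it convenient for one-pass computations, though it can be numerically less stable.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

var_centered = np.mean((x - x.mean()) ** 2)       # E[(x - mu)^2]
var_raw_score = np.mean(x ** 2) - x.mean() ** 2   # E[x^2] - (E[x])^2
print(var_centered, var_raw_score)                # both equal 4.0
```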
Sums and Transformations of Random Variables
Consider two random variables $X, Y$ with states $x, y \in \mathbb{R}^D$. Then:
$$\mathbb{E}[x + y] = \mathbb{E}[x] + \mathbb{E}[y], \qquad \mathbb{E}[x - y] = \mathbb{E}[x] - \mathbb{E}[y],$$
$$\mathbb{V}[x + y] = \mathbb{V}[x] + \mathbb{V}[y] + \mathrm{Cov}[x, y] + \mathrm{Cov}[y, x],$$
$$\mathbb{V}[x - y] = \mathbb{V}[x] + \mathbb{V}[y] - \mathrm{Cov}[x, y] - \mathrm{Cov}[y, x].$$
Consider a random variable $X$ with mean $\mu$ and covariance matrix $\Sigma$ and a (deterministic) affine transformation $y = Ax + b$ of $x$. Then $y$ is itself a random variable whose mean vector and covariance matrix are given by
$$\mathbb{E}_Y[y] = \mathbb{E}_X[Ax + b] = A\,\mathbb{E}_X[x] + b = A\mu + b,$$
$$\mathbb{V}_Y[y] = \mathbb{V}_X[Ax + b] = \mathbb{V}_X[Ax] = A\,\mathbb{V}_X[x]\,A^\top = A\Sigma A^\top.$$
Furthermore,
$$\mathrm{Cov}[x, y] = \mathbb{E}\big[x(Ax + b)^\top\big] - \mathbb{E}[x]\,\mathbb{E}[Ax + b]^\top = \mu b^\top + \mathbb{E}[x x^\top]A^\top - \mu b^\top - \mu\mu^\top A^\top = \big(\mathbb{E}[x x^\top] - \mu\mu^\top\big)A^\top = \Sigma A^\top.$$
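A sketch (my own synthetic example) checking $\mathbb{E}[Ax + b] = A\mu + b$ and $\mathbb{V}[Ax + b] = A\Sigma A^\top$ by Monte Carlo; the agreement is only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])            # maps R^2 -> R^3
b = np.array([0.5, 0.0, -1.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # samples of x
y = x @ A.T + b                                         # y = Ax + b for each sample

print(np.max(np.abs(y.mean(axis=0) - (A @ mu + b))))             # close to 0: E[y] = A mu + b
print(np.max(np.abs(np.cov(y.T, bias=True) - A @ Sigma @ A.T)))  # close to 0: V[y] = A Sigma A^T
```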
Statistical Independence
(Independence). Two random variables $X, Y$ are statistically independent if and only if
$$p(x, y) = p(x)\, p(y).$$
If $X, Y$ are (statistically) independent, then
$$p(y \mid x) = p(y), \qquad p(x \mid y) = p(x), \qquad \mathbb{V}_{X,Y}[x + y] = \mathbb{V}_X[x] + \mathbb{V}_Y[y], \qquad \mathrm{Cov}_{X,Y}[x, y] = 0.$$
Another concept that is important in machine learning is conditional independence.
(Conditional Independence). Two random variables $X$ and $Y$ are conditionally independent given $Z$ if and only if
$$p(x, y \mid z) = p(x \mid z)\, p(y \mid z) \quad \text{for all } z \in \mathcal{Z},$$
where $\mathcal{Z}$ is the set of states of random variable $Z$. We write $X \perp\!\!\!\perp Y \mid Z$ to denote that $X$ is conditionally independent of $Y$ given $Z$.
By using the product rule of probability, we have $p(x, y \mid z) = p(x \mid y, z)\, p(y \mid z)$. Then, comparing with the definition above,
$$p(x \mid y, z) = p(x \mid z).$$
This alternative presentation provides the interpretation "given that we know $z$, knowledge about $y$ does not change our knowledge of $x$".
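As an illustration (entirely my own construction), a small discrete joint built as $p(z)\,p(x \mid z)\,p(y \mid z)$ is conditionally independent given $Z$ by design, and the check below confirms $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$ even though $X$ and $Y$ are not marginally independent.

```python
import numpy as np

p_z = np.array([0.3, 0.7])              # p(z), two states
p_x_given_z = np.array([[0.9, 0.1],     # p(x | z = 0)
                        [0.2, 0.8]])    # p(x | z = 1)
p_y_given_z = np.array([[0.6, 0.4],     # p(y | z = 0)
                        [0.1, 0.9]])    # p(y | z = 1)

# Joint p(x, y, z) = p(z) p(x|z) p(y|z), indexed as [x, y, z]
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

for z in range(2):
    p_xy_given_z = joint[:, :, z] / joint[:, :, z].sum()   # p(x, y | z)
    outer = np.outer(p_x_given_z[z], p_y_given_z[z])        # p(x | z) p(y | z)
    assert np.allclose(p_xy_given_z, outer)

# X and Y are *not* marginally independent here:
p_xy = joint.sum(axis=2)
print(np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0))))  # False
```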
Gaussian Distribution
For a univariate random variable, the Gaussian distribution has a density that is given by
$$p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
The multivariate Gaussian distribution is fully characterized by a mean vector $\mu$ and a covariance matrix $\Sigma$ and defined as
$$p(x \mid \mu, \Sigma) = (2\pi)^{-D/2} |\Sigma|^{-1/2} \exp\!\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right).$$
We write $p(x) = \mathcal{N}\big(x \mid \mu, \Sigma\big)$ or $X \sim \mathcal{N}(\mu, \Sigma)$.
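A brief sketch (my own, with arbitrary parameters) evaluating the multivariate Gaussian density directly from the formula above and cross-checking it against `scipy.stats.multivariate_normal`.

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
x = np.array([0.5, 0.5])
D = mu.size

diff = x - mu
density = ((2 * np.pi) ** (-D / 2) * np.linalg.det(Sigma) ** (-0.5)
           * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)))

print(density, multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # both agree
```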
Marginals and Conditionals of Gaussians are Gaussians
To consider the effect of applying the sum rule of probability and the effect of conditioning, we explicitly write the Gaussian distribution in terms of the concatenated states $[x^\top, y^\top]^\top$:
$$p(x, y) = \mathcal{N}\left(\begin{bmatrix}\mu_x \\ \mu_y\end{bmatrix},\; \begin{bmatrix}\Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy}\end{bmatrix}\right).$$
The conditional distribution $p(x \mid y)$ is also Gaussian and given by
$$p(x \mid y) = \mathcal{N}\big(\mu_{x \mid y},\, \Sigma_{x \mid y}\big),$$
$$\mu_{x \mid y} = \mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y),$$
$$\Sigma_{x \mid y} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}.$$
Note that in the computation of the mean, the $y$-value is an observation and no longer random.
The marginal distribution $p(x)$ of a joint Gaussian distribution $p(x, y)$ is itself Gaussian and computed by applying the sum rule:
$$p(x) = \int p(x, y)\, dy = \mathcal{N}\big(x \mid \mu_x, \Sigma_{xx}\big).$$
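A small sketch (parameters made up, everything one-dimensional for simplicity) computing the conditional mean and covariance from the block decomposition above; the marginal over $x$ is simply read off as $\mathcal{N}(\mu_x, \Sigma_{xx})$.

```python
# Joint Gaussian over (x, y), with x and y both one-dimensional here
mu_x, mu_y = 0.0, 2.0
Sxx, Sxy, Syy = 1.0, 0.6, 1.5
Syx = Sxy

y_obs = 1.0  # observed value of y

mu_x_given_y = mu_x + Sxy / Syy * (y_obs - mu_y)   # conditional mean
S_x_given_y = Sxx - Sxy / Syy * Syx                # conditional covariance

print(mu_x_given_y, S_x_given_y)   # -0.4 and 0.76
# Marginal of x: N(mu_x, Sxx), i.e. N(0, 1) in this example
```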
Product of Gaussian Densities
The product of two Gaussians $\mathcal{N}(x \mid a, A)\,\mathcal{N}(x \mid b, B)$ is a Gaussian distribution scaled by a constant $c \in \mathbb{R}$, given by $c\,\mathcal{N}(x \mid m, C)$ with
$$C = \big(A^{-1} + B^{-1}\big)^{-1}, \qquad m = C\big(A^{-1}a + B^{-1}b\big),$$
$$c = (2\pi)^{-D/2}\,|A + B|^{-1/2} \exp\!\left(-\tfrac{1}{2}(a - b)^\top (A + B)^{-1} (a - b)\right).$$
The scaling constant $c$ itself can be written in the form of a Gaussian density either in $a$ or in $b$ with an "inflated" covariance matrix $A + B$, i.e.,
$$c = \mathcal{N}\big(a \mid b,\, A + B\big) = \mathcal{N}\big(b \mid a,\, A + B\big).$$
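A numerical sketch (my own, univariate for simplicity) of the formulas above: it forms $C$, $m$, and the scaling constant $c$, and verifies that the pointwise product of the two densities equals $c\,\mathcal{N}(x \mid m, C)$.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    """Univariate Gaussian density N(x | mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

a, A = 1.0, 2.0     # first Gaussian N(x | a, A)
b, B = -1.0, 0.5    # second Gaussian N(x | b, B)

C = 1.0 / (1.0 / A + 1.0 / B)     # combined variance
m = C * (a / A + b / B)           # combined mean
c = gauss_pdf(a, b, A + B)        # scaling constant, N(a | b, A + B)

x = 0.3  # any test point
lhs = gauss_pdf(x, a, A) * gauss_pdf(x, b, B)   # product of the two densities
rhs = c * gauss_pdf(x, m, C)                    # scaled Gaussian
print(np.isclose(lhs, rhs))                     # True
```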
Sums and Linear Transformations
If $X, Y$ are independent Gaussian random variables (i.e., the joint distribution is given as $p(x, y) = p(x)\,p(y)$) with $p(x) = \mathcal{N}(x \mid \mu_x, \Sigma_x)$ and $p(y) = \mathcal{N}(y \mid \mu_y, \Sigma_y)$, then $x + y$ is also Gaussian distributed and given by
$$p(x + y) = \mathcal{N}\big(\mu_x + \mu_y,\, \Sigma_x + \Sigma_y\big).$$
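A quick Monte Carlo sketch (my own, univariate) consistent with the statement above: the sum of independent Gaussian samples has mean $\mu_x + \mu_y$ and variance $\sigma_x^2 + \sigma_y^2$, up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=1_000_000)    # N(1, 2)
y = rng.normal(loc=-3.0, scale=np.sqrt(0.5), size=1_000_000)   # N(-3, 0.5), independent of x

s = x + y
print(s.mean(), s.var())   # approximately -2.0 and 2.5
```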
Consider a mixture of two univariate Gaussian densities
$$p(x) = \alpha\, p_1(x) + (1 - \alpha)\, p_2(x),$$
where the scalar $0 < \alpha < 1$ is the mixture weight, and $p_1(x)$ and $p_2(x)$ are univariate Gaussian densities with parameters $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$.
Then the mean of the mixture density is given by the weighted sum of the means of each random variable:
$$\mathbb{E}[x] = \alpha\mu_1 + (1 - \alpha)\mu_2.$$
The variance of the mixture density is given by
$$\mathbb{V}[x] = \big[\alpha\sigma_1^2 + (1 - \alpha)\sigma_2^2\big] + \Big(\big[\alpha\mu_1^2 + (1 - \alpha)\mu_2^2\big] - \big[\alpha\mu_1 + (1 - \alpha)\mu_2\big]^2\Big).$$
Consider a Gaussian distributed random variable $X \sim \mathcal{N}(\mu, \Sigma)$. For a given matrix $A$ of appropriate shape, let $Y$ be a random variable such that $y = Ax$ is a transformed version of $x$. Then
$$\mathbb{E}[y] = \mathbb{E}[Ax] = A\,\mathbb{E}[x] = A\mu, \qquad \mathbb{V}[y] = \mathbb{V}[Ax] = A\,\mathbb{V}[x]\,A^\top = A\Sigma A^\top,$$
i.e., $Y \sim \mathcal{N}(A\mu,\, A\Sigma A^\top)$.
Let us now consider the reverse transformation: when we know that a random variable has a mean that is a linear transformation of another random variable. Let $Y$ be a Gaussian random variable with mean $Ax$, i.e.,
$$p(y) = \mathcal{N}\big(y \mid Ax,\, \Sigma\big).$$
Then, since $A$ is generally not invertible, we pre-multiply both sides of $y = Ax$ with $A^\top$ and invert $A^\top A$ (which is symmetric and positive definite if $A$ has full column rank), so that
$$y = Ax \iff (A^\top A)^{-1}A^\top y = x.$$
Hence, $x$ is a linear transformation of $y$, and we obtain
$$p(x) = \mathcal{N}\big(x \mid (A^\top A)^{-1}A^\top y,\ (A^\top A)^{-1}A^\top \Sigma A (A^\top A)^{-1}\big).$$
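A sketch (my own parameters) checking the mixture mean and variance formulas above against Monte Carlo samples drawn from the mixture.

```python
import numpy as np

rng = np.random.default_rng(4)
alpha = 0.3
mu1, var1 = -2.0, 1.0
mu2, var2 = 3.0, 4.0

# Closed-form mixture moments from the formulas above
mean = alpha * mu1 + (1 - alpha) * mu2
var = (alpha * var1 + (1 - alpha) * var2) \
      + (alpha * mu1 ** 2 + (1 - alpha) * mu2 ** 2) - mean ** 2

# Monte Carlo check: pick a component for each sample, then sample from it
n = 1_000_000
component = rng.random(n) < alpha
samples = np.where(component,
                   rng.normal(mu1, np.sqrt(var1), n),
                   rng.normal(mu2, np.sqrt(var2), n))
print(mean, var)                      # 1.5 and 8.35
print(samples.mean(), samples.var())  # approximately the same
```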