As an example of Gaussian parameters we shall consider and . All the others are handled in essentially the same way except that there are no weights needed for different states.
To simplify the notation, all the indices from and are dropped for the remainder of this section. The relevant terms of the cost function are now, up to an additive constant,
Let us denote , and .
The derivative of this expression with respect to is easy to evaluate
Setting this to zero gives
The derivative with respect to is
The solutions for parameters of are exact. The true posterior for these parameters is also Gaussian so the approximation is equal to it. This is not the case for the parameters of . The true posterior for is not Gaussian. The best Gaussian approximation with respect to the chosen criterion can still be found by finding the zero of the derivative of the cost function with respect to the parameters of . This is done using Newton's iteration.
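A minimal sketch of how such a Newton iteration might look is given here. It does not use the cost function terms of this section. Instead it minimises a hypothetical stand-in cost with a linear, an exponential, a quadratic and a logarithmic term in the mean m and variance s of a Gaussian approximation. The names cost, gradient and newton_fit and the constants a, b, mu and sigma2 are illustrative assumptions, and the step halving is an added safeguard that is not part of plain Newton's iteration.

```python
import math


def cost(m, s, a, b, mu, sigma2):
    """Hypothetical stand-in cost in the mean m and variance s (s > 0)."""
    return (a * m
            + b * math.exp(2.0 * s - 2.0 * m)
            + ((m - mu) ** 2 + s) / (2.0 * sigma2)
            - 0.5 * math.log(s))


def gradient(m, s, a, b, mu, sigma2):
    """Partial derivatives of the stand-in cost with respect to m and s."""
    e = math.exp(2.0 * s - 2.0 * m)
    g_m = a - 2.0 * b * e + (m - mu) / sigma2
    g_s = 2.0 * b * e + 1.0 / (2.0 * sigma2) - 1.0 / (2.0 * s)
    return g_m, g_s


def newton_fit(a, b, mu, sigma2, m=0.0, s=1.0, tol=1e-8, max_iter=100):
    """Find the zero of the gradient with a damped Newton iteration."""
    for _ in range(max_iter):
        g_m, g_s = gradient(m, s, a, b, mu, sigma2)
        if max(abs(g_m), abs(g_s)) < tol:
            break

        # Second derivatives (2x2 Hessian) of the stand-in cost.
        e = math.exp(2.0 * s - 2.0 * m)
        h_mm = 4.0 * b * e + 1.0 / sigma2
        h_ms = -4.0 * b * e
        h_ss = 4.0 * b * e + 1.0 / (2.0 * s * s)
        det = h_mm * h_ss - h_ms * h_ms

        # Newton direction d = -H^{-1} g, written out for the 2x2 case.
        d_m = -(h_ss * g_m - h_ms * g_s) / det
        d_s = -(h_mm * g_s - h_ms * g_m) / det

        # Halve the step until s stays positive and the cost does not rise.
        c0 = cost(m, s, a, b, mu, sigma2)
        t = 1.0
        while t > 1e-12:
            m_new, s_new = m + t * d_m, s + t * d_s
            if s_new > 0.0 and cost(m_new, s_new, a, b, mu, sigma2) <= c0 + 1e-12:
                break
            t *= 0.5
        else:
            break  # no acceptable step was found, so stop iterating
        m, s = m_new, s_new
    return m, s


if __name__ == "__main__":
    m_hat, s_hat = newton_fit(a=1.0, b=0.5, mu=0.0, sigma2=1.0)
    print(m_hat, s_hat, gradient(m_hat, s_hat, 1.0, 0.5, 0.0, 1.0))
```

For b > 0 the stand-in cost is strictly convex on s > 0, so the zero of the gradient is its unique minimum. A cost with a different shape may need a different safeguard or starting point.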
The derivatives with respect to and are