The term is just a sum over the discrete distribution. It can be further simplified into
The other term can be split into
The above equations give the value of the cost function for a given approximating distribution. This value is important because it can be used to compare different models, as shown in Section 3.3. Additionally, it can be used to monitor whether the iterative optimisation procedure has converged.
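As an illustration of the convergence monitoring mentioned above, the following is a minimal sketch and not the procedure used in this work. It assumes a hypothetical `cost` function that evaluates the cost for the current state and a hypothetical `step` function that performs one update of the iterative optimisation; the names `run_until_converged`, `rel_tol` and `max_iter` are likewise illustrative. The loop stops once the relative change in the cost between successive iterations falls below a tolerance.

```python
from typing import Callable, List, Tuple


def run_until_converged(
    step: Callable[[float], float],   # hypothetical: one update of the optimiser
    cost: Callable[[float], float],   # hypothetical: cost function value for a state
    state: float,
    rel_tol: float = 1e-6,            # assumed stopping tolerance
    max_iter: int = 1000,             # assumed iteration cap
) -> Tuple[float, List[float]]:
    """Iterate `step` and record the cost until its relative change is small."""
    history = [cost(state)]
    for _ in range(max_iter):
        state = step(state)
        history.append(cost(state))
        prev, curr = history[-2], history[-1]
        # Stop when the cost has changed by less than rel_tol, relative to its size.
        if abs(prev - curr) <= rel_tol * max(abs(prev), 1e-12):
            break
    return state, history


# Toy stand-in for the real problem: a quadratic cost with its minimum at x = 3,
# and an update that moves halfway towards the minimum at each iteration.
def toy_cost(x: float) -> float:
    return (x - 3.0) ** 2 + 1.0


def toy_step(x: float) -> float:
    return x - 0.5 * (x - 3.0)


final_state, history = run_until_converged(toy_step, toy_cost, state=0.0)
```

The recorded `history` can also be inspected after the run: a cost that stops decreasing indicates convergence, whereas an increase in the cost would suggest an error in the update rules, provided each update is supposed to decrease it.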