The multinomial distribution is a discrete distribution that gives
the probability of choosing a given collection of $m$ items from a set
of $n$ items with repetitions, where the probabilities of the
individual choices are given by $p_1, \ldots, p_n$. These
probabilities are the parameters of the multinomial distribution [16].
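
As a small illustration, the probability of one particular collection of counts can be evaluated directly from the standard multinomial formula. This is only a sketch: the function name `multinomial_pmf` and the example numbers are hypothetical and do not come from the source.

```python
import math

def multinomial_pmf(counts, probs):
    """Probability of seeing counts[i] copies of item i in sum(counts) draws."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to one")
    m = sum(counts)
    # Multinomial coefficient m! / (x_1! ... x_n!), kept as an exact integer.
    coeff = math.factorial(m)
    for x in counts:
        coeff //= math.factorial(x)
    # Multiply by the product of p_i ** x_i over all items.
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

# Hypothetical example: three items, three draws.
print(multinomial_pmf([2, 1, 0], [0.5, 0.3, 0.2]))
```

The coefficient is computed with integer arithmetic, which stays exact for small $m$; for large counts a log-domain version based on `math.lgamma` would be the safer choice.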
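The Dirichlet distribution is the conjugate prior of the parameters of
the multinomial distribution. The probability density of the
Dirichlet distribution for variables $\mathbf{p} = (p_1, \ldots, p_n)$
with parameters $\mathbf{u} = (u_1, \ldots, u_n)$ is defined by

$$p(\mathbf{p}) = \mathrm{Dirichlet}(\mathbf{p}; \mathbf{u}) = \frac{1}{Z(\mathbf{u})} \prod_{i=1}^{n} p_i^{u_i - 1}$$

when $p_i \ge 0$, $\sum_{i=1}^{n} p_i = 1$ and $u_i > 0$. The
normalisation constant is

$$Z(\mathbf{u}) = \frac{\prod_{i=1}^{n} \Gamma(u_i)}{\Gamma\left(\sum_{i=1}^{n} u_i\right)}.$$

Let $u_0 = \sum_{i=1}^{n} u_i$.
The mean and variance of the distribution are [16]

$$E[p_i] = \frac{u_i}{u_0}, \qquad \mathrm{Var}[p_i] = \frac{u_i (u_0 - u_i)}{u_0^2 (u_0 + 1)}.$$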
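
The density and its first two moments are easy to evaluate numerically. The following sketch uses the standard formulas for the Dirichlet distribution, with `u0` denoting the sum of the parameters. The function names are hypothetical, and the log-density assumes that every component of `p` is strictly positive.

```python
import math

def dirichlet_log_pdf(p, u):
    """Log-density of Dirichlet(u) at p, for strictly positive p summing to one."""
    u0 = sum(u)
    # log Z(u) = sum_i log Gamma(u_i) - log Gamma(u_0)
    log_z = sum(math.lgamma(ui) for ui in u) - math.lgamma(u0)
    return sum((ui - 1.0) * math.log(pi) for pi, ui in zip(p, u)) - log_z

def dirichlet_mean(u):
    """Componentwise mean u_i / u_0."""
    u0 = sum(u)
    return [ui / u0 for ui in u]

def dirichlet_variance(u):
    """Componentwise variance u_i (u_0 - u_i) / (u_0^2 (u_0 + 1))."""
    u0 = sum(u)
    return [ui * (u0 - ui) / (u0 ** 2 * (u0 + 1.0)) for ui in u]

# Hypothetical parameter vector.
u = [1.0, 2.0, 3.0]
print(dirichlet_mean(u))
print(dirichlet_variance(u))
print(dirichlet_log_pdf([0.2, 0.3, 0.5], u))
```

Working with `math.lgamma` rather than the gamma function itself avoids overflow when the parameters are large.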
When $u_i \to 0$, the distribution becomes noninformative.
The means of all the $p_i$ stay the same if all $u_i$ are scaled with
the same multiplicative constant. The variances will, however, get
smaller as the parameters $u_i$ grow. The pdfs of the Dirichlet
distribution with certain parameter values are shown in
Figure A.2.
Figure A.2: The pdfs of the Dirichlet distribution with certain parameter values.
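
The effect of scaling the parameters can be checked numerically. The sketch below multiplies a hypothetical parameter vector by several constants and prints the resulting means and variances. The helper name and the numbers are assumptions chosen only for illustration.

```python
def dirichlet_mean_and_variance(u):
    """Return the componentwise mean and variance of Dirichlet(u)."""
    u0 = sum(u)
    mean = [ui / u0 for ui in u]
    var = [ui * (u0 - ui) / (u0 ** 2 * (u0 + 1.0)) for ui in u]
    return mean, var

# Hypothetical base parameters, scaled by a common multiplicative constant.
base = [1.0, 2.0, 3.0]
for scale in (0.1, 1.0, 10.0, 100.0):
    mean, var = dirichlet_mean_and_variance([scale * ui for ui in base])
    print(scale, [round(m, 4) for m in mean], [round(v, 6) for v in var])
```

The printed means are identical for every scale, while each variance decreases as the scale grows, because the factor $u_0 + 1$ in the denominator is the only part of the variance that is not scale invariant.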
In addition to the standard statistics given above, using ensemble
learning for parameters with Dirichlet distribution requires the
evaluation of the expectation $E[\log p_i]$ and the negative
differential entropy $E[\log p(\mathbf{p})]$.
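The first expectation can be reduced to evaluating the expectation over a two dimensional Dirichlet distribution for $(p_i, 1 - p_i)$ with parameters $(u_i, u_0 - u_i)$, where $u_0$ is the sum of all the parameters $u_j$. The marginal distribution of $p_i$ is then a Beta distribution, and the expectation becomes

$$E[\log p_i] = \Psi(u_i) - \Psi(u_0),$$

where $\Psi(x) = \frac{d}{dx} \log \Gamma(x)$ is the digamma function.
By using this result, the negative differential entropy can be evaluated as

$$E[\log p(\mathbf{p})] = -\log Z(\mathbf{u}) + \sum_{i=1}^{n} (u_i - 1) \left[ \Psi(u_i) - \Psi(u_0) \right],$$

where $Z(\mathbf{u})$ is the normalisation constant of the Dirichlet density.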
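
Both quantities can be computed with a few lines of code. The sketch below relies on the standard identities $E[\log p_i] = \Psi(u_i) - \Psi(u_0)$ and $E[\log p(\mathbf{p})] = -\log Z(\mathbf{u}) + \sum_i (u_i - 1) E[\log p_i]$. The Python standard library has no digamma function, so the `digamma` helper here is a hand-written approximation (recurrence plus asymptotic series) and an assumption of this example; `scipy.special.digamma` could be used instead. All function names are hypothetical.

```python
import math

def digamma(x):
    """Approximate digamma function Psi(x) for x > 0."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    result = 0.0
    # Recurrence Psi(x) = Psi(x + 1) - 1/x shifts the argument up to x >= 6.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic series: log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    inv = 1.0 / x
    inv2 = inv * inv
    result += math.log(x) - 0.5 * inv
    result -= inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result

def expected_log_p(u):
    """E[log p_i] = Psi(u_i) - Psi(u_0) for each component of Dirichlet(u)."""
    psi_u0 = digamma(sum(u))
    return [digamma(ui) - psi_u0 for ui in u]

def negative_entropy(u):
    """E[log p(p)] = -log Z(u) + sum_i (u_i - 1) E[log p_i]."""
    log_z = sum(math.lgamma(ui) for ui in u) - math.lgamma(sum(u))
    return -log_z + sum((ui - 1.0) * e for ui, e in zip(u, expected_log_p(u)))

# Hypothetical parameter vector.
u = [1.0, 2.0, 3.0]
print(expected_log_p(u))
print(negative_entropy(u))
```

A quick sanity check is the case $\mathbf{u} = (1, 1)$, where $p_1$ is uniform on $[0, 1]$: the expectation of $\log p_1$ is then $-1$ and the differential entropy is zero.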