# EM
The EM algorithm applies to a large family of estimation problems with latent variables, e.g. Gaussian mixture models (GMMs).
Suppose we have a training set \(\{x^{(1)},...,x^{(n)}\}\), where each example \(x^{(i)}\) has an associated latent (unobserved) variable \(z^{(i)}\). Marginalizing over the latent variable, the probability of an example is

\[ p(x;\theta) = \sum_{z} p(x, z;\theta). \]
We wish to fit the parameters \(\theta\) by maximizing the log-likelihood of the data:

\[ l(\theta) = \sum_{i=1}^{n} \log p(x^{(i)};\theta) = \sum_{i=1}^{n} \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)};\theta). \]
Maximizing \(l(\theta)\) directly might be difficult, because the sum over \(z^{(i)}\) sits inside the logarithm.
Our strategy is instead to repeatedly construct a lower bound on \(l\) (E-step) and then optimize that lower bound (M-step).
## Lower Bound
Let \(Q\) be any distribution over the values of \(z\) (\(\sum_z Q(z) = 1\), \(Q(z) \ge 0\)). Since \(\log\) is concave, Jensen's inequality gives, for a single example \(x\),

\[ \log p(x;\theta) = \log \sum_{z} p(x, z;\theta) = \log \sum_{z} Q(z)\,\frac{p(x, z;\theta)}{Q(z)} \;\ge\; \sum_{z} Q(z) \log \frac{p(x, z;\theta)}{Q(z)}. \]
We call this bound the evidence lower bound (ELBO) and denote it by

\[ \mathrm{ELBO}(x; Q, \theta) = \sum_{z} Q(z) \log \frac{p(x, z;\theta)}{Q(z)}, \]

so that \(\log p(x;\theta) \ge \mathrm{ELBO}(x; Q, \theta)\) for any choice of \(Q\).
For the bound to hold with equality, it is sufficient that the quantity inside the expectation be constant, i.e.

\[ \frac{p(x, z;\theta)}{Q(z)} = c \]

for some constant \(c\) not depending on \(z\), which is equivalent to choosing \(Q(z) \propto p(x, z;\theta)\). Since \(Q\) must sum to one, this means

\[ Q(z) = \frac{p(x, z;\theta)}{\sum_{z'} p(x, z';\theta)} = p(z \mid x;\theta). \]
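As a sanity check, the bound can be verified numerically. The sketch below is a minimal illustration (not part of the original notes; the joint probabilities and the helper name `elbo` are made up for the example): it takes a toy joint \(p(x, z)\) over a discrete latent \(z\), evaluates the ELBO for an arbitrary \(Q\) and for the posterior \(p(z \mid x)\), and confirms that the ELBO stays below \(\log p(x)\) and matches it at the posterior.

```python
import numpy as np

# Toy joint distribution p(x, z) for a single observed x and a discrete
# latent z in {0, 1, 2}; the numbers are arbitrary, for illustration only.
p_xz = np.array([0.10, 0.25, 0.05])      # p(x, z) for z = 0, 1, 2
log_px = np.log(p_xz.sum())              # log p(x) = log sum_z p(x, z)

def elbo(q, p_xz):
    """ELBO(x; Q) = sum_z Q(z) * log( p(x, z) / Q(z) )."""
    q = np.asarray(q, dtype=float)
    return np.sum(q * (np.log(p_xz) - np.log(q)))

q_arbitrary = np.array([1/3, 1/3, 1/3])  # any distribution over z
q_posterior = p_xz / p_xz.sum()          # p(z | x) = p(x, z) / p(x)

print(f"log p(x)          = {log_px:.6f}")
print(f"ELBO, arbitrary Q = {elbo(q_arbitrary, p_xz):.6f}")  # strictly below log p(x)
print(f"ELBO, posterior Q = {elbo(q_posterior, p_xz):.6f}")  # equals log p(x)
```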
## The EM Algorithm
Taking all training examples into account, for any distributions \(Q_{1},...,Q_{n}\):

\[ l(\theta) \;\ge\; \sum_{i=1}^{n} \mathrm{ELBO}\big(x^{(i)}; Q_i, \theta\big) = \sum_{i=1}^{n} \sum_{z^{(i)}} Q_i\big(z^{(i)}\big) \log \frac{p\big(x^{(i)}, z^{(i)};\theta\big)}{Q_i\big(z^{(i)}\big)}; \]
equality holds when each \(Q_{i}\) equals the posterior distribution under the current setting of \(\theta\). The E-step therefore sets

\[ Q_i\big(z^{(i)}\big) := p\big(z^{(i)} \mid x^{(i)};\theta\big). \]
The M-step maximizes the lower bound with respect to \(\theta\) while keeping the \(Q_{i}\) fixed:

\[ \theta := \arg\max_{\theta} \sum_{i=1}^{n} \sum_{z^{(i)}} Q_i\big(z^{(i)}\big) \log \frac{p\big(x^{(i)}, z^{(i)};\theta\big)}{Q_i\big(z^{(i)}\big)}. \]
The EM algorithm ensures \(l(\theta^{(t)}) \le l(\theta^{(t+1)})\): the log-likelihood increases monotonically across iterations, so EM converges (typically to a local optimum of \(l\)).
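To make the two steps concrete, here is a minimal EM sketch for a two-component 1-D Gaussian mixture (an illustrative implementation, not part of the original derivation; the synthetic data, initialization, and variable names are assumptions). The E-step computes the responsibilities \(Q_i(z^{(i)}) = p(z^{(i)} \mid x^{(i)};\theta)\), the M-step re-estimates \(\theta = (\phi, \mu, \sigma)\) in closed form, and the per-iteration log-likelihood is printed to show the monotone increase.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Synthetic data from a two-component 1-D mixture (ground truth unknown to EM).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.5, 700)])

# Initial parameters theta = (phi, mu, sigma): mixing weights, means, std devs.
phi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

for t in range(50):
    # E-step: responsibilities Q_i(z) = p(z | x_i; theta), shape (n, 2).
    joint = phi * norm.pdf(x[:, None], mu, sigma)    # p(x_i, z; theta)
    log_lik = np.log(joint.sum(axis=1)).sum()        # l(theta) = sum_i log p(x_i; theta)
    q = joint / joint.sum(axis=1, keepdims=True)

    # M-step: maximize the lower bound over theta with the Q_i fixed (closed form).
    nk = q.sum(axis=0)
    phi = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

    print(f"iter {t:2d}  log-likelihood {log_lik:.4f}")  # non-decreasing over t
```

The printed log-likelihood values never decrease from one iteration to the next, which is exactly the guarantee \(l(\theta^{(t)}) \le l(\theta^{(t+1)})\) stated above.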