<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/bf1e9653-2854-4a74-9b6e-f415615c68ce/bayess_theorem.png" alt="bayess_theorem.png" width="40px" /> Bayes’ theorem:
$$ \begin{aligned} P(a \mid x) &= \frac{I(a) \mathcal{L}(x \mid a)}{E(x)} \\ &= \frac{I(a) \prod_i \mathcal{L}(x_i \mid a)}{E(x)} \end{aligned} $$
with:
$$ \begin{aligned} E(x) &= \sum_i I(a_i) \mathcal{L}(x \mid a_i) \\ &= \int I(a) \mathcal{L}(x \mid a) \, {\rm d}a \end{aligned} $$
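As a concrete illustration, here is a minimal numeric sketch (not from the note; the coin-flip data and all variable names are illustrative) that evaluates the posterior on a grid, with the evidence computed as the discrete sum above:

```python
# Posterior of a coin's head probability `a` on a discrete grid,
# with a flat prior I(a). A sketch under illustrative assumptions.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed flips (1 = heads)

grid = np.linspace(0.001, 0.999, 999)     # grid of parameter values a_i
prior = np.ones_like(grid) / grid.size    # flat prior I(a_i)

# Likelihood of the whole sample: product over single-flip likelihoods,
# L(x | a) = prod_i L(x_i | a) = a^heads * (1 - a)^tails
like = grid ** x.sum() * (1.0 - grid) ** (x.size - x.sum())

evidence = np.sum(prior * like)           # E(x) = sum_i I(a_i) L(x | a_i)
post = prior * like / evidence            # Bayes' theorem, normalized

print("posterior mode:", grid[np.argmax(post)])   # ~ 6/8 with a flat prior
```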
The maximum a posteriori (MAP) and maximum likelihood (MLE) estimators maximize the posterior and the likelihood, respectively:

$$ \begin{aligned} \widehat{a}_{\rm MAP} &= \arg\max_{a} P(a \mid x) \equiv \left[ \frac{\partial P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \\ &= \arg\max_{a} \ln P(a \mid x) \equiv \left[ \frac{\partial \ln P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \end{aligned} $$
$$ \begin{aligned} \widehat{a}_{\rm MLE} &= \arg\max_{a} \mathcal{L}(a \mid x) \equiv \left[ \frac{\partial \mathcal{L}(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \\ &= \arg\max_{a} \ln \mathcal{L}(a \mid x) \equiv \left[ \frac{\partial \ln \mathcal{L}(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \,. \end{aligned} $$
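In practice both arg-max definitions can be evaluated numerically. A hedged sketch, assuming Gaussian data with known $\sigma = 1$ and an illustrative Gaussian prior (the example and names are my own, not the note's):

```python
# Maximize ln L for the MLE and ln(I * L) = ln I + ln L for the MAP,
# here for the mean `a` of Gaussian data with known sigma = 1.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)

def neg_log_like(a):
    # -ln L(a | x) = -sum_i ln L(x_i | a)
    return -np.sum(norm.logpdf(x, loc=a, scale=1.0))

def neg_log_post(a):
    # -ln [I(a) L(x | a)]; E(x) does not depend on a, so it drops out
    return neg_log_like(a) - norm.logpdf(a, loc=0.0, scale=1.0)

a_mle = minimize_scalar(neg_log_like).x
a_map = minimize_scalar(neg_log_post).x
print(a_mle, a_map)   # MLE = sample mean; the prior pulls the MAP toward 0
```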
The MLE is a good default estimator in most cases; it is:

- Generally (though not strictly always) consistent
- Generally biased, though the bias vanishes in the large-$N$ limit for any consistent estimator (see the numeric check after this list)
- Generally invariant under a change of parameterization (first equation below)
- Generally efficient, i.e. it attains the minimum possible variance in the large-$N$ limit
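A small numeric check of the bias point (an illustrative sketch; the Gaussian-variance example is a standard one, not taken from this note): the variance MLE is biased low by a factor $(N-1)/N$, which vanishes as $N$ grows.

```python
# The Gaussian-variance MLE, (1/N) * sum_i (x_i - xbar)^2, is biased
# low by a factor (N - 1)/N; the bias vanishes as N grows.
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0

for n in (2, 10, 100):
    # average the variance MLE over many repeated samples of size n
    samples = rng.normal(scale=np.sqrt(true_var), size=(10000, n))
    var_mle = samples.var(axis=1)        # ddof=0: the ML estimator
    print(n, var_mle.mean())             # -> true_var * (n - 1)/n
```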
Invariance means that the MLE of a function of the parameter is that function of the MLE:

$$ \widehat{f(a)} = f(\widehat{a}) $$
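A quick check of invariance (again an illustrative sketch, with assumed names and data): maximizing $\ln \mathcal{L}$ over $\sigma$ directly, or over $v = \sigma^2$ and then taking the square root, gives the same estimate.

```python
# The MLE is reparameterization-invariant: fitting sigma or sigma^2
# yields estimates related by f(a)-hat = f(a-hat).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=3.0, size=100)

nll_sigma = lambda s: -np.sum(norm.logpdf(x, scale=s))            # fit sigma
nll_var = lambda v: -np.sum(norm.logpdf(x, scale=np.sqrt(v)))     # fit v = sigma^2

s_hat = minimize_scalar(nll_sigma, bounds=(0.1, 10.0), method="bounded").x
v_hat = minimize_scalar(nll_var, bounds=(0.01, 100.0), method="bounded").x
print(s_hat, np.sqrt(v_hat))   # agree, as invariance requires
```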
The variance of the estimate follows from the curvature of the log-likelihood at its maximum:

$$ V(\widehat{a}) = \sigma^2_{\widehat{a}} = \left[ -\frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right]^{-1}_{a = \widehat{a}} $$
and, in the multi-parameter case, the covariance matrix is the inverse of the matrix of second derivatives:

$$ V(\widehat{a})_{ij} = \left[ -\frac{\partial^2 \ln \mathcal{L}}{\partial a_i \partial a_j} \right]^{-1}_{a = \widehat{a}} $$
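The curvature formula can be checked numerically. A sketch, assuming Gaussian data with known $\sigma$, where the exact answer for the variance of the mean estimate is $V(\widehat{a}) = \sigma^2/N$:

```python
# Estimate V(a-hat) from the second derivative of ln L at the maximum,
# via a central finite difference; compare with the exact sigma^2 / N.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma, n = 2.0, 400
x = rng.normal(loc=1.0, scale=sigma, size=n)

def log_like(a):
    return np.sum(norm.logpdf(x, loc=a, scale=sigma))

a_hat = x.mean()                     # analytic MLE of the mean
h = 1e-4                             # finite-difference step
d2 = (log_like(a_hat + h) - 2 * log_like(a_hat) + log_like(a_hat - h)) / h**2

print(-1.0 / d2, sigma**2 / n)       # both ~ 0.01: V = [-d^2 lnL/da^2]^-1
```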