<aside> <img src="https://s3-us-west-2.amazonaws.com/secure.notion-static.com/bf1e9653-2854-4a74-9b6e-f415615c68ce/bayess_theorem.png" alt="bayess_theorem.png" width="40px" /> Bayes’ theorem:
$$ \begin{aligned} P(a \mid x) &= \frac{I(a) \mathcal{L}(x \mid a)}{E(x)} \\ &= \frac{I(a) \prod_i \mathcal{L}(x_i \mid a)}{E(x)} \end{aligned} $$
with:
$$ \begin{aligned} E(x) &= \sum_i I(a_i) \mathcal{L}(x \mid a_i) \\ &= \int I(a) \mathcal{L}(x \mid a) \, {\rm d}a \end{aligned} $$
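As a concrete illustration, here is a minimal numeric sketch (not from the note; the coin-flip data and all variable names are illustrative) that evaluates the posterior on a grid, with the evidence computed as the discrete sum above:

```python
# Posterior of a coin's head probability `a` on a discrete grid,
# with a flat prior I(a). A sketch under illustrative assumptions.
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # observed flips (1 = heads)

grid = np.linspace(0.001, 0.999, 999)     # grid of parameter values a_i
prior = np.ones_like(grid) / grid.size    # flat prior I(a_i)

# Likelihood of the whole sample: product over single-flip likelihoods,
# L(x | a) = prod_i L(x_i | a) = a^heads * (1 - a)^tails
like = grid ** x.sum() * (1.0 - grid) ** (x.size - x.sum())

evidence = np.sum(prior * like)           # E(x) = sum_i I(a_i) L(x | a_i)
post = prior * like / evidence            # Bayes' theorem, normalized

print("posterior mode:", grid[np.argmax(post)])   # ~ 6/8 with a flat prior
```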
The maximum a posteriori (MAP) and maximum likelihood (MLE) estimators maximize the posterior and the likelihood, respectively:

$$ \begin{aligned} \widehat{a}_{\rm MAP} &= \arg\max_{a} P(a \mid x) \equiv \left[ \frac{\partial P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \\ &= \arg\max_{a} \ln P(a \mid x) \equiv \left[ \frac{\partial \ln P(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \end{aligned} $$
$$ \begin{aligned} \widehat{a}_{\rm MLE} &= \arg\max_{a} \mathcal{L}(a \mid x) \equiv \left[ \frac{\partial \mathcal{L}(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \\ &= \arg\max_{a} \ln \mathcal{L}(a \mid x) \equiv \left[ \frac{\partial \ln \mathcal{L}(a \mid x)}{\partial a} \right]_{a = \widehat{a}} = 0 \,. \end{aligned} $$
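In practice both arg-max definitions can be evaluated numerically. A hedged sketch, assuming Gaussian data with known $\sigma = 1$ and an illustrative Gaussian prior (the example and names are my own, not the note's):

```python
# Maximize ln L for the MLE and ln(I * L) = ln I + ln L for the MAP,
# here for the mean `a` of Gaussian data with known sigma = 1.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=20)

def neg_log_like(a):
    # -ln L(a | x) = -sum_i ln L(x_i | a)
    return -np.sum(norm.logpdf(x, loc=a, scale=1.0))

def neg_log_post(a):
    # -ln [I(a) L(x | a)]; E(x) does not depend on a, so it drops out
    return neg_log_like(a) - norm.logpdf(a, loc=0.0, scale=1.0)

a_mle = minimize_scalar(neg_log_like).x
a_map = minimize_scalar(neg_log_post).x
print(a_mle, a_map)   # MLE = sample mean; the prior pulls the MAP toward 0
```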
The MLE is a good default estimator in most cases; it is:

- Generally (though not strictly always) consistent
- Generally biased, though the bias vanishes in the large-$N$ limit for any consistent estimator (see the numeric check after this list)
- Generally invariant under a change of parameterization (first equation below)
- Generally efficient, i.e. it attains the minimum possible variance in the large-$N$ limit
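A small numeric check of the bias point (an illustrative sketch; the Gaussian-variance example is a standard one, not taken from this note): the variance MLE is biased low by a factor $(N-1)/N$, which vanishes as $N$ grows.

```python
# The Gaussian-variance MLE, (1/N) * sum_i (x_i - xbar)^2, is biased
# low by a factor (N - 1)/N; the bias vanishes as N grows.
import numpy as np

rng = np.random.default_rng(1)
true_var = 4.0

for n in (2, 10, 100):
    # average the variance MLE over many repeated samples of size n
    samples = rng.normal(scale=np.sqrt(true_var), size=(10000, n))
    var_mle = samples.var(axis=1)        # ddof=0: the ML estimator
    print(n, var_mle.mean())             # -> true_var * (n - 1)/n
```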
Invariance means that the MLE of a function of the parameter is that function of the MLE:

$$ \widehat{f(a)} = f(\widehat{a}) $$
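A quick check of invariance (again an illustrative sketch, with assumed names and data): maximizing $\ln \mathcal{L}$ over $\sigma$ directly, or over $v = \sigma^2$ and then taking the square root, gives the same estimate.

```python
# The MLE is reparameterization-invariant: fitting sigma or sigma^2
# yields estimates related by f(a)-hat = f(a-hat).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=3.0, size=100)

nll_sigma = lambda s: -np.sum(norm.logpdf(x, scale=s))            # fit sigma
nll_var = lambda v: -np.sum(norm.logpdf(x, scale=np.sqrt(v)))     # fit v = sigma^2

s_hat = minimize_scalar(nll_sigma, bounds=(0.1, 10.0), method="bounded").x
v_hat = minimize_scalar(nll_var, bounds=(0.01, 100.0), method="bounded").x
print(s_hat, np.sqrt(v_hat))   # agree, as invariance requires
```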
The variance of the estimate follows from the curvature of the log-likelihood at its maximum:

$$ V(\widehat{a}) = \sigma^2_{\widehat{a}} = \left[ -\frac{\partial^2 \ln \mathcal{L}}{\partial a^2} \right]^{-1}_{a = \widehat{a}} $$
and, in the multi-parameter case, the covariance matrix is the inverse of the matrix of second derivatives:

$$ V(\widehat{a})_{ij} = \left[ -\frac{\partial^2 \ln \mathcal{L}}{\partial a_i \partial a_j} \right]^{-1}_{a = \widehat{a}} $$
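The curvature formula can be checked numerically. A sketch, assuming Gaussian data with known $\sigma$, where the exact answer for the variance of the mean estimate is $V(\widehat{a}) = \sigma^2/N$:

```python
# Estimate V(a-hat) from the second derivative of ln L at the maximum,
# via a central finite difference; compare with the exact sigma^2 / N.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
sigma, n = 2.0, 400
x = rng.normal(loc=1.0, scale=sigma, size=n)

def log_like(a):
    return np.sum(norm.logpdf(x, loc=a, scale=sigma))

a_hat = x.mean()                     # analytic MLE of the mean
h = 1e-4                             # finite-difference step
d2 = (log_like(a_hat + h) - 2 * log_like(a_hat) + log_like(a_hat - h)) / h**2

print(-1.0 / d2, sigma**2 / n)       # both ~ 0.01: V = [-d^2 lnL/da^2]^-1
```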