
Created by Carl Schmertmann on 8 Sep 2010

This is an attempt to write a generic notation for Bayesian model fitting via penalized least squares with priors, as in Girosi and King.

In this special framework:

  • the likelihood L(Parameters | Data) is normal
  • the posterior distribution Post(Parameters | Data) is normal
  • there is a closed-form analytical solution for the posterior mode
  • the posterior mode and mean are identical


We have a vector of data $y \in \Re^K$, and a vector of parameters $\theta \in \Re^N$, where N > K in the case of a forecast. We want to find $\theta$ that fits y, and satisfies a set of P priors.


$\theta$ fits the data y when $\Sigma^{-1/2}(G\theta - y)$ is "small". G is $K \times N$, so there are K fitting objectives.

As an example, $\theta$ might be a complete set of N (true) past+future fertility rates, y might be the set of K (observed) past rates, and $G\,\theta$ might be the subset of $\theta$ that refers to the past.

The quadratic penalty for fit is

\begin{align} Q_{fit} \;= \;(G\theta - y)^{\prime} \; \Sigma^{-1}\; (G\theta - y) \end{align}
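As a concrete illustration of this penalty, here is a minimal NumPy sketch; the dimensions, $G$, $\Sigma$, $y$, and $\theta$ below are all hypothetical, chosen only to show the computation:

```python
import numpy as np

# Hypothetical inputs, for illustration only
K, N = 3, 4
rng = np.random.default_rng(0)
G = rng.standard_normal((K, N))       # K x N fitting matrix
Sigma = np.diag([1.0, 0.5, 2.0])      # K x K error covariance
y = rng.standard_normal(K)            # observed data vector
theta = rng.standard_normal(N)        # candidate parameter vector

resid = G @ theta - y
# (G theta - y)' Sigma^{-1} (G theta - y), via a solve rather than an explicit inverse
Q_fit = resid @ np.linalg.solve(Sigma, resid)
```

Using `np.linalg.solve` instead of inverting $\Sigma$ is the standard numerically safer choice for quadratic forms.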


$\theta$ satisfies prior p when the $K_p \times 1$ vector $W_p^{1/2} D_p \,(A_p\theta - b_p)$ is small. $A_p$ is a $K_p \times N$ matrix (often just $= I_N$), $W_p$ is a $K_p \times K_p$ diagonal weight matrix (often $= I_N$ as well), $D_p$ is a $K_p \times K_p$ matrix of conditions (e.g., based on derivatives), and $b_p$ is a $K_p \times 1$ target vector (often zero).

The quadratic penalty for prior p is

\begin{align} Q_p \;= \tau_p \;(A_p\theta - b_p)^{\prime} \; D_p^\prime \,W_p\, D_p \; (A_p\theta - b_p) \end{align}

and the total penalty for all priors is

\begin{align} Q_{priors} \;= \sum_{p=1}^{P} \,\tau_p \;(A_p\theta - b_p)^{\prime} \; D_p^\prime \,W_p\, D_p \; (A_p\theta - b_p) \end{align}
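The sum above can be sketched in NumPy. Everything here is hypothetical: two made-up priors on $N = 4$ parameters, one a rough smoothness prior (a square first-difference-style $D_1$) and one shrinking the last parameter toward zero:

```python
import numpy as np

N = 4
rng = np.random.default_rng(1)
theta = rng.standard_normal(N)

# Prior 1: rough smoothness prior; D1 is a square first-difference-style operator
A1 = np.eye(N)                        # K1 = N
D1 = np.eye(N) - np.eye(N, k=-1)      # row i penalizes theta_i - theta_{i-1}
W1 = np.eye(N)
b1 = np.zeros(N)
tau1 = 2.0

# Prior 2: shrink the last parameter toward zero (K2 = 1)
A2 = np.zeros((1, N)); A2[0, -1] = 1.0
D2 = np.eye(1)
W2 = np.eye(1)
b2 = np.zeros(1)
tau2 = 0.5

priors = [(tau1, A1, D1, W1, b1), (tau2, A2, D2, W2, b2)]

# Q_priors = sum_p tau_p (A_p theta - b_p)' D_p' W_p D_p (A_p theta - b_p)
Q_priors = sum(
    tau * (D @ (A @ theta - b)) @ W @ (D @ (A @ theta - b))
    for tau, A, D, W, b in priors
)
```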

Maximum A Posteriori (MAP) solution

Up to an additive constant and a factor of $-2$, the log posterior at $\theta$ is

\begin{split} Post(\theta) \; = \; & (G\theta - y)^{\prime} \;\Sigma^{-1}\; (G\theta - y) \; \\ & + \; \sum_{p=1}^{P} \,\tau_p \;(A_p\theta - b_p)^{\prime} \; D_p^\prime \,W_p\, D_p \; (A_p\theta - b_p) \end{split}

which has derivative (dropping an overall factor of 2 that does not affect the solution)

\begin{split} \frac{\partial}{\partial\theta}Post(\theta) \;=\; & \left( G^\prime\Sigma^{-1} G + \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,A_p \right) \; \theta \\ &- \left( G^\prime\Sigma^{-1} y + \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,b_p \right) \end{split}

Abbreviate this as

\begin{split} \frac{\partial}{\partial\theta}Post(\theta) \;=\; & \left( G^\prime\Sigma^{-1} G + C_1 \right) \; \theta \\ & - \left( G^\prime\Sigma^{-1} y + c_2 \right) \end{split}

where $C_1 \equiv \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,A_p$ and $c_2 \equiv \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,b_p$ effectively summarize what we need to use from all the priors.

The maximum a posteriori (MAP) estimator is

\begin{align} \theta \;=\; \left( G^\prime\Sigma^{-1} G + C_1 \right)^{-1} \left( G^\prime\Sigma^{-1} y + c_2 \right) \end{align}
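This closed form is easy to compute. A hypothetical NumPy sketch, assuming a single smoothing-style prior ($P = 1$, $A = I_N$, $W = I_N$, $b = 0$, and made-up $G$, $\Sigma$, $y$, $\tau$):

```python
import numpy as np

# Hypothetical inputs: K = 3 observations, N = 4 parameters
K, N = 3, 4
rng = np.random.default_rng(2)
G = rng.standard_normal((K, N))
Sigma = np.eye(K)
y = rng.standard_normal(K)

# One illustrative prior: A = I_N, square first-difference-style D, W = I, b = 0
A = np.eye(N)
D = np.eye(N) - np.eye(N, k=-1)
W = np.eye(N)
b = np.zeros(N)
tau = 1.0

Si = np.linalg.inv(Sigma)
C1 = tau * A.T @ D.T @ W @ D @ A      # the sum over priors collapses to one term
c2 = tau * A.T @ D.T @ W @ D @ b      # zero here, since b = 0

# MAP estimate: solve the linear system rather than forming the inverse
theta_map = np.linalg.solve(G.T @ Si @ G + C1, G.T @ Si @ y + c2)
```

Even though $G'\Sigma^{-1}G$ alone is rank-deficient when $K < N$ (the forecasting case), adding the positive definite $C_1$ makes the system solvable.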

or even more simply $\theta = r + Sy$, where

\begin{align} r \;=\; \left( G^\prime \Sigma^{-1}G + C_1\right)^{-1} c_2 \end{align}


\begin{align} S \;=\; \left( G^\prime \Sigma^{-1}G + C_1 \right)^{-1} \left( G^\prime \Sigma^{-1} \right) \end{align}

Notice that in many problems all the $b_p$ vectors will be zero, in which case $c_2 = 0$, $r = 0$, and $\theta = Sy$.
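A quick numerical check that $\theta = r + Sy$ agrees with the direct MAP formula, using hypothetical inputs (a made-up $G$, $\Sigma = I$, and a nonzero $c_2$ so that $r \neq 0$):

```python
import numpy as np

K, N = 3, 4
rng = np.random.default_rng(4)
G = rng.standard_normal((K, N))
Si = np.eye(K)                        # Sigma^{-1}, taking Sigma = I for simplicity
y = rng.standard_normal(K)

D = np.eye(N) - np.eye(N, k=-1)       # illustrative square difference operator
C1 = D.T @ D                          # one prior with A = W = I, tau = 1
c2 = rng.standard_normal(N)           # hypothetical nonzero prior target term

M = np.linalg.inv(G.T @ Si @ G + C1)
r = M @ c2                            # r = (G' Si G + C1)^{-1} c2
S = M @ G.T @ Si                      # S = (G' Si G + C1)^{-1} G' Si

theta_direct = np.linalg.solve(G.T @ Si @ G + C1, G.T @ Si @ y + c2)
```

Here `r + S @ y` and `theta_direct` coincide, as the algebra says they must.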

Posterior Distribution

Given the form of the log posterior, $(\theta | y)$ has a multivariate normal distribution, with mean vector

\begin{align} \left( G^\prime \Sigma^{-1}G + C_1 \right)^{-1} \left( G^\prime \Sigma^{-1}y + c_2 \right) \end{align}

and a covariance matrix

\begin{align} \left( G^\prime \Sigma^{-1}G + C_1 \right)^{-1} \end{align}

This makes it easy to sample from the posterior, for example to calculate credible intervals for one or more components of $\theta$.
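Since the posterior is multivariate normal, sampling is direct. A hypothetical sketch (made-up $G$ and $y$, $\Sigma = I$, and one smoothing-style prior with $A = W = I_N$, $\tau = 1$, $b = 0$, so $c_2 = 0$):

```python
import numpy as np

K, N = 3, 4
rng = np.random.default_rng(3)
G = rng.standard_normal((K, N))       # hypothetical K x N fitting matrix
y = rng.standard_normal(K)            # hypothetical data

D = np.eye(N) - np.eye(N, k=-1)       # illustrative square difference operator
C1 = D.T @ D                          # one prior: A = W = I, tau = 1
c2 = np.zeros(N)                      # b = 0, so c2 = 0

precision = G.T @ G + C1              # Sigma = I, so Sigma^{-1} drops out
cov = np.linalg.inv(precision)        # posterior covariance
mean = cov @ (G.T @ y + c2)           # posterior mean = MAP estimate

draws = rng.multivariate_normal(mean, cov, size=5000)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)   # 95% credible intervals
```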
