
Created by Carl Schmertmann on 8 Sep 2010

This is an attempt to write a generic notation for Bayesian model fitting via least squares with priors, as in Girosi and King.

In this special framework:

- the likelihood L(*Parameters*|*Data*) is normal
- the posterior Post(*Parameters*|*Data*) is normal
- there is a closed-form analytical solution for the posterior mode
- the posterior mode and mean are identical

## PROBLEM

We have a vector of data $y \in \Re^K$, and a vector of parameters $\theta \in \Re^N$, where N > K in the case of a forecast. We want to find $\theta$ that fits *y*, **and** satisfies a set of P priors.

### Fit

$\theta$ fits the data *y* when $\Sigma^{-1/2}(G\theta - y)$ is "small".^{1} **G** is K x N and **Σ** is the K x K covariance of the observation errors, so there are K fitting objectives.

As an example, $\theta$ might be a complete set of N (true) past+future fertility rates, *y* might be the set of K (observed) past rates, and $G\,\theta$ might be the subset of $\theta$ that refers to the past.
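To make this concrete, here is a minimal NumPy sketch (my own invented example, not from the original page; all numbers are made up) in which **G** simply selects the "past" components of $\theta$:

```python
import numpy as np

# Hypothetical small example: N = 5 true rates (3 past + 2 future),
# K = 3 observed past rates.  G selects the "past" components of theta.
N, K = 5, 3
G = np.hstack([np.eye(K), np.zeros((K, N - K))])   # K x N selector matrix

theta = np.array([1.0, 1.2, 1.5, 1.7, 2.0])        # true past+future rates
y_model = G @ theta                                # the past components of theta
```

With this G, the fitting objective compares only the first K components of $\theta$ to the observed rates.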

The quadratic penalty for fit is

(1) $\quad (G\theta - y)^\prime\,\Sigma^{-1}\,(G\theta - y)$

### Priors

$\theta$ satisfies prior p when the K_{p} x 1 vector $W_p^{1/2} D_p \,(A_p\theta - b_p)$ is small.

- **A**_{p} is a K_{p} x N matrix (often just = I_{N}),
- **W**_{p} is a K_{p} x K_{p} diagonal weight matrix (often = I_{N} also),
- **D**_{p} is a K_{p} x K_{p} matrix of conditions (e.g., based on derivatives), and
- **b**_{p} is a K_{p} x 1 target vector (often zero).
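As one illustration of a derivative-based **D**_{p} (my own example, assuming NumPy; the page does not specify this matrix), the sketch below builds a square second-difference matrix with zero boundary rows, with A_{p} = I_{N} and b_{p} = 0. A perfectly linear trajectory then incurs no penalty:

```python
import numpy as np

# Hypothetical smoothness prior: D penalizes second differences of theta.
# Kept square (N x N, so K_p = N as in the text) by leaving the two
# boundary rows zero.
N = 6
D = np.zeros((N, N))
for i in range(1, N - 1):
    D[i, i - 1:i + 2] = [1.0, -2.0, 1.0]   # theta[i-1] - 2*theta[i] + theta[i+1]

theta_linear = np.arange(N, dtype=float)   # a perfectly linear trajectory
penalty_vec = D @ theta_linear             # all zeros: no smoothness penalty
```

Any curvature in $\theta$ would show up as nonzero entries of `penalty_vec`, and the quadratic penalty below would grow accordingly.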

The quadratic penalty for prior p is

(2) $\quad (A_p\theta - b_p)^\prime\,D_p^\prime\,W_p\,D_p\,(A_p\theta - b_p)$

and the total penalty for all priors is

(3) $\quad \sum_{p=1}^{P} \tau_p\,(A_p\theta - b_p)^\prime\,D_p^\prime\,W_p\,D_p\,(A_p\theta - b_p)$

where $\tau_p > 0$ is the weight given to prior p.

### Maximum A Posteriori (MAP) solution

Up to an additive constant, the log posterior at $\theta$ is

(4) $\quad \ln P(\theta|y) = -\tfrac{1}{2}\left[\,(G\theta - y)^\prime\,\Sigma^{-1}\,(G\theta - y) + \sum_{p=1}^{P} \tau_p\,(A_p\theta - b_p)^\prime\,D_p^\prime\,W_p\,D_p\,(A_p\theta - b_p)\,\right]$

which has derivative

(5) $\quad \frac{\partial \ln P}{\partial \theta} = -\left[\,G^\prime\,\Sigma^{-1}\,(G\theta - y) + \sum_{p=1}^{P} \tau_p\,A_p^\prime\,D_p^\prime\,W_p\,D_p\,(A_p\theta - b_p)\,\right]$

Abbreviate this as

(6) $\quad \frac{\partial \ln P}{\partial \theta} = -\left[\,(G^\prime\,\Sigma^{-1}\,G + C_1)\,\theta - (G^\prime\,\Sigma^{-1}\,y + c_2)\,\right]$

where $C_1 \equiv \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,A_p$ and $c_2 \equiv \sum \tau_p\,A_p^\prime\,D_p^\prime \,W_p\, D_p\,b_p$ effectively summarize what we need to use from all the priors.

The *maximum a posteriori* (MAP) estimator is at

(7) $\quad \theta = \left(G^\prime\,\Sigma^{-1}\,G + C_1\right)^{-1}\left(G^\prime\,\Sigma^{-1}\,y + c_2\right)$

or even more simply $\theta = r + Sy$, where

(8) $\quad r = \left(G^\prime\,\Sigma^{-1}\,G + C_1\right)^{-1} c_2$

and

(9) $\quad S = \left(G^\prime\,\Sigma^{-1}\,G + C_1\right)^{-1}\,G^\prime\,\Sigma^{-1}$

Notice that in many problems all the *b _{p}* vectors will be zero, in which case r = 0 and $\theta = Sy$.
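The whole MAP recipe can be sketched numerically. The code below is an invented toy problem (NumPy assumed; G, Σ, the second-difference smoothness prior, τ, and y are all made up for illustration): it forms $C_1$ and $c_2$, then computes r, S, and $\theta = r + Sy$:

```python
import numpy as np

# Toy problem: N = 5 parameters, K = 3 observations of the first 3 components,
# plus one smoothness prior based on second differences.
N, K = 5, 3
G = np.hstack([np.eye(K), np.zeros((K, N - K))])   # K x N selector
Sigma_inv = np.eye(K)                              # Sigma = I for simplicity

D = np.zeros((N, N))                               # square second-difference matrix
for i in range(1, N - 1):
    D[i, i - 1:i + 2] = [1.0, -2.0, 1.0]
A, W, b, tau = np.eye(N), np.eye(N), np.zeros(N), 10.0

# C1 and c2 as defined in the text
C1 = tau * A.T @ D.T @ W @ D @ A
c2 = tau * A.T @ D.T @ W @ D @ b                   # zero here, since b = 0

y = np.array([1.0, 1.2, 1.5])                      # "observed" past rates

# MAP solution theta = r + S y
M = G.T @ Sigma_inv @ G + C1
r = np.linalg.solve(M, c2)
S = np.linalg.solve(M, G.T @ Sigma_inv)
theta_map = r + S @ y
```

Using `np.linalg.solve` instead of explicitly inverting $(G^\prime\Sigma^{-1}G + C_1)$ is the usual numerically safer choice; because b = 0 here, r comes out as the zero vector and the shortcut $\theta = Sy$ applies.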

### Posterior Distribution

Given the form of the log posterior, $(\theta | y)$ has a multivariate normal distribution, with mean vector

(10) $\quad \left(G^\prime\,\Sigma^{-1}\,G + C_1\right)^{-1}\left(G^\prime\,\Sigma^{-1}\,y + c_2\right)$

and covariance matrix

(11) $\quad \left(G^\prime\,\Sigma^{-1}\,G + C_1\right)^{-1}$

This makes it easy to sample from the posterior, for example to calculate confidence intervals for one or more components of $\theta$.
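Sampling from this normal posterior is then straightforward. The sketch below (again a made-up toy setup, assuming NumPy; the prior and data are invented) draws from the posterior with the mean and covariance above and reads off 95% intervals component by component:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 3 observed components out of N = 5, with a second-difference
# smoothness prior (A_p = W_p = I, b_p = 0, so c2 = 0).
N, K = 5, 3
G = np.hstack([np.eye(K), np.zeros((K, N - K))])
Sigma_inv = np.eye(K)
D = np.zeros((N, N))
for i in range(1, N - 1):
    D[i, i - 1:i + 2] = [1.0, -2.0, 1.0]
tau = 10.0
C1 = tau * D.T @ D
y = np.array([1.0, 1.2, 1.5])

M = G.T @ Sigma_inv @ G + C1
mean = np.linalg.solve(M, G.T @ Sigma_inv @ y)     # posterior mean
cov = np.linalg.inv(M)                             # posterior covariance

# Draw from the posterior and compute 95% intervals for each component
draws = rng.multivariate_normal(mean, cov, size=5000)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
```

Note that the forecast components (those not observed through G) get wider intervals, since only the smoothness prior constrains them.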