Precision Matrix and Jacobian Matrix Contribution

6 minute read

1) What is $J$?

In GTSAM (and factor-graph methods in general), each “error function” $error(x)$ is often linearized around some operating point $\hat{x}$. For a factor that depends on state $x$, we can write:

\[error(x) \approx error(\hat{x}) + J(x - \hat{x}),\]

where:

  • $error(\hat{x})$ is the residual at the linearization point,
  • $J$ is the Jacobian matrix of partial derivatives of $error$ with respect to $x$, evaluated at $\hat{x}$.

In other words, $J$ is literally the matrix of first-order derivatives (slopes) that maps small changes in $x$ to changes in the factor’s error. GTSAM’s JacobianFactor stores this linear approximation $error(\hat{x}) + J\,\delta x$ internally; once whitened by the factor’s square-root information $\Lambda^{1/2}$, it supplies the final row(s) of the system $F x = d$.
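To make this concrete, here is a minimal sketch that linearizes a single prior factor and prints the rows GTSAM stores. The integer key 1, the measurement, and the operating point are made up for illustration, and it assumes your GTSAM Python build wraps GaussianFactorGraph.jacobian():

import numpy as np
import gtsam

# One 2D state with integer key 1, "measured" at (2, 0) with sigma = 0.5.
model = gtsam.noiseModel.Isotropic.Sigma(2, 0.5)
graph = gtsam.NonlinearFactorGraph()
graph.add(gtsam.PriorFactorPoint2(1, gtsam.Point2(2.0, 0.0), model))

# Linearize around an arbitrary operating point x_hat.
values = gtsam.Values()
values.insert(1, gtsam.Point2(1.8, 0.1))
gfg = graph.linearize(values)

# Dense system over delta-x; the rows come back already whitened,
# i.e. A = Lambda^{1/2} J and b = Lambda^{1/2} (z - h(x_hat)).
A, b = gfg.jacobian()
print(A)   # [[2. 0.], [0. 2.]]   (1/sigma = 2 on the diagonal, since J = I here)
print(b)   # [ 0.4 -0.2]          (2 * ([2, 0] - [1.8, 0.1]))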


2) Numerical Derivation of the Square-Root Form

a) Least-Squares Formulation

Consider a general Gaussian factor with measurement $z$, predicted by some function $h(x)$. The covariance of the measurement noise is $\Sigma$. The negative log-likelihood of a single factor is:

\[\frac{1}{2} \left( h(x) - z \right)^T \Sigma^{-1} \left( h(x) - z \right).\]

If we define $\Lambda = \Sigma^{-1}$ (the information matrix), then the cost becomes:

\[\frac{1}{2} \left( h(x) - z \right)^T \Lambda \left( h(x) - z \right).\]

Minimizing this is equivalent to a least-squares problem.
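As a quick numerical sanity check of this cost, here is a tiny NumPy sketch; the choice $h(x)=x$, the measurement $z$, and the covariance are made up for illustration:

import numpy as np

# Hypothetical direct measurement: h(x) = x, z = (2, 0), sigma = 0.5 per axis.
z = np.array([2.0, 0.0])
Sigma = np.diag([0.25, 0.25])
Lam = np.linalg.inv(Sigma)          # information matrix, diag(4, 4)

def cost(x):
    r = x - z                       # error(x) = h(x) - z
    return 0.5 * r @ Lam @ r

print(cost(np.array([2.0, 0.0])))   # 0.0  (at the measurement)
print(cost(np.array([2.5, 0.0])))   # 0.5  (= 0.5 * 0.5^2 / 0.25)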


b) Taking the Square-Root of $\Lambda$

We can rewrite $\Lambda$ as $\Lambda^{1/2} \Lambda^{1/2}$. Let

\[error(x) = h(x) - z.\]

Then the cost is:

\[\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2.\]

Minimizing $\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2$ is exactly the same as minimizing $\frac{1}{2} error(x)^T \Lambda error(x)$. Written as a norm, the ideal (zero-residual) condition is

\[\Lambda^{1/2} error(x) = 0,\]

and the least-squares solution drives the left-hand side as close to zero as possible.

Hence the factor in “square-root” form is simply the row:

\[\Lambda^{1/2} J x - \Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right] = 0,\]

once linearized about $\hat{x}$. That is precisely what GTSAM stores in its JacobianFactor: the matrix part $\Lambda^{1/2} J$ and the vector part $\Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right]$.
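A short NumPy check of the whitening identity behind this, on a made-up non-diagonal information matrix (the transpose of the Cholesky factor serves as one valid choice of $\Lambda^{1/2}$):

import numpy as np

# Made-up 2x2 information matrix and residual, purely for illustration.
Lam = np.array([[4.0, 1.0],
                [1.0, 3.0]])
e = np.array([0.3, -0.7])           # error(x) = h(x) - z at some x

L = np.linalg.cholesky(Lam)         # Lam = L @ L.T, so L.T is a valid Lambda^{1/2}
quadratic = 0.5 * e @ Lam @ e
whitened = 0.5 * np.linalg.norm(L.T @ e) ** 2
print(quadratic, whitened)          # equal up to floating-point rounding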


c) Simple 1D Numerical Example

To see it more concretely, imagine a single variable $x \in \mathbb{R}$ with a measurement $z = 5$ and noise variance $\sigma^2 = 4$. Then $\Sigma = 4$, so $\Lambda = \Sigma^{-1} = 1/4$. Its square-root is $\Lambda^{1/2} = 1/2$.

The least-squares cost is:

\[\frac{1}{2} \left( x - 5 \right)^T \cdot \frac{1}{4} \cdot \left( x - 5 \right).\]

Written in square-root form:

\[\frac{1}{2} \left\| \frac{1}{2} \left( x - 5 \right) \right\|^2.\]

The “factor row” is:

\[\left[ \frac{1}{2} \right] (x - 5) = 0\]

or equivalently:

\[\frac{1}{2} x - \frac{5}{2} = 0.\]

Hence in matrix form, that factor is $\left[ \frac{1}{2} \right] x = \left[ \frac{5}{2} \right]$. Minimizing the norm of that row recovers $x = 5$. Notice that the “coefficient” is the square-root precision $\frac{1}{2}$, not $\frac{1}{4}$.
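The same 1D factor row, verified numerically; this is just a sketch where np.linalg.lstsq stands in for the factor-graph solver:

import numpy as np

# Factor row from above: F = [1/2], d = [5/2].
F = np.array([[0.5]])
d = np.array([2.5])

x_hat = np.linalg.lstsq(F, d, rcond=None)[0]
print(x_hat)                               # [5.]

# The coefficient is the square-root precision 1/sigma = 1/2, not 1/sigma^2 = 1/4;
# at x = 4 the cost 0.5 * (0.5 * (x - 5))^2 equals 0.5 * (x - 5)^2 / 4 = 0.125.
print(0.5 * (0.5 * (4.0 - 5.0)) ** 2)      # 0.125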


Example with GTSAM

1) Why the “Square-Root” Form?

  1. Better Numerical Conditioning
    In least-squares problems, one can write factors either in “information” form or “square-root information” form. Using the square-root of the information matrix (i.e., $\Lambda^{1/2}$) leads to smaller condition numbers in practice and tends to be more stable when doing incremental factorization (QR, Cholesky) in a factor-graph solver.

  2. Direct Interpretation
    Each factor in GTSAM is conceptually
    \(\sqrt{\Lambda}\,\bigl(\text{error}(x)\bigr) \;=\;0,\)
    where $\Lambda$ is the precision (inverse covariance). Hence the JacobianFactor simply stores $\sqrt{\Lambda}\,J$ as the rows in $F$, and $\sqrt{\Lambda}\,(\text{measured value})$ as the entries in $d$.

  3. Consistency with QR
    The final solution uses a QR or Cholesky factorization. Keeping everything in “square-root” form means we can factor $F$ directly into an upper-triangular $R$ (plus orthonormal transforms), rather than first having to multiply out $F^\top F$.
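Points 1 and 3 above can be illustrated with a few lines of NumPy on a random 10×6 matrix standing in for $F$ (illustrative only): squaring up to $F^\top F$ squares the condition number, while QR factors $F$ directly into an upper-triangular $R$.

import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 6))        # stand-in for a stacked square-root system

print(np.linalg.cond(F))            # kappa(F)
print(np.linalg.cond(F.T @ F))      # ~ kappa(F)^2: the normal equations are worse conditioned

# QR never forms F^T F explicitly, yet R is still a square root of it:
Q, R = np.linalg.qr(F)
print(np.allclose(F.T @ F, R.T @ R))   # True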


2) Updated Numbers for the Example

In the code:

model2 = noiseModel.Isotropic.Sigma(2, 0.5)  # for the priors/measurements
motion_model = noiseModel.Diagonal.Sigmas([0.1, 0.3])  # for the motions

the standard deviations are:

  • $\sigma=0.5$ for each prior/measurement in 2D
  • $\sigma_x=0.1$ and $\sigma_y=0.3$ for each motion factor

Hence, their square‐root precisions are:

  • $\sqrt{\Lambda_{\text{meas}}}=1/\sigma=2.$
  • $\sqrt{\Lambda_{\text{motion}}}=\begin{bmatrix}10 & 0 \\ 0 & 3.333\ldots\end{bmatrix}$ (since $1/0.1=10$ and $1/0.3\approx 3.333\ldots$).
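In code, these square-root precisions are just the reciprocals of the sigmas. A small sketch follows; the call to R() on the noise model is an assumption about what the Python wrapper exposes, so the NumPy lines carry the actual numbers:

import numpy as np
from gtsam import noiseModel

model2 = noiseModel.Isotropic.Sigma(2, 0.5)
motion_model = noiseModel.Diagonal.Sigmas(np.array([0.1, 0.3]))

# sqrt(Lambda) = diag(1/sigma):
print(np.diag(1.0 / np.array([0.5, 0.5])))   # [[2, 0], [0, 2]]
print(np.diag(1.0 / np.array([0.1, 0.3])))   # [[10, 0], [0, 3.333...]]

# If your build wraps it, the noise model reports the same whitening matrix directly:
print(motion_model.R())                      # assumed API: [[10, 0], [0, 3.333...]]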

a) Three 2D States

Let
\(x \;=\; \bigl[ x_{1,1},\,x_{1,2},\,x_{2,1},\,x_{2,2},\,x_{3,1},\,x_{3,2} \bigr]^\top.\)

b) Priors/“Measurements”

From your loop:

for i, key in enumerate([x1, x2, x3]):
    b = gtsam.Point2(i*2, 0)
    # ...

the “measured” positions are:

  • $\mathbf{x}_1 \approx (0,0)$
  • $\mathbf{x}_2 \approx (2,0)$
  • $\mathbf{x}_3 \approx (4,0)$

Since the noise model is $\sigma=0.5$, the factor is
$\sqrt{\Lambda}(\mathbf{x}_i - \mathbf{z}_i)=\mathbf{0}$,
with $\sqrt{\Lambda}=2$.

Hence, each prior contributes two rows:

  1. Prior on $\mathbf{x}_1\approx(0,0)$
\[2\,(x_{1,1}-0)=0,\quad 2\,(x_{1,2}-0)=0 \quad\Longrightarrow\quad \begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0\\[6pt] 0 & 2 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix}x_{1,1}\\x_{1,2}\\\vdots\\x_{3,2}\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}.\]
  2. Prior on $\mathbf{x}_2\approx(2,0)$

\(2\,(x_{2,1}-2)=0\;\;\to\;d=4,\quad 2\,(x_{2,2}-0)=0\;\;\to\;d=0,\) so the matrix rows become \(\begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0\\[6pt] 0 & 0 & 0 & 2 & 0 & 0 \end{bmatrix}, \quad d=\begin{bmatrix}4\\0\end{bmatrix}.\)

  3. Prior on $\mathbf{x}_3\approx(4,0)$

\(2\,(x_{3,1}-4)=0\;\;\to d=8,\quad 2\,(x_{3,2}-0)=0\;\;\to d=0,\) so \(\begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 0\\[6pt] 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}, \quad d=\begin{bmatrix}8\\0\end{bmatrix}.\)
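Stacking just these three prior blocks in NumPy and solving the resulting system recovers the measured positions; this is a hand-built sketch mirroring the rows above:

import numpy as np

# Six prior rows: sqrt(Lambda) = 2 on the block diagonal, d = 2 * z_i stacked.
F_priors = 2.0 * np.eye(6)
d_priors = np.array([0.0, 0.0, 4.0, 0.0, 8.0, 0.0])

x = np.linalg.lstsq(F_priors, d_priors, rcond=None)[0]
print(x.reshape(3, 2))    # [[0, 0], [2, 0], [4, 0]]  -> x1, x2, x3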

c) Motion Factors

You have two motion factors:

  1. $\mathbf{x}_2-\mathbf{x}_1\approx(2,0)$ with $\sigma_x=0.1,\,\sigma_y=0.3,$ so $\sqrt{\Lambda}=\mathrm{diag}(10,3.333\ldots)$.
    • For the $x$-row:
      \(10\,[(x_{2,1}-x_{1,1}) - 2]=0 \;\Rightarrow\; \text{Row}=[-10,\,0,\,+10,\,0,\,0,\,0],\;d=10\cdot 2=20.\)
    • For the $y$-row:
      \(3.333\ldots\,[(x_{2,2}-x_{1,2}) - 0]=0 \;\Rightarrow\;\text{Row}=[0,\,-3.333,\;0,\,+3.333,\;0,\;0],\; d=0.\)
  2. $\mathbf{x}_3-\mathbf{x}_2\approx(2,0)$ with the same diagonal sqrt‐info $\mathrm{diag}(10,\,3.333\ldots)$.
    • For the $x$-row:
      \(10\,[(x_{3,1}-x_{2,1}) - 2]=0 \;\Rightarrow\; \text{Row}=[0,\,0,\,-10,\,0,\,+10,\,0],\;d=10\cdot 2=20.\)
    • For the $y$-row:
      \(3.333\ldots\,[(x_{3,2}-x_{2,2}) - 0]=0 \;\Rightarrow\;\text{Row}=[0,\,0,\,0,\,-3.333,\,0,\,+3.333],\; d=0.\)

Putting it all together yields a $10\times 6$ system $F\,x=d$ whose rows come directly from $\sqrt{\Lambda}\times(\mathbf{x}-\mathbf{z})=0$. That is the square‐root form.
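The whole 10×6 system can be reproduced by building the graph in GTSAM and asking for the dense Jacobian. The sketch below is a reconstruction, not the original script: it assumes the motion factors are BetweenFactorPoint2, linearizes at the origin so that $\delta x = x$, and assumes the wrapper exposes GaussianFactorGraph.jacobian(); row and column order may differ depending on the ordering GTSAM picks.

import numpy as np
import gtsam
from gtsam import noiseModel

x1, x2, x3 = gtsam.symbol('x', 1), gtsam.symbol('x', 2), gtsam.symbol('x', 3)
model2 = noiseModel.Isotropic.Sigma(2, 0.5)
motion_model = noiseModel.Diagonal.Sigmas(np.array([0.1, 0.3]))

graph = gtsam.NonlinearFactorGraph()
for i, key in enumerate([x1, x2, x3]):
    graph.add(gtsam.PriorFactorPoint2(key, gtsam.Point2(i * 2, 0), model2))
graph.add(gtsam.BetweenFactorPoint2(x1, x2, gtsam.Point2(2, 0), motion_model))
graph.add(gtsam.BetweenFactorPoint2(x2, x3, gtsam.Point2(2, 0), motion_model))

# Linearize at the origin: with x_hat = 0, delta-x equals x and the rows match the derivation.
values = gtsam.Values()
for key in [x1, x2, x3]:
    values.insert(key, gtsam.Point2(0, 0))
gfg = graph.linearize(values)

F, d = gfg.jacobian()
print(F.shape)    # (10, 6)
print(F)          # the +/-2, +/-10, +/-3.333 pattern derived above
print(d)          # prior entries 0, 0, 4, 0, 8, 0 and motion entries 20, 0, 20, 0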


3) How $F\,x=d$ Turns into $R\,x=d$

When you call

gbn = gfg.eliminateSequential()
R, d = gbn.matrix()

GTSAM runs a sequential elimination procedure (mathematically akin to QR). In effect, it pivots out the states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ one by one, applying orthonormal transformations to keep the system upper‐triangular. The final result is a $6\times 6$ matrix $R$ such that \(R\,x = d.\) The off‐diagonal entries in $R$ (things like $-9.81$, etc.) are simply the numerical byproducts of these orthonormal transformations.

Thus:

  1. Each factor (prior or motion) supplies a row (or two rows) of the “square‐root” system $F$.
  2. Stack them to get $F\,x=d$.
  3. Factor $F\to R$ by sequential elimination.

The $d$ above is the “square root information vector”, as the Bayes net defines the following quadratic, and corresponding Gaussian density, in this case in $\mathbb{R}^6$:

\[E_{gbn}(x; R, d) \doteq \frac{1}{2} \|Rx - d\|^2,\]

\[\mathcal{N}_{gbn}(x; R, d) = k \exp \left\{- \frac{1}{2} \|Rx - d\|^2\right\},\]

where $k$ is a normalization constant.
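Continuing from the graph sketch above (with gfg, F, d still in scope, and renaming the Bayes-net right-hand side to avoid clobbering d), a quick check that elimination really produced an upper-triangular square-root factor of the same Gaussian:

gbn = gfg.eliminateSequential()
R, d_R = gbn.matrix()

print(R.shape)                         # (6, 6)
print(np.allclose(R, np.triu(R)))      # True: R is upper-triangular

# R^T R and F^T F encode the same information matrix (columns may be permuted if
# GTSAM eliminated the variables in a different order), so their eigenvalues coincide:
print(np.allclose(np.linalg.eigvalsh(R.T @ R),
                  np.linalg.eigvalsh(F.T @ F)))    # True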

One way to see why $R$ is the square‐root of the Gaussian’s information matrix is to look at how GTSAM’s factorization aligns with the standard least‐squares cost:

  1. Start with the Gaussian cost (information form).
    For a multivariate Gaussian in $\mathbb{R}^n$, the negative‐log probability can be written as
    \(\tfrac12\,(x-\mu)^\top\,\Sigma^{-1}\,(x-\mu).\) The matrix $\Sigma^{-1}$ is the information matrix.

  2. Rewrite it in “square‐root” form.
    We can factor $\Sigma^{-1}$ as
    \(\Sigma^{-1} \;=\; R^\top\,R,\) where $R$ is an upper‐triangular matrix—this is analogous to a Cholesky (or $\mathrm{QR}$) factor. Then
    \((x-\mu)^\top\,\Sigma^{-1}\,(x-\mu) \;=\;\bigl\|\,R\,(x-\mu)\bigr\|^2.\) Thus we see that $R$ acts like the “square‐root” of $\Sigma^{-1}$.

  3. GTSAM’s final factorization yields $\tfrac12\,\|R\,x - d\|^2.$
    After GTSAM does sequential elimination, it ends up with a reduced system of the form \(\|R\,x - d\|^2,\) which is exactly the same expression as $\|x-\mu\|_{\Sigma^{-1}}^2 = (x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)$ up to a shift and re‐labeling $\mu = R^{-1}\,d$. Expanding $\|R\,x - d\|^2$ gives \(x^\top\,R^\top\,R\,x \;-\;2\,d^\top\,R\,x \;+\;\ldots\) and we see $R^\top R$ playing the role of the information matrix.

  4. The dimension is 6 because there are three two‐dimensional variables.
    In the simple Kalman‐smoother example with 3 states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$, each state is 2D. Altogether $x\in\mathbb{R}^6$. Hence the final $R$ is $6\times 6$, and $R^\top R$ is the full $6\times 6$ information matrix for that joint Gaussian.

In short, you know that $R$ is the square‐root of the information because the cost function in GTSAM after elimination is exactly $\tfrac12\,\|R\,x - d\|^2$. When you expand that norm, the quadratic term is
\(x^\top\,(R^\top R)\,x\) which must match the original cost’s $\Sigma^{-1}$. Therefore $R^\top R = \Sigma^{-1}$, making $R$ the “square‐root” information factor.
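The same claim can be checked outside GTSAM: an upper-triangular square root of the information matrix obtained by plain Cholesky plays exactly the role of $R$. This sketch continues with the dense F and d from earlier; the Cholesky factor may differ from GTSAM's $R$ by variable ordering and sign conventions.

info = F.T @ F                        # the 6x6 information matrix Sigma^{-1}
L = np.linalg.cholesky(info)          # info = L @ L.T
R_np = L.T                            # an upper-triangular square root of Sigma^{-1}

print(np.allclose(R_np.T @ R_np, info))          # True: R^T R = Sigma^{-1}

# The mean mu = R_np^{-1} d_np, with d_np = R_np^{-T} F^T d, minimizes 0.5 * ||R_np x - d_np||^2
# and agrees with the ordinary least-squares solution of F x = d:
d_np = np.linalg.solve(R_np.T, F.T @ d)
mu = np.linalg.solve(R_np, d_np)
print(np.allclose(mu, np.linalg.lstsq(F, d, rcond=None)[0]))   # True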