Precision Matrix and Jacobian Matrix Contribution

6 minute read

1) What is $J$?

In GTSAM (and factor-graph methods in general), each “error function” $error(x)$ is often linearized around some operating point $\hat{x}$. For a factor that depends on state $x$, we can write:

\[error(x) \approx error(\hat{x}) + J(x - \hat{x}),\]

where:

  • $error(\hat{x})$ is the residual at the linearization point,
  • $J$ is the Jacobian matrix of partial derivatives of $error$ with respect to $x$, evaluated at $\hat{x}$.

In other words, $J$ is literally the matrix of first-order derivatives (slopes) that maps small changes in $x$ to changes in the factor’s error. GTSAM’s JacobianFactor stores this linear approximation $error(\hat{x}) + J\,\delta x$ internally; once whitened by the factor’s square-root information $\Lambda^{1/2}$, it supplies the final row(s) of the system $F x = d$.
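To make this concrete, here is a minimal sketch that linearizes a single prior factor and prints the rows GTSAM stores. The integer key 1, the measurement, and the operating point are made up for illustration, and it assumes your GTSAM Python build wraps GaussianFactorGraph.jacobian():

import numpy as np
import gtsam

# One 2D state with integer key 1, "measured" at (2, 0) with sigma = 0.5.
model = gtsam.noiseModel.Isotropic.Sigma(2, 0.5)
graph = gtsam.NonlinearFactorGraph()
graph.add(gtsam.PriorFactorPoint2(1, gtsam.Point2(2.0, 0.0), model))

# Linearize around an arbitrary operating point x_hat.
values = gtsam.Values()
values.insert(1, gtsam.Point2(1.8, 0.1))
gfg = graph.linearize(values)

# Dense system over delta-x; the rows come back already whitened,
# i.e. A = Lambda^{1/2} J and b = Lambda^{1/2} (z - h(x_hat)).
A, b = gfg.jacobian()
print(A)   # [[2. 0.], [0. 2.]]   (1/sigma = 2 on the diagonal, since J = I here)
print(b)   # [ 0.4 -0.2]          (2 * ([2, 0] - [1.8, 0.1]))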


2) Numerical Derivation of the Square-Root Form

a) Least-Squares Formulation

Consider a general Gaussian factor with measurement $z$, predicted by some function $h(x)$. The covariance of the measurement noise is $\Sigma$. The negative log-likelihood of a single factor is:

\[\frac{1}{2} \left( h(x) - z \right)^T \Sigma^{-1} \left( h(x) - z \right).\]

If we define $\Lambda = \Sigma^{-1}$ (the information matrix), then the cost becomes:

\[\frac{1}{2} \left( h(x) - z \right)^T \Lambda \left( h(x) - z \right).\]

Minimizing this is equivalent to a least-squares problem.
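As a quick numerical sanity check of this cost, here is a tiny NumPy sketch; the choice $h(x)=x$, the measurement $z$, and the covariance are made up for illustration:

import numpy as np

# Hypothetical direct measurement: h(x) = x, z = (2, 0), sigma = 0.5 per axis.
z = np.array([2.0, 0.0])
Sigma = np.diag([0.25, 0.25])
Lam = np.linalg.inv(Sigma)          # information matrix, diag(4, 4)

def cost(x):
    r = x - z                       # error(x) = h(x) - z
    return 0.5 * r @ Lam @ r

print(cost(np.array([2.0, 0.0])))   # 0.0  (at the measurement)
print(cost(np.array([2.5, 0.0])))   # 0.5  (= 0.5 * 0.5^2 / 0.25)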


b) Taking the Square-Root of $\Lambda$

We can rewrite $\Lambda$ as $\Lambda^{1/2} \Lambda^{1/2}$. Let

\[error(x) = h(x) - z.\]

Then the cost is:

\[\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2.\]

Minimizing $\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2$ is exactly the same as minimizing $\frac{1}{2} error(x)^T \Lambda error(x)$. Written as a norm, the ideal (zero-residual) condition is

\[\Lambda^{1/2} error(x) = 0,\]

and the least-squares solution drives the left-hand side as close to zero as possible.

Hence the factor in “square-root” form is simply the row:

\[\Lambda^{1/2} J x - \Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right] = 0,\]

once linearized about $\hat{x}$. That is precisely what GTSAM stores in its JacobianFactor: the matrix part $\Lambda^{1/2} J$ and the vector part $\Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right]$.
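A short NumPy check of the whitening identity behind this, on a made-up non-diagonal information matrix (the transpose of the Cholesky factor serves as one valid choice of $\Lambda^{1/2}$):

import numpy as np

# Made-up 2x2 information matrix and residual, purely for illustration.
Lam = np.array([[4.0, 1.0],
                [1.0, 3.0]])
e = np.array([0.3, -0.7])           # error(x) = h(x) - z at some x

L = np.linalg.cholesky(Lam)         # Lam = L @ L.T, so L.T is a valid Lambda^{1/2}
quadratic = 0.5 * e @ Lam @ e
whitened = 0.5 * np.linalg.norm(L.T @ e) ** 2
print(quadratic, whitened)          # equal up to floating-point rounding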


c) Simple 1D Numerical Example

To see it more concretely, imagine a single variable $x \in \mathbb{R}$ with a measurement $z = 5$ and noise variance $\sigma^2 = 4$. Then $\Sigma = 4$, so $\Lambda = \Sigma^{-1} = 1/4$. Its square-root is $\Lambda^{1/2} = 1/2$.

The least-squares cost is:

\[\frac{1}{2} \left( x - 5 \right)^T \cdot \frac{1}{4} \cdot \left( x - 5 \right).\]

Written in square-root form:

\[\frac{1}{2} \left\| \frac{1}{2} \left( x - 5 \right) \right\|^2.\]

The “factor row” is:

\[\left[ \frac{1}{2} \right] (x - 5) = 0\]

or equivalently:

\[\frac{1}{2} x - \frac{5}{2} = 0.\]

Hence in matrix form, that factor is $\left[ \frac{1}{2} \right] x = \left[ \frac{5}{2} \right]$. Minimizing the norm of that row recovers $x = 5$. Notice that the “coefficient” is the square-root precision $\frac{1}{2}$, not $\frac{1}{4}$.
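The same 1D factor row, verified numerically; this is just a sketch where np.linalg.lstsq stands in for the factor-graph solver:

import numpy as np

# Factor row from above: F = [1/2], d = [5/2].
F = np.array([[0.5]])
d = np.array([2.5])

x_hat = np.linalg.lstsq(F, d, rcond=None)[0]
print(x_hat)                               # [5.]

# The coefficient is the square-root precision 1/sigma = 1/2, not 1/sigma^2 = 1/4;
# at x = 4 the cost 0.5 * (0.5 * (x - 5))^2 equals 0.5 * (x - 5)^2 / 4 = 0.125.
print(0.5 * (0.5 * (4.0 - 5.0)) ** 2)      # 0.125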


Example with GTSAM

1) Why the “Square-Root” Form?

  1. Better Numerical Conditioning
    In least-squares problems, one can write factors either in “information” form or “square-root information” form. Using the square-root of the information matrix (i.e., $\Lambda^{1/2}$) leads to smaller condition numbers in practice and tends to be more stable when doing incremental factorization (QR, Cholesky) in a factor-graph solver.

  2. Direct Interpretation
    Each factor in GTSAM is conceptually
    \(\sqrt{\Lambda}\,\bigl(\text{error}(x)\bigr) \;=\;0,\)
    where $\Lambda$ is the precision (inverse covariance). Hence the JacobianFactor simply stores $\sqrt{\Lambda}\,J$ as the rows in $F$, and $\sqrt{\Lambda}\,(\text{measured value})$ as the entries in $d$.

  3. Consistency with QR
    The final solution uses a QR or Cholesky factorization. Keeping everything in “square-root” form means we can factor $F$ directly into an upper-triangular $R$ (plus orthonormal transforms), rather than first having to multiply out $F^\top F$.
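Points 1 and 3 above can be illustrated with a few lines of NumPy on a random 10×6 matrix standing in for $F$ (illustrative only): squaring up to $F^\top F$ squares the condition number, while QR factors $F$ directly into an upper-triangular $R$.

import numpy as np

rng = np.random.default_rng(0)
F = rng.normal(size=(10, 6))        # stand-in for a stacked square-root system

print(np.linalg.cond(F))            # kappa(F)
print(np.linalg.cond(F.T @ F))      # ~ kappa(F)^2: the normal equations are worse conditioned

# QR never forms F^T F explicitly, yet R is still a square root of it:
Q, R = np.linalg.qr(F)
print(np.allclose(F.T @ F, R.T @ R))   # True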


2) Updated Numbers for the Example

In the code:

model2 = noiseModel.Isotropic.Sigma(2, 0.5)  # for the priors/measurements
motion_model = noiseModel.Diagonal.Sigmas([0.1, 0.3])  # for the motions

the standard deviations are:

  • $\sigma=0.5$ for each prior/measurement in 2D
  • $\sigma_x=0.1$ and $\sigma_y=0.3$ for each motion factor

Hence, their square‐root precisions are:

  • $\sqrt{\Lambda_{\text{meas}}}=1/\sigma=2.$
  • $\sqrt{\Lambda_{\text{motion}}}=\begin{bmatrix}10 & 0 \\ 0 & 3.333\ldots\end{bmatrix}$ (since $1/0.1=10$ and $1/0.3\approx 3.333\ldots$).
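In code, these square-root precisions are just the reciprocals of the sigmas. A small sketch follows; the call to R() on the noise model is an assumption about what the Python wrapper exposes, so the NumPy lines carry the actual numbers:

import numpy as np
from gtsam import noiseModel

model2 = noiseModel.Isotropic.Sigma(2, 0.5)
motion_model = noiseModel.Diagonal.Sigmas(np.array([0.1, 0.3]))

# sqrt(Lambda) = diag(1/sigma):
print(np.diag(1.0 / np.array([0.5, 0.5])))   # [[2, 0], [0, 2]]
print(np.diag(1.0 / np.array([0.1, 0.3])))   # [[10, 0], [0, 3.333...]]

# If your build wraps it, the noise model reports the same whitening matrix directly:
print(motion_model.R())                      # assumed API: [[10, 0], [0, 3.333...]]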

a) Three 2D States

Let
\(x \;=\; \bigl[ x_{1,1},\,x_{1,2},\,x_{2,1},\,x_{2,2},\,x_{3,1},\,x_{3,2} \bigr]^\top.\)

b) Priors/“Measurements”

From your loop:

for i, key in enumerate([x1, x2, x3]):
    b = gtsam.Point2(i*2, 0)
    # ...

the “measured” positions are:

  • $\mathbf{x}_1 \approx (0,0)$
  • $\mathbf{x}_2 \approx (2,0)$
  • $\mathbf{x}_3 \approx (4,0)$

Since the noise model is $\sigma=0.5$, the factor is
$\sqrt{\Lambda}(\mathbf{x}_i - \mathbf{z}_i)=\mathbf{0}$,
with $\sqrt{\Lambda}=2$.

Hence, each prior contributes two rows:

  1. Prior on $\mathbf{x}_1\approx(0,0)$
\[2\,(x_{1,1}-0)=0,\quad 2\,(x_{1,2}-0)=0 \quad\Longrightarrow\quad \begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0\\[6pt] 0 & 2 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix}x_{1,1}\\x_{1,2}\\\vdots\\x_{3,2}\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}.\]
  2. Prior on $\mathbf{x}_2\approx(2,0)$

\(2\,(x_{2,1}-2)=0\;\;\to\;d=4,\quad 2\,(x_{2,2}-0)=0\;\;\to\;d=0,\) so the matrix rows become \(\begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0\\[6pt] 0 & 0 & 0 & 2 & 0 & 0 \end{bmatrix}, \quad d=\begin{bmatrix}4\\0\end{bmatrix}.\)

  3. Prior on $\mathbf{x}_3\approx(4,0)$

\(2\,(x_{3,1}-4)=0\;\;\to d=8,\quad 2\,(x_{3,2}-0)=0\;\;\to d=0,\) so \(\begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 0\\[6pt] 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}, \quad d=\begin{bmatrix}8\\0\end{bmatrix}.\)
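Stacking just these three prior blocks in NumPy and solving the resulting system recovers the measured positions; this is a hand-built sketch mirroring the rows above:

import numpy as np

# Six prior rows: sqrt(Lambda) = 2 on the block diagonal, d = 2 * z_i stacked.
F_priors = 2.0 * np.eye(6)
d_priors = np.array([0.0, 0.0, 4.0, 0.0, 8.0, 0.0])

x = np.linalg.lstsq(F_priors, d_priors, rcond=None)[0]
print(x.reshape(3, 2))    # [[0, 0], [2, 0], [4, 0]]  -> x1, x2, x3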

c) Motion Factors

You have two motion factors:

  1. $\mathbf{x}_2-\mathbf{x}_1\approx(2,0)$ with $\sigma_x=0.1,\,\sigma_y=0.3,$ so $\sqrt{\Lambda}=\mathrm{diag}(10,3.333\ldots)$.
    • For the $x$-row:
      \(10\,[(x_{2,1}-x_{1,1}) - 2]=0 \;\Rightarrow\; \text{Row}=[-10,\,0,\,+10,\,0,\,0,\,0],\;d=10\cdot 2=20.\)
    • For the $y$-row:
      \(3.333\ldots\,[(x_{2,2}-x_{1,2}) - 0]=0 \;\Rightarrow\;\text{Row}=[0,\,-3.333,\;0,\,+3.333,\;0,\;0],\; d=0.\)
  2. $\mathbf{x}_3-\mathbf{x}_2\approx(2,0)$ with the same diagonal sqrt‐info $\mathrm{diag}(10,\,3.333\ldots)$.
    • For the $x$-row:
      \(10\,[(x_{3,1}-x_{2,1}) - 2]=0 \;\Rightarrow\; \text{Row}=[0,\,0,\,-10,\,0,\,+10,\,0],\;d=10\cdot 2=20.\)
    • For the $y$-row:
      \(3.333\ldots\,[(x_{3,2}-x_{2,2}) - 0]=0 \;\Rightarrow\;\text{Row}=[0,\,0,\,0,\,-3.333,\,0,\,+3.333],\; d=0.\)

Putting it all together yields a $10\times 6$ system $F\,x=d$ whose rows come directly from $\sqrt{\Lambda}\times(\mathbf{x}-\mathbf{z})=0$. That is the square‐root form.
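The whole 10×6 system can be reproduced by building the graph in GTSAM and asking for the dense Jacobian. The sketch below is a reconstruction, not the original script: it assumes the motion factors are BetweenFactorPoint2, linearizes at the origin so that $\delta x = x$, and assumes the wrapper exposes GaussianFactorGraph.jacobian(); row and column order may differ depending on the ordering GTSAM picks.

import numpy as np
import gtsam
from gtsam import noiseModel

x1, x2, x3 = gtsam.symbol('x', 1), gtsam.symbol('x', 2), gtsam.symbol('x', 3)
model2 = noiseModel.Isotropic.Sigma(2, 0.5)
motion_model = noiseModel.Diagonal.Sigmas(np.array([0.1, 0.3]))

graph = gtsam.NonlinearFactorGraph()
for i, key in enumerate([x1, x2, x3]):
    graph.add(gtsam.PriorFactorPoint2(key, gtsam.Point2(i * 2, 0), model2))
graph.add(gtsam.BetweenFactorPoint2(x1, x2, gtsam.Point2(2, 0), motion_model))
graph.add(gtsam.BetweenFactorPoint2(x2, x3, gtsam.Point2(2, 0), motion_model))

# Linearize at the origin: with x_hat = 0, delta-x equals x and the rows match the derivation.
values = gtsam.Values()
for key in [x1, x2, x3]:
    values.insert(key, gtsam.Point2(0, 0))
gfg = graph.linearize(values)

F, d = gfg.jacobian()
print(F.shape)    # (10, 6)
print(F)          # the +/-2, +/-10, +/-3.333 pattern derived above
print(d)          # prior entries 0, 0, 4, 0, 8, 0 and motion entries 20, 0, 20, 0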


3) How $F\,x=d$ Turns into $R\,x=d$

When you call

gbn = gfg.eliminateSequential()
R, d = gbn.matrix()

GTSAM runs a sequential elimination procedure (mathematically akin to QR). In effect, it pivots out the states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ one by one, applying orthonormal transformations to keep the system upper‐triangular. The final result is a $6\times 6$ matrix $R$ such that \(R\,x = d.\) The off‐diagonal entries in $R$ (things like $-9.81$, etc.) are simply the numerical byproducts of these orthonormal transformations.

Thus:

  1. Each factor (prior or motion) supplies a row (or two rows) of the “square‐root” system $F$.
  2. Stack them to get $F\,x=d$.
  3. Factor $F\to R$ by sequential elimination.

The $d$ above is the “square root information vector”, as the Bayes net defines the following quadratic, and corresponding Gaussian density, in this case in $\mathbb{R}^6$:

\[E_{gbn}(x; R, d) \doteq \frac{1}{2} \|Rx - d\|^2,\]

\[\mathcal{N}_{gbn}(x; R, d) = k \exp \left\{- \frac{1}{2} \|Rx - d\|^2\right\},\]

where $k$ is a normalization constant.
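Continuing from the graph sketch above (with gfg, F, d still in scope, and renaming the Bayes-net right-hand side to avoid clobbering d), a quick check that elimination really produced an upper-triangular square-root factor of the same Gaussian:

gbn = gfg.eliminateSequential()
R, d_R = gbn.matrix()

print(R.shape)                         # (6, 6)
print(np.allclose(R, np.triu(R)))      # True: R is upper-triangular

# R^T R and F^T F encode the same information matrix (columns may be permuted if
# GTSAM eliminated the variables in a different order), so their eigenvalues coincide:
print(np.allclose(np.linalg.eigvalsh(R.T @ R),
                  np.linalg.eigvalsh(F.T @ F)))    # True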

One way to see why $R$ is the square‐root of the Gaussian’s information matrix is to look at how GTSAM’s factorization aligns with the standard least‐squares cost:

  1. Start with the Gaussian cost (information form).
    For a multivariate Gaussian in $\mathbb{R}^n$, the negative‐log probability can be written as
    \(\tfrac12\,(x-\mu)^\top\,\Sigma^{-1}\,(x-\mu).\) The matrix $\Sigma^{-1}$ is the information matrix.

  2. Rewrite it in “square‐root” form.
    We can factor $\Sigma^{-1}$ as
    \(\Sigma^{-1} \;=\; R^\top\,R,\) where $R$ is an upper‐triangular matrix—this is analogous to a Cholesky (or $\mathrm{QR}$) factor. Then
    \((x-\mu)^\top\,\Sigma^{-1}\,(x-\mu) \;=\;\bigl\|\,R\,(x-\mu)\bigr\|^2.\) Thus we see that $R$ acts like the “square‐root” of $\Sigma^{-1}$.

  3. GTSAM’s final factorization yields $\tfrac12\,\|R\,x - d\|^2.$
    After GTSAM does sequential elimination, it ends up with a reduced system of the form \(\|R\,x - d\|^2,\) which is exactly the same expression as $\|x-\mu\|_{\Sigma^{-1}}^2 = (x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)$ up to a shift and re‐labeling $\mu = R^{-1}\,d$. Expanding $\|R\,x - d\|^2$ gives \(x^\top\,R^\top\,R\,x \;-\;2\,d^\top\,R\,x \;+\;\ldots\) and we see $R^\top R$ playing the role of the information matrix.

  4. The dimension is 6 because there are three two‐dimensional variables.
    In the simple Kalman‐smoother example with 3 states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$, each state is 2D. Altogether $x\in\mathbb{R}^6$. Hence the final $R$ is $6\times 6$, and $R^\top R$ is the full $6\times 6$ information matrix for that joint Gaussian.

In short, you know that $R$ is the square‐root of the information because the cost function in GTSAM after elimination is exactly $\tfrac12\,\|R\,x - d\|^2$. When you expand that norm, the quadratic term is
\(x^\top\,(R^\top R)\,x\) which must match the original cost’s $\Sigma^{-1}$. Therefore $R^\top R = \Sigma^{-1}$, making $R$ the “square‐root” information factor.
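The same claim can be checked outside GTSAM: an upper-triangular square root of the information matrix obtained by plain Cholesky plays exactly the role of $R$. This sketch continues with the dense F and d from earlier; the Cholesky factor may differ from GTSAM's $R$ by variable ordering and sign conventions.

info = F.T @ F                        # the 6x6 information matrix Sigma^{-1}
L = np.linalg.cholesky(info)          # info = L @ L.T
R_np = L.T                            # an upper-triangular square root of Sigma^{-1}

print(np.allclose(R_np.T @ R_np, info))          # True: R^T R = Sigma^{-1}

# The mean mu = R_np^{-1} d_np, with d_np = R_np^{-T} F^T d, minimizes 0.5 * ||R_np x - d_np||^2
# and agrees with the ordinary least-squares solution of F x = d:
d_np = np.linalg.solve(R_np.T, F.T @ d)
mu = np.linalg.solve(R_np, d_np)
print(np.allclose(mu, np.linalg.lstsq(F, d, rcond=None)[0]))   # True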