Precision Matrix and Jacobian Matrix Contribution
1) What is $J$?
In GTSAM (and factor-graph methods in general), each “error function” $error(x)$ is often linearized around some operating point $\hat{x}$. For a factor that depends on state $x$, we can write:
\[error(x) \approx error(\hat{x}) + J(x - \hat{x}),\]where:
- $error(\hat{x})$ is the residual at the linearization point,
- $J$ is the Jacobian matrix of partial derivatives of $error$ with respect to $x$, evaluated at $\hat{x}$.
In other words, $J$ is literally the matrix of first-order derivatives (slopes) that map small changes in $x$ to changes in the factor’s error. GTSAM’s JacobianFactor stores this linear approximation $error(\hat{x}) + J(\delta x)$ internally. When multiplied by the factor’s square-root information $\Lambda^{1/2}$, you get the final row(s) in the system $F x = d$.
2) Numerical Derivation of the Square-Root Form
a) Least-Squares Formulation
Consider a general Gaussian factor with measurement $z$, predicted by some function $h(x)$. The covariance of the measurement noise is $\Sigma$. The negative log-likelihood of a single factor is:
\[\frac{1}{2} \left( h(x) - z \right)^T \Sigma^{-1} \left( h(x) - z \right).\]If we define $\Lambda = \Sigma^{-1}$ (the information matrix), then the cost becomes:
\[\frac{1}{2} \left( h(x) - z \right)^T \Lambda \left( h(x) - z \right).\]Minimizing this is equivalent to a least-squares problem.
b) Taking the Square-Root of $\Lambda$
We can rewrite $\Lambda$ as $\Lambda^{1/2} \Lambda^{1/2}$. Let
\[error(x) = h(x) - z.\]Then the cost is:
\[\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2.\]Minimizing $\frac{1}{2} \left| \Lambda^{1/2} error(x) \right|^2$ is exactly the same as minimizing $\frac{1}{2} error(x)^T \Lambda error(x)$. But written as a norm,
\[\Lambda^{1/2} error(x) = 0\]is the condition for the best fit.
Hence the factor in “square-root” form is simply the row:
\[\Lambda^{1/2} J x - \Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right] = 0,\]once linearized about $\hat{x}$. That is precisely what GTSAM stores in its JacobianFactor: the matrix part $\Lambda^{1/2} J$ and the vector part $\Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right]$.
c) Simple 1D Numerical Example
To see it more concretely, imagine a single variable $x \in \mathbb{R}$ with a measurement $z = 5$ and noise variance $\sigma^2 = 4$. Then $\Sigma = 4$, so $\Lambda = \Sigma^{-1} = 1/4$. Its square-root is $\Lambda^{1/2} = 1/2$.
The least-squares cost is:
\[\frac{1}{2} \left( x - 5 \right)^T \cdot \frac{1}{4} \cdot \left( x - 5 \right).\]Written in square-root form:
\[\frac{1}{2} \left\| \frac{1}{2} \left( x - 5 \right) \right\|^2.\]The “factor row” is:
\[\left[ \frac{1}{2} \right] (x - 5) = 0\]or equivalently:
\[\frac{1}{2} x - \frac{5}{2} = 0.\]Hence in matrix form, that factor is $\left[ \frac{1}{2} \right] x = \left[ \frac{5}{2} \right]$. Minimizing the norm of that row recovers $x = 5$. Notice that the “coefficient” is the square-root precision $\frac{1}{2}$, not $\frac{1}{4}$.
Example with GTSAM
1) Why the “Square-Root” Form?
- 
    Better Numerical Conditioning 
 In least-squares problems, one can write factors either in “information” form or “square-root information” form. Using the square-root of the information matrix (i.e., $\Lambda^{1/2}$) leads to smaller condition numbers in practice and tends to be more stable when doing incremental factorization (QR, Cholesky) in a factor-graph solver.
- 
    Direct Interpretation 
 Each factor in GTSAM is conceptually
 \(\sqrt{\Lambda}\,\bigl(\text{error}(x)\bigr) \;=\;0,\)
 where $\Lambda$ is the precision (inverse covariance). Hence the JacobianFactor simply stores $\sqrt{\Lambda}\,J$ as the rows in $F$, and $\sqrt{\Lambda}\,(\text{measured value})$ as the entries in $d$.
- 
    Consistency with QR 
 The final solution uses a QR or Cholesky factorization. Keeping everything in “square-root” form means we can factor $F$ directly into an upper-triangular $R$ (plus orthonormal transforms), rather than first having to multiply out $F^\top F$.
2) Updated Numbers for the Example
In the code:
model2 = noiseModel.Isotropic.Sigma(2, 0.5)  # for the priors/measurements
motion_model = noiseModel.Diagonal.Sigmas([0.1, 0.3])  # for the motions
the standard deviations are:
- $\sigma=0.5$ for each prior/measurement in 2D
- $\sigma_x=0.1$ and $\sigma_y=0.3$ for each motion factor
Hence, their square‐root precisions are:
- $\sqrt{\Lambda_{\text{meas}}}=1/\sigma=2.$
- $\sqrt{\Lambda_{\text{motion}}}=\begin{matrix}10 & 0 \ 0 & 3.333\ldots\end{matrix}$ (since $1/0.1=10$ and $1/0.3\approx 3.333\ldots$).
a) Three 2D States
Let
\(x \;=\;
\bigl[
x_{1,1},\,x_{1,2},\,x_{2,1},\,x_{2,2},\,x_{3,1},\,x_{3,2}
\bigr]^\top.\)
b) Priors/“Measurements”
From your loop:
for i, key in enumerate([x1, x2, x3]):
    b = gtsam.Point2(i*2, 0)
    # ...
the “measured” positions are:
- $\mathbf{x}_1 \approx (0,0)$
- $\mathbf{x}_2 \approx (2,0)$
- $\mathbf{x}_3 \approx (4,0)$
Since the noise model is $\sigma=0.5$, the factor is
$\sqrt{\Lambda}(\mathbf{x}_i - \mathbf{z}_i)=\mathbf{0}$,
with $\sqrt{\Lambda}=2$.
Hence, each prior contributes two rows:
- Prior on $\mathbf{x}_1\approx(0,0)$
- Prior on $\mathbf{x}_2\approx(2,0)$
\(2\,(x_{2,1}-2)=0\;\;\to\;d=4,\quad 2\,(x_{2,2}-0)=0\;\;\to\;d=0,\) so the matrix rows become \(\begin{bmatrix} 0 & 0 & 2 & 0 & 0 & 0\\[6pt] 0 & 0 & 0 & 2 & 0 & 0 \end{bmatrix}, \quad d=\begin{bmatrix}4\\0\end{bmatrix}.\)
- Prior on $\mathbf{x}_3\approx(4,0)$
\(2\,(x_{3,1}-4)=0\;\;\to d=8,\quad 2\,(x_{3,2}-0)=0\;\;\to d=0,\) so \(\begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 0\\[6pt] 0 & 0 & 0 & 0 & 0 & 2 \end{bmatrix}, \quad d=\begin{bmatrix}8\\0\end{bmatrix}.\)
c) Motion Factors
You have two motion factors:
- $\mathbf{x}_2-\mathbf{x}_1\approx(2,0)$
with $\sigma_x=0.1,\,\sigma_y=0.3,$
so $\sqrt{\Lambda}=\mathrm{diag}(10,3.333\ldots)$.
    - For the (x)‐row:
 \(10\,[(x_{2,1}-x_{1,1}) - 2]=0 \;\Rightarrow\; \text{Row}=[-10,\,0,\,+10,\,0,\,0,\,0],\;d=10\cdot 2=20.\)
- For the (y)‐row:
 \(3.333\ldots\,[(x_{2,2}-x_{1,2}) - 0]=0 \;\Rightarrow\;\text{Row}=[0,\,-3.333,\;0,\,+3.333,\;0,\;0],\; d=0.\)
 
- For the (x)‐row:
- $\mathbf{x}_3-\mathbf{x}_2\approx(2,0)$
with the same diagonal sqrt‐info (\mathrm{diag}(10,3.333)).
    - $x$-dimension:
 \([-10,\,0,\,+10,\,0]\;\to\;[-10,0,+10,0] \text{ but for }(x_2,x_3) \Rightarrow [0,0,-10,0,\,+10,0],\quad d=20.\)
- $y$-dimension:
 \([0,\,-3.333,\,+3.333]\;\Rightarrow [0,0,0,-3.333,0,+3.333],\; d=0.\)
 
- $x$-dimension:
Putting it all together yields a $10\times 6$ system $F\,x=d$ whose rows come directly from $\sqrt{\Lambda}\times(\mathbf{x}-\mathbf{z})=0$. That is the square‐root form.
3) How $F\,x=d$ Turns into $R\,x=d$
When you call
gbn = gfg.eliminateSequential()
R, d = gbn.matrix()
GTSAM runs a sequential elimination procedure (mathematically akin to QR). In effect, it pivots out the states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ one by one, applying orthonormal transformations to keep the system upper‐triangular. The final result is an $6\times 6$ matrix $R$ such that \(R\,x = d.\) The off‐diagonal entries in $R$ (things like $-9.81$, etc.) are simply the numerical byproducts of these orthonormal transformations.
Thus:
- Each factor (prior or motion) supplies a row (or two rows) of the “square‐root” system $F$.
- Stack them to get $F\,x=d$.
- Factor $F\to R$ by sequential elimination.
The $d$ above is the “square root information vector”, as the Bayes net defines the following quadratic, and corresponding Gaussian density, in this case in $\mathbb{R}^6$:
\[E_{gbn}(x; R, d) \doteq \frac{1}{2} \|Rx - d\|^2\] \[\mathcal{N}_{gbn}(x; R, d) \sim k \exp \{- \frac{1}{2} \|Rx - d\|^2\}\]One way to see why $R$ is the square‐root of the Gaussian’s information matrix is to look at how GTSAM’s factorization aligns with the standard least‐squares cost:
- 
    Start with the Gaussian cost (information form). 
 For a multivariate Gaussian in $\mathbb{R}^n$, the negative‐log probability can be written as
 \(\tfrac12\,(x-\mu)^\top\,\Sigma^{-1}\,(x-\mu).\) The matrix $\Sigma^{-1}$ is the information matrix.
- 
    Rewrite it in “square‐root” form. 
 We can factor $\Sigma^{-1}$ as
 \(\Sigma^{-1} \;=\; R^\top\,R,\) where $R$ is an upper‐triangular matrix—this is analogous to a Cholesky (or $\mathrm{QR}$) factor. Then
 \((x-\mu)^\top\,\Sigma^{-1}\,(x-\mu) \;=\;\bigl\|\,R\,(x-\mu)\bigr\|^2.\) Thus we see that (R) acts like the “square‐root” of $\Sigma^{-1}$.
- 
    GTSAM’s final factorization yields $\tfrac12\,|R\,x - d|^2.$ 
 After GTSAM does sequential elimination, it ends up with a reduced system of the form \(\|R\,x - d\|^2,\) which is exactly the same expression as $|(x-\mu)|_{\Sigma^{-1}}^2 = (x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)$ up to a shift and re‐labeling $\mu = R^{-1}\,d$. Expanding $|R\,x - d|^2$ gives \(x^\top\,R^\top\,R\,x \;-\;2\,d^\top\,R\,x \;+\;\ldots\) and we see $R^\top R$ playing the role of the information matrix.
- 
    Dimension 6 is because there are 3 two‐dimensional variables. 
 In the simple Kalman‐smoother example with 3 states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$, each state is 2D. Altogether $x\in\mathbb{R}^6$. Hence the final $R$ is $6\times 6$, and $R^\top R$ is the full $6\times 6$ information matrix for that joint Gaussian.
In short, you know that $R$ is the square‐root of the information because the cost function in GTSAM after elimination is exactly $\tfrac12\,|R\,x - d|^2$.  When you expand that norm, the quadratic term is
\(x^\top\,(R^\top R)\,x\)
which must match the original cost’s $\Sigma^{-1}$.  Therefore $R^\top R = \Sigma^{-1}$, making $R$ the “square‐root” information factor.
