<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://jungseokhong.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jungseokhong.com/" rel="alternate" type="text/html" /><updated>2026-05-04T11:24:36-05:00</updated><id>https://jungseokhong.com/feed.xml</id><title type="html">Jungseok Hong</title><subtitle>Jungseok Hong is a Postdoctoral Associate at MIT CSAIL.</subtitle><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><entry><title type="html">Precision Matrix and Jacobian Matrix Contribution</title><link href="https://jungseokhong.com/math/Ab_matrix/" rel="alternate" type="text/html" title="Precision Matrix and Jacobian Matrix Contribution" /><published>2025-01-17T00:00:00-06:00</published><updated>2025-01-17T00:00:00-06:00</updated><id>https://jungseokhong.com/math/Ab_matrix</id><content type="html" xml:base="https://jungseokhong.com/math/Ab_matrix/"><![CDATA[<h1 id="1-what-is-j">1) What is $J$?</h1>

<p>In GTSAM (and factor-graph methods in general), each “error function” $error(x)$ is often linearized around some operating point $\hat{x}$. For a factor that depends on state $x$, we can write:</p>

\[error(x) \approx error(\hat{x}) + J(x - \hat{x}),\]

<p>where:</p>

<ul>
  <li>$error(\hat{x})$ is the residual at the linearization point,</li>
  <li>$J$ is the Jacobian matrix of partial derivatives of $error$ with respect to $x$, evaluated at $\hat{x}$.</li>
</ul>

<p>In other words, $J$ is literally the matrix of first-order derivatives (slopes) that map small changes in $x$ to changes in the factor’s error. GTSAM’s <code class="language-plaintext highlighter-rouge">JacobianFactor</code> stores this linear approximation $error(\hat{x}) + J(\delta x)$ internally. When multiplied by the factor’s square-root information $\Lambda^{1/2}$, you get the final row(s) in the system $F x = d$.</p>
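
<p>As a rough illustration of this linearization, the sketch below approximates $J$ by central finite differences and compares the first-order prediction with the exact error. The range measurement $h(x)=\|x\|$, the measurement value, the step size, and the helper names <code class="language-plaintext highlighter-rouge">h</code> and <code class="language-plaintext highlighter-rouge">numerical_jacobian</code> are all assumptions made for this example; none of them come from GTSAM.</p>

```python
import numpy as np


def h(x):
    """Hypothetical measurement model: range from the origin to a 2D point."""
    return np.array([np.hypot(x[0], x[1])])


def numerical_jacobian(f, x_hat, eps=1e-6):
    """Central finite-difference Jacobian of f evaluated at x_hat."""
    x_hat = np.asarray(x_hat, dtype=float)
    f0 = f(x_hat)
    J = np.zeros((f0.size, x_hat.size))
    for k in range(x_hat.size):
        step = np.zeros_like(x_hat)
        step[k] = eps
        J[:, k] = (f(x_hat + step) - f(x_hat - step)) / (2.0 * eps)
    return J


z = np.array([5.2])            # hypothetical measurement
x_hat = np.array([3.0, 4.0])   # linearization point


def error(x):
    """Residual error(x) = h(x) - z."""
    return h(x) - z


J = numerical_jacobian(error, x_hat)

# First-order prediction error(x_hat) + J dx versus the exact error.
dx = np.array([0.01, -0.02])
linearized = error(x_hat) + J @ dx
exact = error(x_hat + dx)
print(J, linearized, exact)
```

<p>Near $\hat{x}$ the two values agree to first order; the gap between them shrinks quadratically with the size of the step.</p>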

<hr />

<h1 id="2-numerical-derivation-of-the-square-root-form">2) Numerical Derivation of the Square-Root Form</h1>

<h2 id="a-least-squares-formulation">a) Least-Squares Formulation</h2>

<p>Consider a general Gaussian factor with measurement $z$, predicted by some function $h(x)$. The covariance of the measurement noise is $\Sigma$. The negative log-likelihood of a single factor is:</p>

\[\frac{1}{2} \left( h(x) - z \right)^T \Sigma^{-1} \left( h(x) - z \right).\]

<p>If we define $\Lambda = \Sigma^{-1}$ (the information matrix), then the cost becomes:</p>

\[\frac{1}{2} \left( h(x) - z \right)^T \Lambda \left( h(x) - z \right).\]

<p>Minimizing this is equivalent to a least-squares problem.</p>

<hr />

<h2 id="b-taking-the-square-root-of-lambda">b) Taking the Square-Root of $\Lambda$</h2>

<p>We can rewrite $\Lambda$ as $\Lambda^{1/2} \Lambda^{1/2}$. Let</p>

\[error(x) = h(x) - z.\]

<p>Then the cost is:</p>

\[\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2.\]

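<p>Minimizing $\frac{1}{2} \left\| \Lambda^{1/2} error(x) \right\|^2$ is exactly the same as minimizing $\frac{1}{2} error(x)^T \Lambda error(x)$. Written as a norm, the quantity being driven toward zero is the whitened residual, so</p>

\[\Lambda^{1/2} error(x) = 0\]

<p>is the equation each factor contributes. With several factors stacked, it holds only in the least-squares sense.</p>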

<p>Hence the factor in “square-root” form is simply the row:</p>

\[\Lambda^{1/2} J x - \Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right] = 0,\]

<p>once linearized about $\hat{x}$. That is precisely what GTSAM stores in its <code class="language-plaintext highlighter-rouge">JacobianFactor</code>: the matrix part $\Lambda^{1/2} J$ and the vector part $\Lambda^{1/2} \left[ z - h(\hat{x}) + J \hat{x} \right]$.</p>
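
<p>The equivalence of the two cost forms can be checked numerically. This is a minimal sketch with a made-up covariance and residual. As the square root it uses a Cholesky factor $W$ with $W^\top W = \Lambda$, which is an assumption of the example: any such factor gives the same cost, including the symmetric $\Lambda^{1/2}$ used in the text.</p>

```python
import numpy as np

# Hypothetical 2x2 measurement covariance and its information matrix.
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
Lambda = np.linalg.inv(Sigma)

# Cholesky gives Lambda = L @ L.T, so W = L.T satisfies W.T @ W = Lambda.
W = np.linalg.cholesky(Lambda).T

e = np.array([0.3, -0.2])   # hypothetical residual h(x) - z

cost_information = 0.5 * e @ Lambda @ e          # 1/2 e^T Lambda e
cost_square_root = 0.5 * np.sum((W @ e) ** 2)    # 1/2 ||W e||^2
print(cost_information, cost_square_root)
```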

<hr />

<h2 id="c-simple-1d-numerical-example">c) Simple 1D Numerical Example</h2>

<p>To see it more concretely, imagine a single variable $x \in \mathbb{R}$ with a measurement $z = 5$ and noise variance $\sigma^2 = 4$. Then $\Sigma = 4$, so $\Lambda = \Sigma^{-1} = 1/4$. Its square-root is $\Lambda^{1/2} = 1/2$.</p>

<p>The least-squares cost is:</p>

\[\frac{1}{2} \left( x - 5 \right)^T \cdot \frac{1}{4} \cdot \left( x - 5 \right).\]

<p>Written in square-root form:</p>

\[\frac{1}{2} \left\| \frac{1}{2} \left( x - 5 \right) \right\|^2.\]

<p>The “factor row” is:</p>

\[\left[ \frac{1}{2} \right] (x - 5) = 0\]

<p>or equivalently:</p>

\[\frac{1}{2} x - \frac{5}{2} = 0.\]

<p>Hence in matrix form, that factor is $\left[ \frac{1}{2} \right] x = \left[ \frac{5}{2} \right]$. Minimizing the norm of that row recovers $x = 5$. Notice that the “coefficient” is the square-root precision $\frac{1}{2}$, not $\frac{1}{4}$.</p>
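
<p>The same 1D factor can be written out in a few lines. The sketch below assumes the trivial measurement model $h(x)=x$ (so $J=1$) and uses NumPy’s least-squares solver in place of GTSAM; the variable names are illustrative only.</p>

```python
import numpy as np

sigma = 2.0                 # standard deviation, so the variance is 4
z = 5.0                     # measurement
sqrt_info = 1.0 / sigma     # Lambda^{1/2} = 1/2

A = np.array([[sqrt_info * 1.0]])   # whitened Jacobian, with J = 1
b = np.array([sqrt_info * z])       # whitened right-hand side, 5/2

# Minimize ||A x - b||^2; with one consistent row the residual is zero.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(A, b, x_star)
```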

<hr />
<h2 id="example-with-gtsam">Example with GTSAM</h2>

<h2 id="1-why-the-square-root-form">1) Why the “Square-Root” Form?</h2>

<ol>
  <li>
    <p><strong>Better Numerical Conditioning</strong><br />
In least-squares problems, one can write factors either in “information” form or “square-root information” form. Working with the <em>square-root</em> of the information matrix (i.e., $\Lambda^{1/2}$) avoids forming $F^\top F$, whose condition number is the square of that of $F$, and it tends to be more stable when doing incremental factorization (QR, Cholesky) in a factor-graph solver.</p>
  </li>
  <li>
    <p><strong>Direct Interpretation</strong><br />
Each factor in GTSAM is conceptually<br />
\(\sqrt{\Lambda}\,\bigl(\text{error}(x)\bigr) \;=\;0,\)<br />
where $\Lambda$ is the <em>precision</em> (inverse covariance). Hence the JacobianFactor simply stores $\sqrt{\Lambda}\,J$ as the rows in $F$, and $\sqrt{\Lambda}\,(\text{measured value})$ as the entries in $d$.</p>
  </li>
  <li>
    <p><strong>Consistency with QR</strong><br />
The final solution uses a QR or Cholesky factorization. Keeping everything in “square-root” form means we can factor $F$ directly into an upper-triangular $R$ (plus orthonormal transforms), rather than first having to multiply out $F^\top F$.</p>
  </li>
</ol>
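
<p>The conditioning point in item 1 can be seen on a toy matrix. The sketch below uses a hypothetical, badly scaled $F$ (not the one from this post) and compares the 2-norm condition number of $F$ with that of the information matrix $F^\top F$.</p>

```python
import numpy as np

# Hypothetical whitened Jacobian whose columns have very different scales.
F = np.array([[1.0, 0.0],
              [1.0, 1e-3],
              [1.0, 2e-3]])

cond_F = np.linalg.cond(F)               # square-root form
cond_normal = np.linalg.cond(F.T @ F)    # information (normal-equations) form
print(cond_F, cond_normal)
```

<p>In exact arithmetic the second number is the square of the first, which is why factoring $F$ directly loses fewer digits than factoring $F^\top F$.</p>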

<hr />

<h2 id="2-updated-numbers-for-the-example">2) Updated Numbers for the Example</h2>

<p>In the code:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model2</span> <span class="o">=</span> <span class="n">noiseModel</span><span class="p">.</span><span class="n">Isotropic</span><span class="p">.</span><span class="n">Sigma</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>  <span class="c1"># for the priors/measurements
</span><span class="n">motion_model</span> <span class="o">=</span> <span class="n">noiseModel</span><span class="p">.</span><span class="n">Diagonal</span><span class="p">.</span><span class="n">Sigmas</span><span class="p">([</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.3</span><span class="p">])</span>  <span class="c1"># for the motions
</span></code></pre></div></div>
<p>the standard deviations are:</p>
<ul>
  <li>$\sigma=0.5$ for each prior/measurement in 2D</li>
  <li>$\sigma_x=0.1$ and $\sigma_y=0.3$ for each motion factor</li>
</ul>

<p>Hence, their <em>square‐root precisions</em> are:</p>
<ul>
  <li>$\sqrt{\Lambda_{\text{meas}}}=1/\sigma=2.$</li>
  <li>$\sqrt{\Lambda_{\text{motion}}}=\begin{bmatrix}10 &amp; 0 \\ 0 &amp; 3.333\ldots\end{bmatrix}$ (since $1/0.1=10$ and $1/0.3\approx 3.333\ldots$).</li>
</ul>
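
<p>For diagonal noise models like these, the square-root precision is just the reciprocal of each standard deviation. A minimal NumPy sketch of that arithmetic (plain arrays, not GTSAM noise-model objects):</p>

```python
import numpy as np

sigma_meas = 0.5
motion_sigmas = np.array([0.1, 0.3])

sqrt_info_meas = np.eye(2) / sigma_meas             # 2 * I
sqrt_info_motion = np.diag(1.0 / motion_sigmas)     # diag(10, 3.333...)

# Squaring the square-root precision recovers the information matrix.
info_motion = sqrt_info_motion.T @ sqrt_info_motion  # diag(100, 11.11...)
print(sqrt_info_meas, sqrt_info_motion, info_motion)
```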

<h3 id="a-three-2d-states">a) Three 2D States</h3>

<p>Let<br />
\(x \;=\;
\bigl[
x_{1,1},\,x_{1,2},\,x_{2,1},\,x_{2,2},\,x_{3,1},\,x_{3,2}
\bigr]^\top.\)</p>

<h3 id="b-priorsmeasurements">b) Priors/“Measurements”</h3>

<p>From your loop:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">key</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">([</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">]):</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">gtsam</span><span class="p">.</span><span class="n">Point2</span><span class="p">(</span><span class="n">i</span><span class="o">*</span><span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
    <span class="c1"># ...
</span></code></pre></div></div>
<p>the “measured” positions are:</p>
<ul>
  <li>$\mathbf{x}_1 \approx (0,0)$</li>
  <li>$\mathbf{x}_2 \approx (2,0)$</li>
  <li>$\mathbf{x}_3 \approx (4,0)$</li>
</ul>

<p>Since the noise model is $\sigma=0.5$, the factor is<br />
$\sqrt{\Lambda}(\mathbf{x}_i - \mathbf{z}_i)=\mathbf{0}$,<br />
with $\sqrt{\Lambda}=2$.</p>

<p>Hence, each prior contributes <strong>two rows</strong>:</p>

<ol>
  <li><strong>Prior on $\mathbf{x}_1\approx(0,0)$</strong></li>
</ol>

\[2\,(x_{1,1}-0)=0,\quad 2\,(x_{1,2}-0)=0 
\quad\Longrightarrow\quad 
\begin{bmatrix}
2 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 0\\[6pt]
0 &amp; 2 &amp; 0 &amp; 0 &amp; 0 &amp; 0
\end{bmatrix}
\begin{bmatrix}x_{1,1}\\x_{1,2}\\\dots\\x_{3,2}\end{bmatrix}
=
\begin{bmatrix}0\\0\end{bmatrix}.\]

<ol start="2">
  <li><strong>Prior on $\mathbf{x}_2\approx(2,0)$</strong></li>
</ol>

<p>\(2\,(x_{2,1}-2)=0\;\;\to\;d=4,\quad 
2\,(x_{2,2}-0)=0\;\;\to\;d=0,\)
so the matrix rows become
\(\begin{bmatrix}
0 &amp; 0 &amp; 2 &amp; 0 &amp; 0 &amp; 0\\[6pt]
0 &amp; 0 &amp; 0 &amp; 2 &amp; 0 &amp; 0
\end{bmatrix},
\quad
d=\begin{bmatrix}4\\0\end{bmatrix}.\)</p>

<ol start="3">
  <li><strong>Prior on $\mathbf{x}_3\approx(4,0)$</strong></li>
</ol>

<p>\(2\,(x_{3,1}-4)=0\;\;\to d=8,\quad
2\,(x_{3,2}-0)=0\;\;\to d=0,\)
so
\(\begin{bmatrix}
0 &amp; 0 &amp; 0 &amp; 0 &amp; 2 &amp; 0\\[6pt]
0 &amp; 0 &amp; 0 &amp; 0 &amp; 0 &amp; 2
\end{bmatrix},
\quad
d=\begin{bmatrix}8\\0\end{bmatrix}.\)</p>

<h3 id="c-motion-factors">c) Motion Factors</h3>

<p>You have two motion factors:</p>

<ol>
  <li><strong>$\mathbf{x}_2-\mathbf{x}_1\approx(2,0)$</strong>
with $\sigma_x=0.1,\,\sigma_y=0.3,$
so $\sqrt{\Lambda}=\mathrm{diag}(10,3.333\ldots)$.
    <ul>
      <li>For the $x$‐row:<br />
\(10\,[(x_{2,1}-x_{1,1}) - 2]=0 
\;\Rightarrow\; \text{Row}=[-10,\,0,\,+10,\,0,\,0,\,0],\;d=10\cdot 2=20.\)</li>
      <li>For the $y$‐row:<br />
\(3.333\ldots\,[(x_{2,2}-x_{1,2}) - 0]=0 
\;\Rightarrow\;\text{Row}=[0,\,-3.333,\;0,\,+3.333,\;0,\;0],\; d=0.\)</li>
    </ul>
  </li>
  <li><strong>$\mathbf{x}_3-\mathbf{x}_2\approx(2,0)$</strong>
with the same diagonal sqrt‐info $\mathrm{diag}(10,3.333\ldots)$.
    <ul>
      <li>For the $x$‐row:<br />
\(10\,[(x_{3,1}-x_{2,1}) - 2]=0 
\;\Rightarrow\; \text{Row}=[0,\,0,\,-10,\,0,\,+10,\,0],\;d=10\cdot 2=20.\)</li>
      <li>For the $y$‐row:<br />
\(3.333\ldots\,[(x_{3,2}-x_{2,2}) - 0]=0 
\;\Rightarrow\;\text{Row}=[0,\,0,\,0,\,-3.333,\;0,\,+3.333],\; d=0.\)</li>
    </ul>
  </li>
</ol>

<p>Putting it all together yields a $10\times 6$ system $F\,x=d$ whose rows come directly from $\sqrt{\Lambda}\times(\mathbf{x}-\mathbf{z})=0$.  That is the <em>square‐root</em> form.</p>
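
<p>The stacking described above can be reproduced with plain NumPy. This is a sketch under the assumption that the columns are ordered $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ and the rows are ordered priors first, then motions; GTSAM may order things differently internally. The helper variable names are illustrative.</p>

```python
import numpy as np

W_prior = 2.0 * np.eye(2)                    # 1 / 0.5 on both axes
W_motion = np.diag([1.0 / 0.1, 1.0 / 0.3])   # diag(10, 3.333...)

rows, rhs = [], []

# Priors: W (x_i - z_i) = 0 with z_i = (2 i, 0) for i = 0, 1, 2.
for i in range(3):
    block = np.zeros((2, 6))
    block[:, 2 * i:2 * i + 2] = W_prior
    rows.append(block)
    rhs.append(W_prior @ np.array([2.0 * i, 0.0]))

# Motions: W ((x_{i+1} - x_i) - (2, 0)) = 0 for i = 0, 1.
for i in range(2):
    block = np.zeros((2, 6))
    block[:, 2 * i:2 * i + 2] = -W_motion
    block[:, 2 * i + 2:2 * i + 4] = W_motion
    rows.append(block)
    rhs.append(W_motion @ np.array([2.0, 0.0]))

F = np.vstack(rows)        # shape (10, 6)
d = np.concatenate(rhs)    # shape (10,)

# Least-squares solution of F x = d.
x_star, *_ = np.linalg.lstsq(F, d, rcond=None)
print(F.shape, d, x_star)
```

<p>Because the priors and motions in this example agree with each other, the least-squares solution places the three states at $(0,0)$, $(2,0)$, and $(4,0)$ with zero residual.</p>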

<hr />

<h2 id="3-how-fxd-turns-into-rxd">3) How $F\,x=d$ Turns into $R\,x=d$</h2>

<p>When you call</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gbn</span> <span class="o">=</span> <span class="n">gfg</span><span class="p">.</span><span class="n">eliminateSequential</span><span class="p">()</span>
<span class="n">R</span><span class="p">,</span> <span class="n">d</span> <span class="o">=</span> <span class="n">gbn</span><span class="p">.</span><span class="n">matrix</span><span class="p">()</span>
</code></pre></div></div>
<p>GTSAM runs a <strong>sequential elimination</strong> procedure (mathematically akin to QR).  In effect, it eliminates the states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ one by one, applying orthogonal transformations that bring the system into upper‐triangular form.  The final result is a $6\times 6$ matrix $R$ such that
\(R\,x = d,\)
where $d$ now denotes the correspondingly transformed right‐hand side (a 6‐vector, not the original 10‐vector).
The off‐diagonal entries in $R$ (things like $-9.81$, etc.) are simply the numerical byproducts of these orthogonal transformations.</p>

<p>Thus:</p>

<ol>
  <li><strong>Each factor</strong> (prior or motion) supplies a row (or two rows) of the “square‐root” system $F$.</li>
  <li><strong>Stack</strong> them to get $F\,x=d$.</li>
  <li><strong>Factor</strong> $F\to R$ by sequential elimination.</li>
</ol>
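
<p>The three steps above can be sketched with a dense QR factorization in NumPy. This is only an analogy to what GTSAM does: the column ordering $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$ is assumed, <code class="language-plaintext highlighter-rouge">build_system</code> is a hypothetical helper, and the signs of individual rows of $R$ can differ between QR implementations.</p>

```python
import numpy as np


def build_system():
    """Stack the whitened prior and motion rows of the example into F x = d."""
    W_prior = 2.0 * np.eye(2)
    W_motion = np.diag([10.0, 1.0 / 0.3])
    F = np.zeros((10, 6))
    d = np.zeros(10)
    for i in range(3):                  # priors at (0,0), (2,0), (4,0)
        r = 2 * i
        F[r:r + 2, r:r + 2] = W_prior
        d[r:r + 2] = W_prior @ np.array([2.0 * i, 0.0])
    for i in range(2):                  # motions of (2,0)
        r = 6 + 2 * i
        F[r:r + 2, 2 * i:2 * i + 2] = -W_motion
        F[r:r + 2, 2 * i + 2:2 * i + 4] = W_motion
        d[r:r + 2] = W_motion @ np.array([2.0, 0.0])
    return F, d


F, d = build_system()

# Reduced QR: Q is 10x6 with orthonormal columns, R is 6x6 upper-triangular.
Q, R = np.linalg.qr(F)
d_R = Q.T @ d                     # right-hand side of the triangular system

x_star = np.linalg.solve(R, d_R)  # solve R x = d_R
print(np.round(R, 3))
print(x_star)
```

<p>With this ordering, the entry coupling $x_{1,1}$ and $x_{2,1}$ has magnitude $100/\sqrt{104}\approx 9.81$, and $R^\top R$ equals $F^\top F$ regardless of the sign convention.</p>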

<hr />

<p>The $d$ above is the “square root information vector”, as the Bayes net defines the following quadratic, and corresponding Gaussian density, in this case in $\mathbb{R}^6$:</p>

\[E_{gbn}(x; R, d) \doteq \frac{1}{2} \|Rx - d\|^2\]

\[\mathcal{N}_{gbn}(x; R, d) \sim k \exp \{- \frac{1}{2} \|Rx - d\|^2\}\]
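
<p>To make the quadratic concrete, the sketch below evaluates it for a small, made-up $2\times 2$ pair $(R, d)$ rather than the $6\times 6$ one from the example, and recovers the mean and covariance of the corresponding Gaussian. The numbers and the function name <code class="language-plaintext highlighter-rouge">energy</code> are assumptions for illustration.</p>

```python
import numpy as np

# Hypothetical upper-triangular square-root information factor and vector.
R = np.array([[2.0, -1.0],
              [0.0, 4.0]])
d = np.array([1.0, 8.0])

mu = np.linalg.solve(R, d)                # mean: solves R mu = d
information = R.T @ R                     # information matrix
covariance = np.linalg.inv(information)   # covariance of the Gaussian


def energy(x):
    """E(x) = 0.5 * ||R x - d||^2."""
    r = R @ x - d
    return 0.5 * r @ r


print(mu, energy(mu))
print(covariance)
```

<p>The energy vanishes at the mean, and away from it the value matches $\tfrac12 (x-\mu)^\top R^\top R\,(x-\mu)$.</p>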

<p>One way to see why $R$ is the square‐root of the Gaussian’s information matrix is to look at how GTSAM’s factorization aligns with the standard least‐squares cost:</p>

<ol>
  <li>
    <p><strong>Start with the Gaussian cost (information form).</strong><br />
For a multivariate Gaussian in $\mathbb{R}^n$, the negative‐log probability can be written as<br />
\(\tfrac12\,(x-\mu)^\top\,\Sigma^{-1}\,(x-\mu).\)
The matrix $\Sigma^{-1}$ is the information matrix.</p>
  </li>
  <li>
    <p><strong>Rewrite it in “square‐root” form.</strong><br />
We can factor $\Sigma^{-1}$ as<br />
\(\Sigma^{-1} \;=\; R^\top\,R,\)
where $R$ is an upper‐triangular matrix—this is analogous to a Cholesky (or $\mathrm{QR}$) factor.  Then<br />
\((x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)
\;=\;\bigl\|\,R\,(x-\mu)\bigr\|^2.\)
Thus we see that $R$ acts like the “square‐root” of $\Sigma^{-1}$.</p>
  </li>
  <li>
    <p><strong>GTSAM’s final factorization yields $\tfrac12\,\|R\,x - d\|^2.$</strong><br />
After GTSAM does sequential elimination, it ends up with a reduced cost of the form
\(\tfrac12\,\|R\,x - d\|^2,\)
whose quadratic is exactly
$\|x-\mu\|_{\Sigma^{-1}}^2 = (x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)$
with the re‐labeling $\mu = R^{-1}\,d$.  Expanding $\|R\,x - d\|^2$ gives 
\(x^\top\,R^\top\,R\,x \;-\;2\,d^\top\,R\,x \;+\;d^\top d\)
and we see $R^\top R$ playing the role of the information matrix.</p>
  </li>
  <li>
    <p><strong>Dimension 6 is because there are 3 two‐dimensional variables.</strong><br />
In the simple Kalman‐smoother example with 3 states $\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3$, each state is 2D.  Altogether $x\in\mathbb{R}^6$.  Hence the final $R$ is $6\times 6$, and $R^\top R$ is the full $6\times 6$ information matrix for that joint Gaussian.</p>
  </li>
</ol>
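
<p>As a worked step, the “shift and re‐labeling” in item 3 can be written out explicitly. Assuming $R$ is invertible (which should hold when every variable is constrained, as in this example), define $\mu = R^{-1} d$. Then</p>

\[\tfrac12\,\|R\,x - d\|^2 \;=\; \tfrac12\,\|R\,(x - R^{-1}d)\|^2 \;=\; \tfrac12\,(x-\mu)^\top R^\top R\,(x-\mu),\]

<p>so comparing with $\tfrac12\,(x-\mu)^\top\,\Sigma^{-1}\,(x-\mu)$ term by term gives $\Sigma^{-1} = R^\top R$ and mean $\mu = R^{-1}d$.</p>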

<p>In short, <strong>you know that $R$ is the square‐root of the information</strong> because the cost function in GTSAM after elimination is exactly $\tfrac12\,\|R\,x - d\|^2$.  When you expand that norm, the quadratic term is<br />
\(x^\top\,(R^\top R)\,x\)
which must match the original cost’s $\Sigma^{-1}$.  Therefore $R^\top R = \Sigma^{-1}$, making $R$ the “square‐root” information factor.</p>]]></content><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><category term="Math" /><category term="resources" /><summary type="html"><![CDATA[1) What is $J$?]]></summary></entry><entry><title type="html">Resources for Writing a Research Statement</title><link href="https://jungseokhong.com/writing/research_statement/" rel="alternate" type="text/html" title="Resources for Writing a Research Statement" /><published>2023-02-12T00:00:00-06:00</published><updated>2023-02-12T00:00:00-06:00</updated><id>https://jungseokhong.com/writing/research_statement</id><content type="html" xml:base="https://jungseokhong.com/writing/research_statement/"><![CDATA[<p>I’d like to share two resources for writing a research statement that I recently got to know.</p>
<ol>
  <li><a href="https://h2r.cs.brown.edu/writing-a-research-statement-for-graduate-school-and-fellowships/">link1</a></li>
  <li><a href="https://amytabb.com/tips/2021/01/18/statement-of-purpose-help/">link2</a></li>
</ol>]]></content><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><category term="Writing" /><category term="resources" /><summary type="html"><![CDATA[I’d like to share two resources for writing a research statement that I recently got to know. link1 link2]]></summary></entry><entry><title type="html">Deep learning machine build (1)</title><link href="https://jungseokhong.com/hardware/deep-learning-build-1/" rel="alternate" type="text/html" title="Deep learning machine build (1)" /><published>2021-02-17T00:00:00-06:00</published><updated>2021-02-17T00:00:00-06:00</updated><id>https://jungseokhong.com/hardware/deep-learning-build-1</id><content type="html" xml:base="https://jungseokhong.com/hardware/deep-learning-build-1/"><![CDATA[<p>I will share some of my thoughts for building a deep learning machine in upcoming posts for students. (not for server or lab machines)</p>

<p>In this post, I will briefly compare the 3090 and the 3080, the most recent GPUs from Nvidia. I personally prefer the 3090 to the 3080 despite its price because of the 3090’s VRAM size. The larger VRAM allows you to explore a wide range of deep learning models as well as make some tweaks to them.</p>]]></content><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><category term="Hardware" /><category term="tips" /><summary type="html"><![CDATA[I will share some of my thoughts for building a deep learning machine in upcoming posts for students. (not for server or lab machines)]]></summary></entry><entry><title type="html">Website updates</title><link href="https://jungseokhong.com/news/updates/" rel="alternate" type="text/html" title="Website updates" /><published>2021-01-12T00:00:00-06:00</published><updated>2021-01-12T00:00:00-06:00</updated><id>https://jungseokhong.com/news/updates</id><content type="html" xml:base="https://jungseokhong.com/news/updates/"><![CDATA[<ol>
  <li>
    <p>I updated the domain of my website. From now on, you can access it via <a href="https://www.jungseokhong.com">jungseokhong.com</a>.</p>
  </li>
  <li>
    <p>I am still updating my publications/research page. All the materials are there but some links are broken.</p>
  </li>
</ol>]]></content><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><category term="News" /><category term="updates" /><summary type="html"><![CDATA[I updated the domain of my website. From now on, you can access it via jungseokhong.com.]]></summary></entry><entry><title type="html">Hello World!</title><link href="https://jungseokhong.com/intro/hello-world/" rel="alternate" type="text/html" title="Hello World!" /><published>2021-01-07T00:00:00-06:00</published><updated>2021-01-07T00:00:00-06:00</updated><id>https://jungseokhong.com/intro/hello-world</id><content type="html" xml:base="https://jungseokhong.com/intro/hello-world/"><![CDATA[<p>This is my first post in 2021. Hello world!!</p>]]></content><author><name>Jungseok Hong</name><email>jungseok@mit.edu</email></author><category term="Intro" /><category term="Etc" /><summary type="html"><![CDATA[This is my first post in 2021. Hello world!!]]></summary></entry></feed>