<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://tulon.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://tulon.github.io/" rel="alternate" type="text/html" /><updated>2025-12-13T15:49:19+00:00</updated><id>https://tulon.github.io/feed.xml</id><title type="html">Complex Problems, Elegant Solutions</title><subtitle>I specialize on complex problems in Software Engineering. It&apos;s not that I enjoy complexity - it&apos;s just that I can handle it.</subtitle><author><name>Joseph Artsimovich</name></author><entry><title type="html">The Method of Least Squares for Software Developers</title><link href="https://tulon.github.io/the-method-of-least-squares-for-software-developers/" rel="alternate" type="text/html" title="The Method of Least Squares for Software Developers" /><published>2025-08-04T00:00:00+00:00</published><updated>2025-08-04T00:00:00+00:00</updated><id>https://tulon.github.io/The-Method-of-Least-Squares-for-Software-Decelopers</id><content type="html" xml:base="https://tulon.github.io/the-method-of-least-squares-for-software-developers/"><![CDATA[<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#the-method-of-least-squares" id="markdown-toc-the-method-of-least-squares">The Method of Least Squares</a></li>
  <li><a href="#show-me-the-code" id="markdown-toc-show-me-the-code">Show Me The Code!</a></li>
  <li><a href="#deriving-the-normal-equation" id="markdown-toc-deriving-the-normal-equation">Deriving the Normal Equation</a></li>
  <li><a href="#finding-the-closest-points-on-two-lines" id="markdown-toc-finding-the-closest-points-on-two-lines">Finding the Closest Points on Two Lines</a></li>
  <li><a href="#closing-words" id="markdown-toc-closing-words">Closing Words</a></li>
</ul>

<h2 id="introduction">Introduction</h2>

<p>I am a seasoned C++ developer and in many of my jobs I also interviewed job candidates. I normally interview for C++ but when the role involves graphics, I also interview for Linear Algebra. One of the Linear Algebra problems I give to candidates is the following one:</p>

<blockquote>
  <p>You’ve got a plane defined by 3 arbitrary (not collinear) points on that plane. Let’s call them $ \mathbf{P_1} $, $ \mathbf{P_2} $ and $ \mathbf{P_3} $. You’ve also got a 4th point $ \mathbf{Q} $ that’s generally not on the plane. Your goal is to find the point $ \mathbf{Q'} $ on that plane that’s closest (in the Euclidean sense) to point $ \mathbf{Q} $. In other words, I am asking you to drop a perpendicular from $ \mathbf{Q} $ to the plane and find the intersection point of that perpendicular with the plane.</p>
</blockquote>

<p>What I like about this problem is that there are many ways to solve it - I know at least 4. About every other candidate manages to solve it, yet over the years, no candidate has ever used my favourite approach. That’s a shame, because that approach works for projecting onto a linear surface of any dimension in a space of any dimension. So, you can use it to project not just onto planes but also onto lines in both 2D and 3D.</p>

<p>Given that the method in question seems to be little known among software developers (yet very well known by non-software engineers), I thought it would be a good idea to write a post about it.</p>

<h2 id="the-method-of-least-squares">The Method of Least Squares</h2>

<p>Or alternatively, you may know it as <em>Projecting Onto a Subspace</em>. If you haven’t heard of either of those names, yet you know some basic Linear Algebra, this article is for you.</p>

<p>So, what does the method of least squares do? Consider an overdetermined linear system $ \mathbf{Ax} = \mathbf{b} $. By overdetermined, I mean the matrix $ \mathbf{A} $ is tall - it has more rows than columns. Generally, such linear systems don’t have exact solutions. The best we can do is to find an approximate solution that minimizes the error. That error is called the residual, and that residual is equal to $ \mathbf{b} - \mathbf{Ax} $. The residual is a vector, so what do we mean by minimizing it? Well, we minimize its Euclidean norm:</p>

\[\text{argmin}_{\mathbf{x}} \| \mathbf{b} - \mathbf{Ax} \|\]

<p>Given that the Euclidean norm is non-negative, minimizing it is equivalent to minimizing its square:</p>

\[\text{argmin}_{\mathbf{x}} \| \mathbf{b} - \mathbf{Ax} \|^2\]

<p>Minimizing the squared norm is much easier from the Calculus point of view - the derivatives will be much nicer. So, we’ll be minimizing the squared norm of the residual. Hence the <em>Method of Least Squares</em> name.</p>
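<p>For reference, here is a rough sketch of that Calculus route, using only standard matrix calculus identities. It is not the derivation this article follows later, just a sanity check. Expanding the squared norm gives a quadratic function of $ \mathbf{x} $:</p>

\[\| \mathbf{b} - \mathbf{Ax} \|^2 = \mathbf{x}^\top \mathbf{A}^\top \mathbf{A} \mathbf{x} - 2 \mathbf{b}^\top \mathbf{A} \mathbf{x} + \mathbf{b}^\top \mathbf{b}\]

<p>Its gradient with respect to $ \mathbf{x} $ is:</p>

\[2 \mathbf{A}^\top \mathbf{A} \mathbf{x} - 2 \mathbf{A}^\top \mathbf{b}\]

<p>Setting that gradient to zero gives $ \mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b} $, which is the equation the rest of this section arrives at by a different path.</p>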

<p>One other interpretation of this method is the following: it finds a point (technically a vector) in the column space of matrix $ \mathbf{A} $ that’s closest in the Euclidean sense to point $ \mathbf{b} $. What’s a column space of a matrix? It’s the set of all possible linear combinations of columns of that matrix. In general, the set of all possible linear combinations of a given set of vectors is called a <em>vector space</em>.</p>

<p>What does the column space of a 3x2 (3 rows, 2 columns) matrix look like? Well, it looks like a plane in 3D (provided the columns are linearly independent). Isn’t that what we are after? We are trying to find a point on a plane that’s closest in the Euclidean sense to a given point. Sounds like exactly what we want!</p>

<p>There is one tiny complication though: not every plane in 3D is a vector space, but only those that pass through the origin. Why is that? Well, a linear combination with all coefficients set to zero will produce a zero vector, so the zero vector (the origin) has to be a part of any vector space. This complication is easy to overcome though: we’ll just shift our coordinate system to make one of the points on the plane (say $ \mathbf{P_1} $) to be our new origin. Then we solve the problem in that shifted coordinate system and then shift the result back into the original coordinate system.</p>

<p>Applying the method of least squares is easy: you just take the original linear system $ \mathbf{Ax} = \mathbf{b} $ and left-multiply both sides by $ \mathbf{A}^\top $. That gives you the so-called <em>Normal Equation</em>:</p>

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

<p>When applied to our problem, we have:</p>

\[\begin{align}
\mathbf{A} &amp;= \left[
\begin{array}{c|c}
\mathbf{P_2} - \mathbf{P_1} &amp; \mathbf{P_3} - \mathbf{P_1}
\end{array}
\right] \notag \\
\mathbf{b} &amp;= \mathbf{Q} - \mathbf{P_1} \notag
\end{align}\]

<p>Subtracting $ \mathbf{P_1} $ just shifts the coordinate system so that point $ \mathbf{P_1} $ is the new origin and that origin is on our plane.</p>

<p>Then we solve the normal equation for $ \mathbf{x} $. Once we’ve got $ \mathbf{x} $, we can get the solution in the original coordinate space by evaluating $ \mathbf{Ax} + \mathbf{P_1} $. Adding $ \mathbf{P_1} $ shifts us back into the original coordinate system.</p>

<p>Putting everything together, the solution to our original problem is going to be:</p>

\[\begin{align}
\mathbf{A} &amp;= \left[
\begin{array}{c|c}
\mathbf{P_2} - \mathbf{P_1} &amp; \mathbf{P_3} - \mathbf{P_1}
\end{array}
\right] \notag \\
\mathbf{b} &amp;= \mathbf{Q} - \mathbf{P_1} \notag \\
\mathbf{x} &amp;= (\mathbf{A}^\top \mathbf{A})^{-1}\mathbf{A}^\top\mathbf{b} \notag \\
\mathbf{Q'} &amp;= \mathbf{Ax} + \mathbf{P_1} \notag
\end{align}\]

<p>When projecting onto a plane, $ \mathbf{A}^\top\mathbf{A} $ is going to be a 2x2 matrix. When projecting onto a line (whether in 3D or in 2D), it would be a 1x1 matrix, which makes inverting it really easy - just take the reciprocal of its only element.</p>
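<p>To make the 1x1 case concrete, here is a minimal sketch of projecting a point onto a line in 3D. It deliberately uses only the standard library, and the names <code class="language-plaintext highlighter-rouge">Vec3</code>, <code class="language-plaintext highlighter-rouge">dot</code> and <code class="language-plaintext highlighter-rouge">projectToLine</code> are made up for this illustration. The line is assumed to be given by two distinct points on it.</p>

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;

// Dot product of two 3D vectors.
double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Projects q onto the line passing through p1 and p2 (p1 != p2).
// Matrix A has a single column d = p2 - p1, so A^T A is the scalar
// dot(d, d) and A^T b is the scalar dot(d, b).
Vec3 projectToLine(const Vec3& p1, const Vec3& p2, const Vec3& q)
{
    const Vec3 d{p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]};
    const Vec3 b{q[0] - p1[0], q[1] - p1[1], q[2] - p1[2]};

    const double AtA = dot(d, d);
    assert(AtA > 0.0 && "p1 and p2 must be distinct");

    // x = (A^T A)^{-1} A^T b, where the inverse is just a reciprocal.
    const double x = dot(d, b) / AtA;

    // Shift back into the original coordinate system.
    return {p1[0] + d[0] * x, p1[1] + d[1] * x, p1[2] + d[2] * x};
}
```

<p>For example, projecting the point (1, 5, 7) onto the X axis, given by the points (0, 0, 0) and (2, 0, 0), yields (1, 0, 0).</p>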

<h2 id="show-me-the-code">Show Me The Code!</h2>

<p>Here is a function to project a point onto a plane in C++ using the <a href="https://github.com/g-truc/glm">glm</a> library:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="nf">projectToPlane</span><span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">p1</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">p2</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">p3</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">q</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Note that in glm, a 2x3 matrix means a 2-column, 3-row, matrix.</span>
    <span class="c1">// In Linear Algebra, we would have called such a matrix a 3x2 one.</span>
    <span class="k">const</span> <span class="n">glm</span><span class="o">::</span><span class="n">mat2x3</span> <span class="n">A</span><span class="p">(</span><span class="n">p2</span> <span class="o">-</span> <span class="n">p1</span><span class="p">,</span> <span class="n">p3</span> <span class="o">-</span> <span class="n">p1</span><span class="p">);</span>

    <span class="k">const</span> <span class="k">auto</span> <span class="n">b</span> <span class="o">=</span> <span class="n">q</span> <span class="o">-</span> <span class="n">p1</span><span class="p">;</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">At</span><span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">transpose</span><span class="p">(</span><span class="n">A</span><span class="p">));</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">invAtA</span><span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">inverse</span><span class="p">(</span><span class="n">At</span> <span class="o">*</span> <span class="n">A</span><span class="p">));</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">x</span> <span class="o">=</span> <span class="n">invAtA</span> <span class="o">*</span> <span class="n">At</span> <span class="o">*</span> <span class="n">b</span><span class="p">;</span>

    <span class="k">return</span> <span class="n">p1</span> <span class="o">+</span> <span class="n">A</span> <span class="o">*</span> <span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>You can now project points onto a plane or onto a line (with minimal modifications to the code above). However, understanding how the normal equation is derived will allow you to solve a wider class of problems, one of which is given towards the end of this article. So, let’s do the derivation!</p>

<h2 id="deriving-the-normal-equation">Deriving the Normal Equation</h2>

<p>So, we want to project something onto a vector space. Let’s recall that a vector space is a set of all possible linear combinations of a given set of vectors. Those vectors are called the basis vectors and in our case they are the columns of matrix $ \mathbf{A} $. Now suppose we also have matrix $ \mathbf{B} $, whose column space is an <em>orthogonal complementary</em> one with respect to the column space of $ \mathbf{A} $. What does that mean exactly? The <em>orthogonal</em> part means that every column of $ \mathbf{A} $ is orthogonal to every column of $ \mathbf{B} $. The <em>complementary</em> part means that when columns of $ \mathbf{A} $ and $ \mathbf{B} $ are put together, the vector space they produce covers the whole space (in this case, the whole 3D space). That is, every point in 3D shall be representable as a linear combination of columns of $ \mathbf{A} $ and $ \mathbf{B} $.</p>

<p>When projecting to a plane in 3D, we can actually build that matrix $ \mathbf{B} $ pretty easily. We just put the cross product of $ (\mathbf{P_2} - \mathbf{P_1}) $ and $ (\mathbf{P_3} - \mathbf{P_1}) $ as its only column. When projecting to a line in 3D though, it’s no longer that simple, as $ \mathbf{B} $ will now have two columns. The good news is that we won’t have to build that matrix $ \mathbf{B} $ at all - we’ll get it cancelled instead. But for now, let’s assume that we have it.</p>

<p>Now, we can represent any vector $ \mathbf{b} $ as a linear combination of columns of $ \mathbf{A} $ and $ \mathbf{B} $:</p>

\[\mathbf{Ax} + \mathbf{By} = \mathbf{b} \tag{1}\]

<p>We could solve the above equation by introducing a matrix $ \mathbf{M} $ and a vector $ \mathbf{z} $, where:</p>

\[\begin{align}
\mathbf{M} &amp;= \left[
\begin{array}{c|c}
\mathbf{A} &amp; \mathbf{B}
\end{array}
\right] \notag \\
\mathbf{z} &amp;=
\begin{bmatrix}
\mathbf{x} \\ \mathbf{y}
\end{bmatrix} \notag \\
\end{align}\]

<p>Then, solving $ \mathbf{Mz} = \mathbf{b} $ for $ \mathbf{z} $ gives us $ \mathbf{x} $ and $ \mathbf{y} $, the latter of which we discard.</p>

<p>However, we are going to do better than that - we are going to left-multiply both sides of (1) by something that will cancel out the $ \mathbf{By} $ term. That something happens to be $ \mathbf{A}^\top $.</p>

<p>That gives us the following equation:</p>

\[\mathbf{A}^\top \mathbf{Ax} + \mathbf{A}^\top \mathbf{By} = \mathbf{A}^\top \mathbf{b}\]

<p>I claim that $ \mathbf{A}^\top \mathbf{B} $ is a matrix of zeros and thus $ \mathbf{A}^\top \mathbf{By} $ is a vector of zeros for any $ \mathbf{y} $.</p>

<p>What are the elements of $ \mathbf{A}^\top \mathbf{B} $? They are dot products between a column of $ \mathbf{B} $ and a row of $ \mathbf{A}^\top $ (that is a column of $ \mathbf{A} $). Recall that the column spaces of $ \mathbf{A} $ and $ \mathbf{B} $ are orthogonal - so, every column of $ \mathbf{A} $ is orthogonal to every column of $ \mathbf{B} $. Dot products of orthogonal vectors are zeros, so $ \mathbf{A}^\top \mathbf{B} $ is indeed a matrix of zeros. Eliminating $ \mathbf{A}^\top \mathbf{By} $ gives us the well known <em>Normal Equation</em>:</p>

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

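<p>Here, $ \mathbf{A}^\top \mathbf{A} $ is a square matrix of the same rank as $ \mathbf{A} $. So, as long as the columns of $ \mathbf{A} $ are linearly independent, $ \mathbf{A}^\top \mathbf{A} $ is invertible and this system can easily be solved.</p>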

<p>The solution $ \mathbf{x} $ gives us the linear coefficients for the vector in the column space of $ \mathbf{A} $ that’s closest (in the Euclidean sense) to the vector $ \mathbf{b} $.</p>

<p>This completes the derivation.</p>
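<p>One way to convince yourself of the result is to check numerically that the residual $ \mathbf{b} - \mathbf{Ax} $ ends up orthogonal to every column of $ \mathbf{A} $. Below is a small sketch of that check for a 3x2 matrix, written against the standard library only. The helper names <code class="language-plaintext highlighter-rouge">solveNormalEquation</code> and <code class="language-plaintext highlighter-rouge">residualAgainstColumns</code> are invented for this example, and the 2x2 system is solved with Cramer’s rule, which is just one convenient choice.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;

// Dot product of two 3D vectors.
double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Solves A^T A x = A^T b, where a1 and a2 are the columns of A.
Vec2 solveNormalEquation(const Vec3& a1, const Vec3& a2, const Vec3& b)
{
    // Entries of the symmetric 2x2 matrix A^T A.
    const double m11 = dot(a1, a1);
    const double m12 = dot(a1, a2);
    const double m22 = dot(a2, a2);

    // Entries of the right-hand side A^T b.
    const double r1 = dot(a1, b);
    const double r2 = dot(a2, b);

    const double det = m11 * m22 - m12 * m12;
    assert(det != 0.0 && "columns of A must be linearly independent");

    // Cramer's rule for a 2x2 system.
    return {(r1 * m22 - m12 * r2) / det, (m11 * r2 - m12 * r1) / det};
}

// Computes A^T (b - A x). For the least squares solution x,
// both components should be zero up to rounding errors.
Vec2 residualAgainstColumns(
    const Vec3& a1, const Vec3& a2, const Vec3& b, const Vec2& x)
{
    Vec3 residual{};
    for (std::size_t i = 0; i < 3; ++i) {
        residual[i] = b[i] - (a1[i] * x[0] + a2[i] * x[1]);
    }
    return {dot(a1, residual), dot(a2, residual)};
}
```

<p>With columns (1, 0, 0) and (1, 1, 0) and $ \mathbf{b} $ = (2, 3, 4), the solution is $ \mathbf{x} $ = (-1, 3). The residual is then (0, 0, 4), which is orthogonal to both columns.</p>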

<h2 id="finding-the-closest-points-on-two-lines">Finding the Closest Points on Two Lines</h2>

<p>Now, let’s apply our new knowledge to solve a different problem:</p>

<blockquote>
  <p>You have two lines in 3D. Let’s call them $ \mathbf{L_1} $ and $ \mathbf{L_2} $. Each line is defined by a point on the line and a vector of arbitrary (nonzero) length, defining the line’s direction. So, we’ve got points $ \mathbf{p_1} $ and $ \mathbf{p_2} $ and vectors $ \mathbf{v_1} $ and $ \mathbf{v_2} $. Your task is to find a point on each line (let’s call them $ \mathbf{k_1} $ and $ \mathbf{k_2} $) such that the distance between those two points is minimised.</p>
</blockquote>

<p>Let’s try to solve that problem with Linear Algebra. The key to the solution is to realize that the line connecting $ \mathbf{k_1} $ and $ \mathbf{k_2} $ is going to be orthogonal to both $ \mathbf{L_1} $ and $ \mathbf{L_2} $. Let’s give the vector from $ \mathbf{k_1} $ to $ \mathbf{k_2} $ a name. Let’s call it $ \mathbf{u} $. Now, we can write the following equation:</p>

\[\mathbf{p_1} + x_1 \mathbf{v_1} + \mathbf{u} = \mathbf{p_2} + x_2 \mathbf{v_2}\]

<p>There, $ x_1 $ and $ x_2 $ are arbitrary scalars that move us from $ \mathbf{p_i} $ along $ \mathbf{v_i} $.</p>

<p>Now, let’s introduce some auxiliary values:</p>

\[\begin{align}
\mathbf{A} &amp;= \left[
\begin{array}{c|c}
\mathbf{v_1} &amp; -\mathbf{v_2}
\end{array}
\right] \notag \\
\mathbf{b} &amp;= \mathbf{p_2} - \mathbf{p_1} \notag \\
\mathbf{x} &amp;=
\begin{bmatrix}
x_1 \\ x_2
\end{bmatrix} \notag \\
\end{align}\]

<p>Those values allow us to rewrite the equation above in a matrix form:</p>

\[\mathbf{Ax} + \mathbf{u} = \mathbf{b}\]

<p>Now, let’s use the same trick we did before to get rid of $ \mathbf{u} $:</p>

\[\mathbf{A}^\top \mathbf{Ax} + \mathbf{A}^\top \mathbf{u} = \mathbf{A}^\top \mathbf{b}\]

<p>$ \mathbf{A}^\top \mathbf{u} $ is a vector of zeros because $ \mathbf{u} $ is orthogonal to all columns of $ \mathbf{A} $. That elimination produces the normal equation again:</p>

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

<p>So, this problem can also be solved by the method of least squares. Let’s just do it:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o">&lt;</span><span class="n">glm</span><span class="o">::</span><span class="n">vec3</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span><span class="o">&gt;</span> <span class="n">closestPointsOnTwoLines</span><span class="p">(</span>
    <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">p1</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">v1</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">p2</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec3</span> <span class="n">v2</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// Note that in glm, a 2x3 matrix means a 2-column, 3-row, matrix.</span>
    <span class="c1">// In Linear Algebra, we would have called such a matrix a 3x2 one.</span>
    <span class="k">const</span> <span class="n">glm</span><span class="o">::</span><span class="n">mat2x3</span> <span class="n">A</span><span class="p">(</span><span class="n">v1</span><span class="p">,</span> <span class="o">-</span><span class="n">v2</span><span class="p">);</span>

    <span class="k">const</span> <span class="k">auto</span> <span class="n">b</span> <span class="o">=</span> <span class="n">p2</span> <span class="o">-</span> <span class="n">p1</span><span class="p">;</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">At</span><span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">transpose</span><span class="p">(</span><span class="n">A</span><span class="p">));</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">invAtA</span><span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">inverse</span><span class="p">(</span><span class="n">At</span> <span class="o">*</span> <span class="n">A</span><span class="p">));</span>
    <span class="k">const</span> <span class="k">auto</span> <span class="n">x</span> <span class="o">=</span> <span class="n">invAtA</span> <span class="o">*</span> <span class="n">At</span> <span class="o">*</span> <span class="n">b</span><span class="p">;</span>

    <span class="k">return</span> <span class="p">{</span> <span class="n">p1</span> <span class="o">+</span> <span class="n">v1</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">p2</span> <span class="o">+</span> <span class="n">v2</span> <span class="o">*</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div>

<p>If the two lines are parallel, the code above is going to fail, because the $ \mathbf{A}^\top \mathbf{A} $ matrix won’t be invertible. In such a case, perhaps you could take the midpoint between $ \mathbf{p_1} $ and $ \mathbf{p_2} $ and project it to both $ \mathbf{L_1} $ and $ \mathbf{L_2} $, again using the method of least squares.</p>
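<p>Here is one possible sketch of that fallback, written without glm so that it stands on its own. The determinant of $ \mathbf{A}^\top \mathbf{A} $ equals $ |\mathbf{v_1}|^2 |\mathbf{v_2}|^2 \sin^2 \theta $, where $ \theta $ is the angle between the lines, so comparing it against a small fraction of $ |\mathbf{v_1}|^2 |\mathbf{v_2}|^2 $ detects (nearly) parallel lines. The tolerance value and all helper names are assumptions made for this example.</p>

```cpp
#include <array>
#include <cassert>
#include <utility>

using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 add(const Vec3& a, const Vec3& b)
{
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2]};
}

Vec3 sub(const Vec3& a, const Vec3& b)
{
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

Vec3 scale(const Vec3& a, double s)
{
    return {a[0] * s, a[1] * s, a[2] * s};
}

// Projects q onto the line through p with direction v (the 1x1 normal equation).
Vec3 projectToLine(const Vec3& p, const Vec3& v, const Vec3& q)
{
    return add(p, scale(v, dot(v, sub(q, p)) / dot(v, v)));
}

std::pair<Vec3, Vec3> closestPointsOnTwoLines(
    const Vec3& p1, const Vec3& v1, const Vec3& p2, const Vec3& v2)
{
    // A = [v1 | -v2], so A^T A = [[v1.v1, -v1.v2], [-v1.v2, v2.v2]].
    const double m11 = dot(v1, v1);
    const double m12 = -dot(v1, v2);
    const double m22 = dot(v2, v2);
    assert(m11 > 0.0 && m22 > 0.0 && "direction vectors must be nonzero");

    const double det = m11 * m22 - m12 * m12;
    const double kParallelTolerance = 1e-12; // Assumed value, tune as needed.
    if (det <= kParallelTolerance * m11 * m22) {
        // (Nearly) parallel lines: project the midpoint onto both lines.
        const Vec3 mid = scale(add(p1, p2), 0.5);
        return {projectToLine(p1, v1, mid), projectToLine(p2, v2, mid)};
    }

    const Vec3 b = sub(p2, p1);
    const double r1 = dot(v1, b);  // First entry of A^T b.
    const double r2 = -dot(v2, b); // Second entry of A^T b.

    // Cramer's rule for the 2x2 normal equation.
    const double x1 = (r1 * m22 - m12 * r2) / det;
    const double x2 = (m11 * r2 - m12 * r1) / det;
    return {add(p1, scale(v1, x1)), add(p2, scale(v2, x2))};
}
```

<p>For parallel lines there are infinitely many closest pairs. The midpoint trick simply picks one of them, and the segment connecting the two returned points is still perpendicular to both lines.</p>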

<h2 id="closing-words">Closing Words</h2>

<p>The method of least squares seems to be little known among software developers, yet it is quite useful in both 2D and 3D graphics and beyond. This article should have given you all the knowledge you need to apply it to your projects.</p>

<p>BTW, the <a href="/about/">author</a> may be available for contracting work.</p>]]></content><author><name>Joseph Artsimovich</name></author><summary type="html"><![CDATA[]]></summary></entry><entry><title type="html">How to Implement Spline Fitting</title><link href="https://tulon.github.io/spline-fitting/" rel="alternate" type="text/html" title="How to Implement Spline Fitting" /><published>2025-03-23T00:00:00+00:00</published><updated>2025-03-23T00:00:00+00:00</updated><id>https://tulon.github.io/Spline-Fitting</id><content type="html" xml:base="https://tulon.github.io/spline-fitting/"><![CDATA[<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#splines" id="markdown-toc-splines">Splines</a></li>
  <li><a href="#spline-fitting---first-steps" id="markdown-toc-spline-fitting---first-steps">Spline Fitting - First Steps</a></li>
  <li><a href="#the-optimization-problem" id="markdown-toc-the-optimization-problem">The Optimization Problem</a></li>
  <li><a href="#optimizing-it" id="markdown-toc-optimizing-it">Optimizing It</a></li>
  <li><a href="#show-me-the-code" id="markdown-toc-show-me-the-code">Show Me the Code!</a></li>
  <li><a href="#challenges-and-improvements" id="markdown-toc-challenges-and-improvements">Challenges and Improvements</a>    <ul>
      <li><a href="#avoiding-loops" id="markdown-toc-avoiding-loops">Avoiding Loops</a></li>
      <li><a href="#adding-constraints" id="markdown-toc-adding-constraints">Adding Constraints</a>        <ul>
          <li><a href="#lagrange-multipliers" id="markdown-toc-lagrange-multipliers">Lagrange Multipliers</a></li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#closing-words" id="markdown-toc-closing-words">Closing Words</a></li>
</ul>

<h1 id="introduction">Introduction</h1>

<p>In this article, I’ll show you how I implemented spline fitting in Scan Tailor, which is a tool for post-processing scanned or photographed pages. I am the original author of that project and I was involved with it between 2007 and 2016.</p>

<p>Spline fitting is used there in the dewarping workflow (introduced around 2011), where a curved page is flattened. The picture below should give you the idea:</p>

<div style="text-align:center"><img alt="Dewarping grid" src="/assets/posts/spline-fitting/dewarping-grid.jpg" /></div>

<p>The better the grid follows the curvature of text lines, the better dewarping results you are going to get.</p>

<p>You can let Scan Tailor build the grid by itself or you can define it manually. You can also manually adjust the grid built by Scan Tailor. That’s where spline fitting comes into play. The whole grid is defined by two horizontal curves: the top and the bottom one. They don’t have to be the top-most and the bottom-most text lines, but accuracy is better when they are far apart. The other curves are automatically derived from those two. When Scan Tailor builds the grid by itself, the curves are represented by polylines. The polylines are produced by a custom image processing algorithm and look like this:</p>

<div style="text-align:center"><img alt="Polylines" src="/assets/posts/spline-fitting/polylines.jpg" /></div>

<p>That screenshot above was taken from Scan Tailor’s debugging output (Menu -&gt; Tools -&gt; Debug Mode).</p>

<p>Now suppose the dewarping grid built for you by Scan Tailor is imperfect and you want to adjust it manually. How are you going to adjust the polylines? Certainly not point-by-point!</p>

<h1 id="splines">Splines</h1>

<p>The standard way to manually adjust a curve is to represent that curve as a spline. A spline is a function of this form:</p>

\[\mathbf{s}(t; \mathbf{X})\]

<p>It takes a time / progress argument $ t $ and is parametrized by a set of control points $ \mathbf{X} $. Some types of splines may have additional parameters. For convenience, we make $ \mathbf{X} $ a matrix whose columns are the spline’s control points. The spline function returns a vector (a 2D one in our case), which is a position on the screen.</p>

<p>I like to think of a spline function as a <code class="language-plaintext highlighter-rouge">trajectory of a particle</code> function. It takes time and produces the position of the particle at that time. Scan Tailor uses <a href="https://scholar.google.com/scholar?q=X-splines%3A+A+spline+model+designed+for+the+end-user">X-splines</a>, whose parameter $ t $ goes from 0 to 1. Those splines are parametrized by a sequence of control points and a tension parameter for each control point. In the picture below, the bottom curve is a spline with red dots being its control points. The tension parameters are not mentioned in the formula above, because we simply hardcode them. Keep reading for more details.</p>

<div style="text-align:center"><img alt="Spline control points" src="/assets/posts/spline-fitting/spline-control-points.jpg" /></div>

<p>The red dots (barely visible) are the control points. Their positions define the shape of the spline. When Scan Tailor fits splines to polylines, it uses a fixed number of control points - 5. The user can then add more of them or remove some.</p>

<p>Let’s zoom in a bit:</p>

<div style="text-align:center"><img alt="Spline control points zoomed in" src="/assets/posts/spline-fitting/control-points-zoomed-in.jpg" /></div>

<p>You’ll notice the spline doesn’t actually pass through its control points. That’s because there are two types of splines - the interpolating ones (those do pass through their control points) and the approximating ones (those are merely attracted to their control points). The X-Spline is a hybrid model where the tension parameter specifies whether the spline will pass through a particular control point or merely be attracted to it, and how much. Scan Tailor doesn’t expose the tension parameters to users. It just makes the splines go through the first and the last control points and be attracted to the rest of them. Why not make the spline pass through all control points? Because then more control points would be required to represent the same curve. That’s just my empirical observation.</p>

<h1 id="spline-fitting---first-steps">Spline Fitting - First Steps</h1>

<p>So, how do we fit a spline to a polyline? It’s going to be an iterative process. We start with a spline whose first and last control points correspond to the first and the last points of the polyline and the rest of the control points are placed at equal intervals:</p>

<div style="text-align:center"><img alt="Initial spline positioning" src="/assets/posts/spline-fitting/initial-spline-placement.png" /></div>

<p>The next step is to sample the spline (place some points on the spline at more or less equal intervals) and for each such point, find the closest point to it on the polyline:</p>

<div style="text-align:center"><img alt="Spline and polyline sample pairs" src="/assets/posts/spline-fitting/spline-and-polyline-sample-pairs.png" /></div>

<p>If we were fitting a spline to a point cloud rather than to a polyline (the points in a point cloud are unordered), then we would do it slightly differently: for each point in the point cloud we would find the closest point on the spline (which is a challenging problem by itself). Anyway, what we really want is a set of points on the spline with known $ t $ values and a point of attraction for each of them.</p>
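<p>The “closest point on the polyline” step can be sketched as follows. This is a brute-force version that checks every segment, and it is not necessarily how Scan Tailor does it. The names <code class="language-plaintext highlighter-rouge">closestPointOnSegment</code> and <code class="language-plaintext highlighter-rouge">closestPointOnPolyline</code> are made up for this example. Note that the per-segment step is once again a projection onto a line, followed by clamping to the segment’s ends.</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Vec2 = std::array<double, 2>;

// Returns the point on the segment [a, b] that's closest to q.
Vec2 closestPointOnSegment(const Vec2& a, const Vec2& b, const Vec2& q)
{
    const Vec2 d{b[0] - a[0], b[1] - a[1]};
    const double lenSq = d[0] * d[0] + d[1] * d[1];
    if (lenSq == 0.0) {
        return a; // Degenerate segment: both ends coincide.
    }

    // Project q onto the line through a and b, then clamp to the segment.
    const double t = ((q[0] - a[0]) * d[0] + (q[1] - a[1]) * d[1]) / lenSq;
    const double tc = std::min(std::max(t, 0.0), 1.0);
    return {a[0] + d[0] * tc, a[1] + d[1] * tc};
}

// Returns the point on the polyline that's closest to q.
Vec2 closestPointOnPolyline(const std::vector<Vec2>& polyline, const Vec2& q)
{
    assert(polyline.size() >= 2 && "a polyline needs at least two points");

    Vec2 best = polyline.front();
    double bestDistSq = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i + 1 < polyline.size(); ++i) {
        const Vec2 c = closestPointOnSegment(polyline[i], polyline[i + 1], q);
        const double dx = c[0] - q[0];
        const double dy = c[1] - q[1];
        const double distSq = dx * dx + dy * dy;
        if (distSq < bestDistSq) {
            bestDistSq = distSq;
            best = c;
        }
    }
    return best;
}
```

<p>Calling this once per spline sample gives the attraction point for that sample.</p>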

<h1 id="the-optimization-problem">The Optimization Problem</h1>

<p>Now we are ready to formulate our optimization problem. It’s just our initial attempt - we are going to make improvements to it later in the article. However, before I write down the formula, let me explain in simple words what it does.</p>

<p>We are going to find a matrix $ \mathbf{X} $ (a collection of spline control points) that minimizes the sum of squared lengths of those green lines in the picture above. Why squared lengths? There is more than one reason, but the main one is me trying to keep the objective function quadratic, as those are easy to optimize. BTW, the term <em>optimization</em> in mathematics means just <em>minimization</em> or <em>maximization</em> (in this case, it’s <em>minimization</em>). Oh, and before I present our optimization problem in mathematical notation, bear in mind I am a software engineer, not a mathematician, so don’t judge my math too harshly but do let me know if you spot an error!</p>

<p>OK, here it goes:</p>

\[f(\mathbf{X}) = \sum_{i=1}^{N}{\|\mathbf{s}(t_i, \mathbf{X}) - \mathbf{q}_i\|^2} \tag{1}\]

\[\text{argmin}_{\mathbf{X}} f(\mathbf{X})\]

<p>Where:</p>

<ul>
  <li>$ \mathbf{X} $ is a matrix of spline’s control points.</li>
  <li>$ N $ is the number of samples (the green lines in the picture above).</li>
  <li>$ t_i $ is the spline’s parameter $ t $ for sample $ i $. It comes from the spline sampling process.</li>
  <li>$ \mathbf{s}(t_i, \mathbf{X}) $ is our spline function (with tension parameters treated as constants).</li>
  <li>$ \mathbf{q}_i $ is the attraction point for sample $ i $ (a point on the polyline closest to the given point on the spline).</li>
</ul>

<p>OK, but $ \mathbf{s}(t; \mathbf{X}) $ is a rather complex function in case of X-splines! How do we deal with that? The thing is, it’s only complex when you treat $ t $ as a variable. In our case, $ t_i $’s are constants (and so are tension parameters) and the only real variable is $ \mathbf{X} $. Turns out, in such a setting, it reduces to:</p>

\[\mathbf{s}(\mathbf{X}; t) = \mathbf{X}\mathbf{v}(t) \tag{2}\]

<p>Bearing in mind that columns of $ \mathbf{X} $ are the spline’s control points, the function becomes a linear combination of those control points where $ \mathbf{v}(t) $ is a vector-valued function giving us the coefficients for that linear combination. Because $ t $ is a constant (a bunch of $ t_i $’s are produced by spline sampling we did above), we can pre-compute $ \mathbf{v}(t_i) $ for all samples.</p>

<p>To give you a better understanding, here is how $ \mathbf{v}(t) $ may be represented in C++ code:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">FittableSpline</span>
<span class="p">{</span>
<span class="nl">public:</span>
    <span class="cm">/**
     * Computes the linear coefficients for control points for the given @p t.
     * Any extra parameters (such as tension values) a spline may have are
     * treated as constants. A linear combination of control points with
     * these coefficients produces the point on this spline at @p t.
     */</span>
    <span class="k">virtual</span> <span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="n">linearCombinationAt</span><span class="p">(</span><span class="kt">double</span> <span class="n">t</span><span class="p">)</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>

    <span class="c1">// The rest of the interface is omitted.</span>
<span class="p">};</span>
</code></pre></div></div>
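<p>To show how such coefficients get used, here is a self-contained sketch of equation (2) and objective (1). Since the X-spline formulas are not reproduced in this article, the sketch substitutes the Bernstein weights of a quadratic Bézier curve for $ \mathbf{v}(t) $. That substitution, as well as the names <code class="language-plaintext highlighter-rouge">quadraticBezierCoefficientsAt</code>, <code class="language-plaintext highlighter-rouge">evaluate</code> and <code class="language-plaintext highlighter-rouge">objective</code>, are assumptions made purely for illustration. It also uses <code class="language-plaintext highlighter-rouge">std::vector</code> rather than Eigen to avoid dependencies.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;

// Hypothetical stand-in for v(t): Bernstein weights of a quadratic Bezier
// curve. An X-spline would produce different weights, but they would be
// used in exactly the same way.
std::vector<double> quadraticBezierCoefficientsAt(double t)
{
    assert(t >= 0.0 && t <= 1.0);
    const double u = 1.0 - t;
    return {u * u, 2.0 * u * t, t * t};
}

// Evaluates s = X v, where the elements of controlPoints are the columns of X.
Vec2 evaluate(const std::vector<Vec2>& controlPoints, const std::vector<double>& v)
{
    assert(controlPoints.size() == v.size());
    Vec2 s{0.0, 0.0};
    for (std::size_t j = 0; j < v.size(); ++j) {
        s[0] += controlPoints[j][0] * v[j];
        s[1] += controlPoints[j][1] * v[j];
    }
    return s;
}

// Objective (1): the sum of squared distances between spline samples and
// their attraction points. coeffs[i] holds the pre-computed v(t_i).
double objective(
    const std::vector<Vec2>& controlPoints,
    const std::vector<std::vector<double>>& coeffs,
    const std::vector<Vec2>& attractionPoints)
{
    assert(coeffs.size() == attractionPoints.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < coeffs.size(); ++i) {
        const Vec2 s = evaluate(controlPoints, coeffs[i]);
        const double dx = s[0] - attractionPoints[i][0];
        const double dy = s[1] - attractionPoints[i][1];
        sum += dx * dx + dy * dy;
    }
    return sum;
}
```

<p>Because the coefficients depend only on $ t_i $, they can be computed once per sample and reused while the control points change.</p>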

<h1 id="optimizing-it">Optimizing It</h1>

<p>With (2) being a linear function of $ \mathbf{X} $ and (1) being a quadratic function of (2), (1) must be a quadratic function of $ \mathbf{X} $. To minimize (1), we just need to compute its derivatives with respect to $ \mathbf{X} $, set them to zero and solve the resulting linear system. Easy? Not exactly. How are we going to take derivatives with respect to a matrix? They don’t teach you that in your undergraduate math courses!</p>

<p>What we are going to do is to bring (1) to the canonical form of a scalar-valued, multivariable quadratic function, that has well known derivatives:</p>

\[\mathbf{x}^\top \mathbf{A} \mathbf{x} + \mathbf{b}^\top \mathbf{x} + c \tag{3}\]

<p>Where $ \mathbf{x} $ is our matrix $ \mathbf{X} $ flattened into a vector and $ \mathbf{A} $, $ \mathbf{b} $ and $ c $ are arbitrary parameters.</p>

<p>Perhaps it’s worth explaining what the first term of (3) expands to:</p>

\[\mathbf{x}^\top \mathbf{A} \mathbf{x} = \sum_{i,j}{\mathbf{x}_i \mathbf{A}_{i,j} \mathbf{x}_j}\]

<p>Basically, we take every element of $ \mathbf{x} $, multiply it by every other element (and also the same element) and by a corresponding coefficient from matrix $ \mathbf{A} $. Then we sum up the resulting terms.</p>

<p>(3) has a well-known gradient (the vector of first partial derivatives), which is:</p>

\[(\mathbf{A}^\top + \mathbf{A}) \mathbf{x} + \mathbf{b} \tag{4}\]
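
<p>A small worked example may help to see where (4) comes from. The numbers below are arbitrary and chosen purely for illustration, with $ \mathbf{b} = \mathbf{0} $ and $ c = 0 $:</p>

\[\mathbf{A} = \begin{bmatrix} 1 &amp; 2 \\ 0 &amp; 3 \end{bmatrix}, \quad
\mathbf{x}^\top \mathbf{A} \mathbf{x} = x_1^2 + 2 x_1 x_2 + 3 x_2^2\]

<p>Differentiating by hand with respect to $ x_1 $ and $ x_2 $ gives $ 2 x_1 + 2 x_2 $ and $ 2 x_1 + 6 x_2 $, which is the same thing (4) produces:</p>

\[(\mathbf{A}^\top + \mathbf{A}) \mathbf{x} =
\begin{bmatrix} 2 &amp; 2 \\ 2 &amp; 6 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} =
\begin{bmatrix} 2 x_1 + 2 x_2 \\ 2 x_1 + 6 x_2 \end{bmatrix}\]

<p>Note that $ \mathbf{A} $ doesn’t have to be symmetric, which is why the gradient involves $ \mathbf{A}^\top + \mathbf{A} $ rather than $ 2\mathbf{A} $.</p>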

<p>So how exactly are we going to bring (1) into the form of (3) in order to minimize it?
It turns out it’s much easier done programmatically than mathematically (for me at least). So, let’s do that in C++. If you spot an error, please tell me.</p>

<h1 id="show-me-the-code">Show Me the Code!</h1>

<p>First, we’ll need a few structs to represent various functions we’ll be dealing with.</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cm">/**
 * Represents a scalar-valued linear function of the form:
 * &lt;pre&gt;
 * f(x) = a^T * x + b
 * &lt;/pre&gt;
 * where a is a vector and b is a scalar.
 */</span>
<span class="k">struct</span> <span class="nc">ScalarLinearFunction</span>
<span class="p">{</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="n">a</span><span class="p">;</span>
    <span class="kt">double</span> <span class="n">b</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>

    <span class="cm">/**
     * Sets this-&gt;a to the correct size and initializes
     * this-&gt;a and this-&gt;b with zeros.
     */</span>
    <span class="k">explicit</span> <span class="n">ScalarLinearFunction</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">numVars</span><span class="p">);</span>

    <span class="kt">size_t</span> <span class="n">numVars</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>

    <span class="n">ScalarLinearFunction</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">+=</span><span class="p">(</span><span class="n">ScalarLinearFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">other</span><span class="p">);</span>
<span class="p">};</span>

<span class="cm">/**
 * Represents a vector-valued linear function of the form:
 * &lt;pre&gt;
 * f(x) = Ax + b
 * &lt;/pre&gt;
 * where A is a matrix and b is a vector.
 */</span>
<span class="k">struct</span> <span class="nc">VectorLinearFunction</span>
<span class="p">{</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">MatrixXd</span> <span class="n">A</span><span class="p">;</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="n">b</span><span class="p">;</span>
<span class="p">};</span>

<span class="cm">/**
 * Represents a scalar-valued quadratic function of the form:
 * &lt;pre&gt;
 * f(x) = x^T * A * x + b^T * x + c
 * &lt;/pre&gt;
 * where A is a matrix, b is a vector and c is a scalar.
 */</span>
<span class="k">struct</span> <span class="nc">ScalarQuadraticFunction</span>
<span class="p">{</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">MatrixXd</span> <span class="n">A</span><span class="p">;</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="n">b</span><span class="p">;</span>
    <span class="kt">double</span> <span class="n">c</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">;</span>

    <span class="cm">/**
     * Sets this-&gt;A and this-&gt;b to the correct sizes and initializes
     * this-&gt;A, this-&gt;b and this-&gt;c with zeros.
     */</span>
    <span class="k">explicit</span> <span class="n">ScalarQuadraticFunction</span><span class="p">(</span><span class="kt">size_t</span> <span class="n">numVars</span><span class="p">);</span>

    <span class="kt">size_t</span> <span class="n">numVars</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>

    <span class="n">VectorLinearFunction</span> <span class="n">gradient</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>

    <span class="n">ScalarQuadraticFunction</span><span class="o">&amp;</span> <span class="k">operator</span><span class="o">+=</span><span class="p">(</span><span class="n">ScalarQuadraticFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">other</span><span class="p">);</span>
<span class="p">};</span>

<span class="n">ScalarLinearFunction</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span>
    <span class="n">ScalarLinearFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">ScalarLinearFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">);</span>

<span class="n">ScalarQuadraticFunction</span> <span class="k">operator</span><span class="o">+</span><span class="p">(</span>
    <span class="n">ScalarQuadraticFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">ScalarQuadraticFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">);</span>

<span class="n">ScalarQuadraticFunction</span> <span class="k">operator</span><span class="o">*</span><span class="p">(</span>
    <span class="n">ScalarLinearFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">ScalarLinearFunction</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">rhs</span><span class="p">);</span>
</code></pre></div></div>

<p>Apart from the gradient computation (4), the only other non-trivial operation above is the multiplication of two <em>ScalarLinearFunction</em>’s to produce a <em>ScalarQuadraticFunction</em>. It can be implemented using the following relationship:</p>

\[(\mathbf{a}_1^\top \mathbf{x} + b_1)(\mathbf{a}_2^\top \mathbf{x} + b_2) =
\mathbf{x}^\top (\mathbf{a}_1 \mathbf{a}_2^\top) \mathbf{x} +
(\mathbf{a}_1 b_2 + \mathbf{a}_2 b_1)^\top \mathbf{x} + b_1 b_2\]

<p>Assuming we’ve implemented all the methods mentioned above, we can now transform (1) to (3):</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">SplineSample</span>
<span class="p">{</span>
    <span class="cm">/**
     * Spline parameter t.
     */</span>
    <span class="kt">double</span> <span class="n">t</span><span class="p">;</span>

    <span class="cm">/**
     * The point on the polyline to which spline(t) shall be attracted.
     */</span>
    <span class="n">Eigen</span><span class="o">::</span><span class="n">Vector2d</span> <span class="n">attractionPoint</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">ScalarQuadraticFunction</span> <span class="n">buildObjectiveFunction</span><span class="p">(</span>
    <span class="n">FittableSpline</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">spline</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">SplineSample</span><span class="o">&gt;</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">samples</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">size_t</span> <span class="k">const</span> <span class="n">numControlPoints</span> <span class="o">=</span> <span class="n">spline</span><span class="p">.</span><span class="n">numControlPoints</span><span class="p">();</span>

    <span class="c1">// Each control point carries 2 variables: its x and y coordinates.</span>
    <span class="kt">size_t</span> <span class="k">const</span> <span class="n">numVars</span> <span class="o">=</span> <span class="n">numControlPoints</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>

    <span class="c1">// Initialized with zeros.</span>
    <span class="n">ScalarQuadraticFunction</span> <span class="n">objectiveFunction</span><span class="p">(</span><span class="n">numVars</span><span class="p">);</span>

    <span class="k">for</span> <span class="p">(</span><span class="n">SplineSample</span> <span class="k">const</span><span class="o">&amp;</span> <span class="n">sample</span> <span class="o">:</span> <span class="n">samples</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="k">const</span> <span class="n">controlPointWeights</span> <span class="o">=</span>
            <span class="n">spline</span><span class="p">.</span><span class="n">linearCombinationAt</span><span class="p">(</span><span class="n">sample</span><span class="p">.</span><span class="n">t</span><span class="p">);</span>

        <span class="c1">// We use two separate linear functions to represent the horizontal</span>
        <span class="c1">// and vertical components of `s(t_i, X) - q_i`.</span>
        <span class="n">ScalarLinearFunction</span> <span class="n">deltaX</span><span class="p">(</span><span class="n">numVars</span><span class="p">);</span>  <span class="c1">// Initialized with zeros.</span>
        <span class="n">ScalarLinearFunction</span> <span class="n">deltaY</span><span class="p">(</span><span class="n">numVars</span><span class="p">);</span>  <span class="c1">// Initialized with zeros.</span>

        <span class="n">deltaX</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="n">sample</span><span class="p">.</span><span class="n">attractionPoint</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
        <span class="n">deltaY</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="n">sample</span><span class="p">.</span><span class="n">attractionPoint</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span>

        <span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">numControlPoints</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="c1">// Even indices of vector x correspond to the x coordinates</span>
            <span class="c1">// of control points.</span>
            <span class="n">deltaX</span><span class="p">.</span><span class="n">a</span><span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> <span class="o">=</span> <span class="n">controlPointWeights</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>

            <span class="c1">// Odd indices of vector x correspond to the y coordinates</span>
            <span class="c1">// of control points.</span>
            <span class="n">deltaY</span><span class="p">.</span><span class="n">a</span><span class="p">(</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">=</span> <span class="n">controlPointWeights</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
        <span class="p">}</span>

        <span class="n">objectiveFunction</span> <span class="o">+=</span> <span class="n">deltaX</span> <span class="o">*</span> <span class="n">deltaX</span> <span class="o">+</span> <span class="n">deltaY</span> <span class="o">*</span> <span class="n">deltaY</span><span class="p">;</span>
    <span class="p">}</span>

    <span class="k">return</span> <span class="n">objectiveFunction</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Now, to solve for $ \mathbf{x} $ (the flattened collection of spline control points), we just have to set the gradient to zero and solve the resulting linear system:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ScalarQuadraticFunction</span> <span class="k">const</span> <span class="n">objectiveFunction</span> <span class="o">=</span>
    <span class="n">buildObjectiveFunction</span><span class="p">(</span><span class="n">spline</span><span class="p">,</span> <span class="n">samplesOfSpline</span><span class="p">);</span>

<span class="n">VectorLinearFunction</span> <span class="k">const</span> <span class="n">gradient</span> <span class="o">=</span> <span class="n">objectiveFunction</span><span class="p">.</span><span class="n">gradient</span><span class="p">();</span>

<span class="k">auto</span> <span class="n">qr</span> <span class="o">=</span> <span class="n">gradient</span><span class="p">.</span><span class="n">A</span><span class="p">.</span><span class="n">colPivHouseholderQr</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">qr</span><span class="p">.</span><span class="n">isInvertible</span><span class="p">())</span>
<span class="p">{</span>
    <span class="c1">// Handle error.</span>
<span class="p">}</span>

<span class="n">Eigen</span><span class="o">::</span><span class="n">VectorXd</span> <span class="k">const</span> <span class="n">x</span> <span class="o">=</span> <span class="n">qr</span><span class="p">.</span><span class="n">solve</span><span class="p">(</span><span class="o">-</span><span class="n">gradient</span><span class="p">.</span><span class="n">b</span><span class="p">);</span>
</code></pre></div></div>

<p>That gives us the new positions of the spline’s control points, which minimize the sum of squared lengths of those green lines in the picture. Then we repeat the whole process, starting from spline sampling, and keep repeating it until the average squared length of a green line stops decreasing.</p>
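
<p>The outer loop might be sketched like this. The function name <em>fitUntilConverged</em>, the <em>iterate</em> callback and both thresholds are assumptions made for this example. The callback is expected to do one full round (sample the spline, build the objective function, solve for the control points) and return the resulting average squared distance.</p>

```cpp
#include <functional>
#include <limits>

// Hypothetical driver for the iterative fitting process.
// Calls iterate() until the error it returns improves by less than
// minImprovement, or until maxIterations rounds have been done.
// Returns the number of rounds performed.
int fitUntilConverged(
    std::function<double()> const& iterate,
    double minImprovement, int maxIterations)
{
    double prevError = std::numeric_limits<double>::infinity();
    int roundsDone = 0;

    while (roundsDone < maxIterations)
    {
        double const error = iterate();
        ++roundsDone;

        if (prevError - error < minImprovement)
        {
            break;  // No meaningful progress any more.
        }
        prevError = error;
    }

    return roundsDone;
}
```

<p>The iteration cap is there as a safety net, in case the error keeps oscillating instead of settling down.</p>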

<h1 id="challenges-and-improvements">Challenges and Improvements</h1>

<p>Unfortunately, the approach described above doesn’t work that well. You may
well end up with a result like this:</p>

<div style="text-align:center"><img alt="An imperfect fit of a spline" src="/assets/posts/spline-fitting/imperfect-fit.png" /></div>

<p>The root cause of such a poor fit is that our objective function (1) discourages the lateral movement of spline control points, as such a movement would lead to the average length of a green line increasing.</p>

<p>Basically, this:</p>

<div style="text-align:center"><img alt="Spline and polyline sample pairs before lateral movement" src="/assets/posts/spline-fitting/spline-and-polyline-sample-pairs.png" /></div>

<p>would become this:</p>

<div style="text-align:center"><img alt="Spline and polyline sample pairs after lateral movement" src="/assets/posts/spline-fitting/lateral-movement-of-control-points.png" /></div>

<p>As can be seen, the green lines would get longer, which is something our objective function really fights against!</p>

<p>And yet, we do need more control points where the polyline curves more, in order to represent that curvature. What can we do about it? Well, we could improve the initial positioning of control points, so that more control points are placed in high curvature zones. However, that alone would probably not be enough. A better way is to change our objective function. Our initial objective function (1) penalizes displacements relative to the attraction point equally in all directions. See the isocontours of that penalty function:</p>

<div style="text-align:center"><img alt="Circular penalty function isocontours" src="/assets/posts/spline-fitting/circular-isocontours.png" /></div>

<p>What we need is a penalty function that penalizes the displacements along the spline less than those in the orthogonal direction:</p>

<div style="text-align:center"><img alt="Elliptical penalty function isocontours" src="/assets/posts/spline-fitting/elliptical-isocontours.png" /></div>

<p>Each ellipse represents a constant level of penalty. So, on the same penalty
budget we can now afford a larger displacement along the polyline than in
the orthogonal direction.</p>

<p>How much do we stretch those isocontours? We stretch them more in low curvature areas and less in high curvature ones. This article is getting longer than I would like it to be, so I am not giving the full details here, but you can find them in these two papers (<a href="http://scholar.google.com/scholar?q=A+concept+for+parametric+surface+fitting+which+avoids+the+parametrization+problem">1</a>, <a href="http://scholar.google.com/scholar?q=Fitting+b-spline+curves+to+point+clouds+by+squared+distance+minimization">2</a>). Also take a look at the <a href="https://github.com/scantailor/scantailor/blob/master/math/spfit/SqDistApproximant.h">relevant code</a> in Scan Tailor. What’s important is that the modified objective function is still quadratic, so we optimize it exactly the same way.</p>

<p>Does the above help? It does, but it solves one problem and creates two more:</p>

<ol>
  <li>When trying to fit a spline to a more or less flat polyline,
loops are occasionally created:
<img alt="A spline with a loop" src="/assets/posts/spline-fitting/loops.png" /></li>
  <li>The spline endpoints may end up far from the polyline endpoints.</li>
</ol>

<p>Let’s solve each of the above problems.</p>

<h2 id="avoiding-loops">Avoiding Loops</h2>

<p>The 1st problem is solved by adding another penalty term to (1) that penalizes the sum of squared distances between the adjacent control points, or alternatively, between points on the spline corresponding to adjacent control points. The 1st option is going to look like this:</p>

\[\alpha \sum_{i=1}^{K-1}{\|\mathbf{X}_{i+1} - \mathbf{X}_i\|^2}\]

<p>Where:</p>

<ul>
  <li>$ \alpha $ is the importance factor of this penalty term. We start with a higher
value and gradually reduce it in subsequent iterations.</li>
  <li>$ K $ is the number of control points in a spline.</li>
  <li>$ \mathbf{X}_i $ is the $ i $’th column of matrix $ \mathbf{X} $, that is the
$ i $’th control point of the spline.</li>
</ul>

<h2 id="adding-constraints">Adding Constraints</h2>

<p>The 2nd problem is solved by adding constraints to our optimization problem. We need two such constraints: <span>$ \mathbf{X}\mathbf{v}(0) = \mathbf{q}_{first} $</span> and <span>$ \mathbf{X}\mathbf{v}(1) = \mathbf{q}_{last} $</span>.</p>

<p>Where:</p>

<ul>
  <li>$ \mathbf{v}(0) $ and $ \mathbf{v}(1) $ are the coefficients of linear combinations of spline’s control points that produce the start and the end point of the spline respectively.</li>
  <li><span>$ \mathbf{q}_{first} $</span> and <span>$ \mathbf{q}_{last} $</span> are the first and the last points of the target polyline respectively.</li>
</ul>
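
<p>Each of those two vector equations splits into two scalar linear constraints on the flattened vector $ \mathbf{x} $, one for the x coordinate and one for the y coordinate. A possible sketch, again assuming the interleaved layout used earlier; the <em>LinearConstraint</em> struct and the <em>appendPointConstraint</em> function are hypothetical names introduced for this example:</p>

```cpp
#include <cstddef>
#include <vector>

// Represents a scalar constraint of the form coeffs^T * x = rhs.
struct LinearConstraint
{
    std::vector<double> coeffs;
    double rhs;
};

// Appends the two scalar constraints that together express X * v = q,
// where `weights` is v and (qx, qy) is q. The variables are laid out
// as x = (x0, y0, x1, y1, ...).
void appendPointConstraint(
    std::vector<LinearConstraint>& constraints,
    std::vector<double> const& weights, double qx, double qy)
{
    std::size_t const numVars = weights.size() * 2;

    LinearConstraint forX{std::vector<double>(numVars, 0.0), qx};
    LinearConstraint forY{std::vector<double>(numVars, 0.0), qy};

    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        forX.coeffs[i * 2] = weights[i];      // x coordinate of point i
        forY.coeffs[i * 2 + 1] = weights[i];  // y coordinate of point i
    }

    constraints.push_back(forX);
    constraints.push_back(forY);
}
```

<p>Calling it once with $ \mathbf{v}(0) $ and the first polyline point, and once with $ \mathbf{v}(1) $ and the last one, yields four scalar constraints in total.</p>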

<h3 id="lagrange-multipliers">Lagrange Multipliers</h3>

<p>Whenever you hear about optimization under equality constraints, <a href="https://en.wikipedia.org/wiki/Lagrange_multiplier">Lagrange Multipliers</a> should come to mind. In fact, this is the perfect use case for them: a quadratic objective function with linear equality constraints is as easy to optimize as a quadratic function without constraints - the linear system just gets a bit larger.</p>

<p>Let me remind you how to use the method of Lagrange multipliers to add constraints to an optimization problem. Suppose you need to optimize (maximize or minimize) a general, scalar-valued function $ g(\mathbf{x}) $ (where $ \mathbf{x} $ is a vector) subject to $ M $ constraints of the form $ h_i(\mathbf{x}) = c_i $ (where $ c_i $ are constants). The solution is to solve the following system of equations for $ \mathbf{x} $ and $ \mathbf{\lambda} $:</p>

\[\begin{align}
\nabla g(\mathbf{x}) &amp;= \sum_{i=1}^M{\mathbf{\lambda}_i \nabla h_i(\mathbf{x})} \notag \\
h_1(\mathbf{x}) &amp;= c_1 \notag \\
\vdots \notag \\
h_M(\mathbf{x}) &amp;= c_M \notag \\
\end{align}\]

<p>In plain English: you solve for the constraints with an additional condition
that the gradient of the objective function is a linear combination of the gradients of the constraints. The coefficients ($ \mathbf{\lambda}_i $) in that linear combination are called Lagrange multipliers.</p>
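
<p>For a quadratic objective function of form (3) and linear constraints written as $ \mathbf{C}\mathbf{x} = \mathbf{d} $, the gradient of the objective is given by (4) and the gradients of the constraints are simply the rows of $ \mathbf{C} $, so the whole system is linear in $ \mathbf{x} $ and $ \mathbf{\lambda} $. Below is a minimal sketch of assembling and solving it. The names <em>solveLinearSystem</em> and <em>minimizeWithConstraints</em> are made up for this example, and a hand-rolled Gaussian elimination stands in for Eigen only to keep the snippet self-contained.</p>

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // Row-major dense matrix.

// Solves M * z = r by Gaussian elimination with partial pivoting.
Vec solveLinearSystem(Mat M, Vec r)
{
    std::size_t const n = r.size();
    for (std::size_t col = 0; col < n; ++col)
    {
        std::size_t pivot = col;
        for (std::size_t row = col + 1; row < n; ++row)
        {
            if (std::fabs(M[row][col]) > std::fabs(M[pivot][col])) pivot = row;
        }
        if (std::fabs(M[pivot][col]) < 1e-12)
        {
            throw std::runtime_error("singular system");
        }
        std::swap(M[col], M[pivot]);
        std::swap(r[col], r[pivot]);
        for (std::size_t row = col + 1; row < n; ++row)
        {
            double const factor = M[row][col] / M[col][col];
            for (std::size_t k = col; k < n; ++k) M[row][k] -= factor * M[col][k];
            r[row] -= factor * r[col];
        }
    }

    Vec z(n, 0.0);  // Back substitution.
    for (std::size_t i = n; i-- > 0;)
    {
        double sum = r[i];
        for (std::size_t k = i + 1; k < n; ++k) sum -= M[i][k] * z[k];
        z[i] = sum / M[i][i];
    }
    return z;
}

// Minimizes x^T A x + b^T x + c subject to C x = d.
// Returns x followed by the Lagrange multipliers, in one vector.
Vec minimizeWithConstraints(
    Mat const& A, Vec const& b, Mat const& C, Vec const& d)
{
    std::size_t const n = b.size();  // Number of variables.
    std::size_t const m = d.size();  // Number of constraints.
    Mat M(n + m, Vec(n + m, 0.0));
    Vec r(n + m, 0.0);

    // (A^T + A) x + b = sum of lambda_k * (row k of C), rearranged so
    // that all the unknowns are on the left-hand side.
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < n; ++j) M[i][j] = A[i][j] + A[j][i];
        for (std::size_t k = 0; k < m; ++k) M[i][n + k] = -C[k][i];
        r[i] = -b[i];
    }

    // The constraints themselves: C x = d.
    for (std::size_t k = 0; k < m; ++k)
    {
        for (std::size_t j = 0; j < n; ++j) M[n + k][j] = C[k][j];
        r[n + k] = d[k];
    }

    return solveLinearSystem(M, r);
}
```

<p>As a sanity check, minimizing $ x^2 + y^2 $ subject to $ x + y = 2 $ this way gives $ x = y = 1 $ with a multiplier of 2: the gradient $ (2, 2) $ of the objective is indeed 2 times the gradient $ (1, 1) $ of the constraint.</p>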

<h1 id="closing-words">Closing Words</h1>

<p>That’s it more or less. I had to skip some details in order to keep the article’s length reasonable, but if you still remember some Linear Algebra and some Calculus, it should be quite possible for you to fill those gaps and implement spline fitting for your project. For reference, check out Scan Tailor’s <a href="https://github.com/scantailor/scantailor/blob/master/math/spfit/SqDistApproximant.h">implementation</a>.</p>

<p>BTW, the <a href="/about/">author</a> may be available for contracting work.</p>]]></content><author><name>Joseph Artsimovich</name></author><summary type="html"><![CDATA[]]></summary></entry></feed>