Complex Problems, Elegant Solutions

The Method of Least Squares for Software Developers

2025-08-04T00:00:00+00:00

Introduction
The Method of Least Squares
Show Me The Code!
Deriving the Normal Equation
Finding the Closest Points on Two Lines
Closing Words

Introduction

I am a seasoned C++ developer and in many of my jobs I also interviewed job candidates. I normally interview for C++ but when the role involves graphics, I also interview for Linear Algebra. One of the Linear Algebra problems I give to candidates is the following one:

You’ve got a plane defined by 3 arbitrary (non-collinear) points on that plane. Let’s call them $ \mathbf{P_1} $, $ \mathbf{P_2} $ and $ \mathbf{P_3} $. You’ve also got a 4th point $ \mathbf{Q} $ that generally doesn’t lie on the plane. Your goal is to find the point $ \mathbf{Q'} $ on that plane that’s closest (in the Euclidean sense) to point $ \mathbf{Q} $. In other words, I am asking you to drop a perpendicular from $ \mathbf{Q} $ to the plane and find the intersection point of that perpendicular with the plane.

What I like about this problem is that there are many ways to solve it - I know at least 4. About every other candidate manages to solve it, yet over the years, no candidate has ever used my favourite approach. That’s a shame, because that approach works for projecting onto a linear surface of any dimension (a line, a plane, or a higher-dimensional analogue) in a space of any dimension. So, you can use it to project not just onto planes but also onto lines, in both 2D and 3D.

Given that the method in question seems to be little known among software developers (yet very well known among non-software engineers), I thought it would be a good idea to write a post about it.

The Method of Least Squares

Or alternatively, you may know it as Projecting Onto a Subspace. If you haven’t heard of either of those names, yet you know some basic Linear Algebra, this article is for you.

So, what does the method of least squares do? Consider an overdetermined linear system $ \mathbf{Ax} = \mathbf{b} $. By overdetermined, I mean the matrix $ \mathbf{A} $ is tall - it has more rows than columns. Generally, such linear systems don’t have exact solutions. The best we can do is to find an approximate solution that minimizes the error. That error is called the residual, and that residual is equal to $ \mathbf{b} - \mathbf{Ax} $. The residual is a vector, so what do we mean by minimizing it? Well, we minimize its Euclidean norm:

\[\text{argmin}_{\mathbf{x}} \| \mathbf{b} - \mathbf{Ax} \|\]

Given that the Euclidean norm is non-negative, minimizing it is equivalent to minimizing its square:

\[\text{argmin}_{\mathbf{x}} \| \mathbf{b} - \mathbf{Ax} \|^2\]

Minimizing the squared norm is much easier from the Calculus point of view: the square root disappears and the derivatives become much nicer. So, we’ll be minimizing the squared norm of the residual, hence the name Method of Least Squares.

One other interpretation of this method is the following: it finds a point (technically, a vector) in the column space of matrix $ \mathbf{A} $ that’s closest in the Euclidean sense to point $ \mathbf{b} $. What’s the column space of a matrix? It’s the set of all possible linear combinations of that matrix’s columns. In general, the set of all possible linear combinations of a given set of vectors (their span) forms a vector space.

What does the column space of a 3x2 (3 rows, 2 columns) matrix look like? Well, it looks like a plane in 3D (provided the columns are linearly independent). Isn’t that what we are after? We are trying to find a point on a plane that’s closest in the Euclidean sense to a given point. Sounds like exactly what we want!

There is one tiny complication though: not every plane in 3D is a vector space, only those that pass through the origin. Why is that? Well, a linear combination with all coefficients set to zero produces the zero vector, so the zero vector (the origin) has to be part of any vector space. This complication is easy to overcome though: we’ll just shift our coordinate system to make one of the points on the plane (say $ \mathbf{P_1} $) our new origin. Then we solve the problem in that shifted coordinate system and shift the result back into the original coordinate system.

Applying the method of least squares is easy: you just take the original linear system $ \mathbf{Ax} = \mathbf{b} $ and left-multiply both sides by $ \mathbf{A}^\top $. That gives you the so-called Normal Equation:

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

When applied to our problem, we have:

\[\begin{align} \mathbf{A} &= \left[ \begin{array}{c|c} \mathbf{P_2} - \mathbf{P_1} & \mathbf{P_3} - \mathbf{P_1} \end{array} \right] \notag \\ \mathbf{b} &= \mathbf{Q} - \mathbf{P_1} \notag \end{align}\]

Subtracting $ \mathbf{P_1} $ just shifts the coordinate system so that point $ \mathbf{P_1} $ is the new origin and that origin is on our plane.

Then we solve the normal equation for $ \mathbf{x} $. Once we’ve got $ \mathbf{x} $, we can get the solution in the original coordinate space by evaluating $ \mathbf{Ax} + \mathbf{P_1} $. Adding $ \mathbf{P_1} $ shifts us back into the original coordinate system.

Putting everything together, the solution to our original problem is going to be:

\[\begin{align} \mathbf{A} &= \left[ \begin{array}{c|c} \mathbf{P_2} - \mathbf{P_1} & \mathbf{P_3} - \mathbf{P_1} \end{array} \right] \notag \\ \mathbf{b} &= \mathbf{Q} - \mathbf{P_1} \notag \\ \mathbf{x} &= (\mathbf{A}^\top \mathbf{A})^{-1}\mathbf{A}^\top\mathbf{b} \notag \\ \mathbf{Q'} &= \mathbf{Ax} + \mathbf{P_1} \notag \end{align}\]

When projecting onto a plane, $ \mathbf{A}^\top\mathbf{A} $ is going to be a 2x2 matrix. When projecting onto a line (whether in 3D or in 2D), it would be a 1x1 matrix, which makes inverting it really easy - just take the reciprocal of its only element.
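To make the 1x1 case concrete, here is a minimal sketch of projecting a point onto the line through two points. It deliberately avoids glm: `Vec3` is a hypothetical alias over `std::array<double, 3>`. That makes the single entry of $ \mathbf{A}^\top \mathbf{A} $ and its reciprocal visible. The same formula works in 2D if you drop the third coordinate.

```cpp
#include <array>

// Hypothetical minimal 3D vector type (glm::vec3 would do the same job).
using Vec3 = std::array<double, 3>;

double dot(const Vec3& a, const Vec3& b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Projects q onto the line through p1 and p2 using the normal equation.
// A is the single column d = p2 - p1, so A^T A is the 1x1 matrix [dot(d, d)]
// and its inverse is simply 1 / dot(d, d).
Vec3 projectToLine(const Vec3& p1, const Vec3& p2, const Vec3& q)
{
    const Vec3 d = {p2[0] - p1[0], p2[1] - p1[1], p2[2] - p1[2]};  // A
    const Vec3 b = {q[0] - p1[0], q[1] - p1[1], q[2] - p1[2]};      // b = Q - P1
    const double x = dot(d, b) / dot(d, d);                         // (A^T A)^-1 A^T b
    return {p1[0] + d[0] * x, p1[1] + d[1] * x, p1[2] + d[2] * x};  // Q' = Ax + P1
}
```

For example, projecting (1, 5, 3) onto the line through the origin and (2, 0, 0) gives x = 0.5 and therefore the point (1, 0, 0).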

Show Me The Code!

Here is a function to project a point onto a plane in C++ using the glm library:

#include <glm/glm.hpp>

glm::vec3 projectToPlane(glm::vec3 p1, glm::vec3 p2, glm::vec3 p3, glm::vec3 q)
{
    // Note that in glm, a 2x3 matrix means a 2-column, 3-row matrix.
    // In Linear Algebra, we would have called such a matrix a 3x2 one.
    const glm::mat2x3 A(p2 - p1, p3 - p1);

    const auto b = q - p1;
    const auto At(glm::transpose(A));
    const auto invAtA(glm::inverse(At * A));
    const auto x = invAtA * At * b;

    return p1 + A * x;
}

You can now project points onto a plane or onto a line (with minimal modifications to the code above). However, understanding how the normal equation is derived will allow you to solve a wider class of problems, one of which is given towards the end of this article. So, let’s do the derivation!
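If you don’t use glm, the plane projection can also be written out by hand. In this sketch, the `Vec3` alias and the helper functions are hypothetical and not part of glm. $ \mathbf{A}^\top \mathbf{A} $ is the 2x2 matrix of dot products of the two edge vectors, and its inverse uses the closed-form 2x2 formula.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;  // hypothetical stand-in for glm::vec3

Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Projects q onto the plane through p1, p2 and p3 via the normal equation.
Vec3 projectToPlane(const Vec3& p1, const Vec3& p2, const Vec3& p3, const Vec3& q)
{
    const Vec3 e1 = sub(p2, p1);  // first column of A
    const Vec3 e2 = sub(p3, p1);  // second column of A
    const Vec3 b = sub(q, p1);

    // A^T A = [[a, c], [c, d]] and A^T b = [r1, r2].
    const double a = dot(e1, e1), c = dot(e1, e2), d = dot(e2, e2);
    const double r1 = dot(e1, b), r2 = dot(e2, b);

    // Non-zero as long as p1, p2 and p3 are not collinear.
    const double det = a * d - c * c;

    // x = (A^T A)^-1 A^T b, with inverse([[a, c], [c, d]]) = [[d, -c], [-c, a]] / det.
    const double x1 = (d * r1 - c * r2) / det;
    const double x2 = (a * r2 - c * r1) / det;

    // Q' = Ax + P1
    return {p1[0] + e1[0] * x1 + e2[0] * x2,
            p1[1] + e1[1] * x1 + e2[1] * x2,
            p1[2] + e1[2] * x1 + e2[2] * x2};
}
```

The determinant `a * d - c * c` equals the squared length of the cross product of the two edges. It is therefore zero exactly when the three points are collinear, which is why the problem statement rules that case out.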

Deriving the Normal Equation

So, we want to project something onto a vector space. Let’s recall that a vector space is the set of all possible linear combinations of a given set of vectors. Those vectors are called the basis vectors (assuming they are linearly independent), and in our case they are the columns of matrix $ \mathbf{A} $. Now suppose we also have matrix $ \mathbf{B} $, whose column space is the orthogonal complement of the column space of $ \mathbf{A} $. What does that mean exactly? The orthogonal part means that every column of $ \mathbf{A} $ is orthogonal to every column of $ \mathbf{B} $. The complement part means that when the columns of $ \mathbf{A} $ and $ \mathbf{B} $ are put together, the vector space they produce covers the whole space (in this case, the whole 3D space). That is, every point in 3D must be representable as a linear combination of the columns of $ \mathbf{A} $ and $ \mathbf{B} $.

When projecting onto a plane in 3D, we can actually build that matrix $ \mathbf{B} $ pretty easily: we just put the cross product of $ (\mathbf{P_2} - \mathbf{P_1}) $ and $ (\mathbf{P_3} - \mathbf{P_1}) $ as its only column. When projecting onto a line in 3D though, it’s no longer that simple, as $ \mathbf{B} $ will now have two columns. The good news is that we won’t have to build matrix $ \mathbf{B} $ at all, because it will cancel out. But for now, let’s assume that we have it.

Now, we can represent any vector $ \mathbf{b} $ as a linear combination of columns of $ \mathbf{A} $ and $ \mathbf{B} $:

\[\mathbf{Ax} + \mathbf{By} = \mathbf{b} \tag{1}\]

We could solve the above equation by introducing a matrix $ \mathbf{M} $ and a vector $ \mathbf{z} $, where:

\[\begin{align} \mathbf{M} &= \left[ \begin{array}{c|c} \mathbf{A} & \mathbf{B} \end{array} \right] \notag \\ \mathbf{z} &= \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} \notag \\ \end{align}\]

Then, solving $ \mathbf{Mz} = \mathbf{b} $ for $ \mathbf{z} $ gives us $ \mathbf{x} $ and $ \mathbf{y} $, the latter of which we discard.
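In the plane case in 3D, $ \mathbf{B} $ is just the cross product, so this roundabout route can be sketched with Cramer’s rule. The names below (`solveWithComplement`, `PlaneCoords`, the `Vec3` alias) are hypothetical and only serve as an illustration.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Determinant of the 3x3 matrix with columns c0, c1, c2 (scalar triple product).
double det3(const Vec3& c0, const Vec3& c1, const Vec3& c2) { return dot(c0, cross(c1, c2)); }

struct PlaneCoords
{
    double x1, x2;  // coefficients of the columns of A (what we want)
    double y;       // coefficient of the single column of B (discarded)
};

// Solves [e1 | e2 | n] z = b with Cramer's rule, where n = e1 x e2 is the
// only column of B, i.e. it spans the orthogonal complement of the plane.
PlaneCoords solveWithComplement(const Vec3& e1, const Vec3& e2, const Vec3& b)
{
    const Vec3 n = cross(e1, e2);
    const double det = det3(e1, e2, n);
    return {det3(b, e2, n) / det, det3(e1, b, n) / det, det3(e1, e2, b) / det};
}
```

For the plane spanned by (1, 0, 1) and (0, 1, 0) and the point b = (1, 0, 0), this gives x = (0.5, 0) and y = -0.5. The projected point is therefore 0.5 · (1, 0, 1) = (0.5, 0, 0.5).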

However, we are going to do better than that - we are going to left-multiply both sides of (1) by something that will cancel out the $ \mathbf{By} $ term. That something happens to be $ \mathbf{A}^\top $.

That gives us the following equation:

\[\mathbf{A}^\top \mathbf{Ax} + \mathbf{A}^\top \mathbf{By} = \mathbf{A}^\top \mathbf{b}\]

I claim that $ \mathbf{A}^\top \mathbf{B} $ is a matrix of zeros and thus $ \mathbf{A^\top By} $ is a vector of zeros for any $ \mathbf{y} $.

What are the elements of $ \mathbf{A}^\top \mathbf{B} $? They are dot products between a column of $ \mathbf{B} $ and a row of $ \mathbf{A}^\top $ (that is, a column of $ \mathbf{A} $). Recall that the column spaces of $ \mathbf{A} $ and $ \mathbf{B} $ are orthogonal, so every column of $ \mathbf{A} $ is orthogonal to every column of $ \mathbf{B} $. Dot products of orthogonal vectors are zero, so $ \mathbf{A}^\top \mathbf{B} $ is indeed a matrix of zeros. Eliminating $ \mathbf{A}^\top \mathbf{By} $ gives us the well-known Normal Equation:

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

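Here, $ \mathbf{A}^\top \mathbf{A} $ is a square matrix of the same rank as $ \mathbf{A} $. If the columns of $ \mathbf{A} $ are linearly independent, it is invertible, so this system has a unique solution and can easily be solved.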

The solution $ \mathbf{x} $ gives us the linear coefficients for the vector in the column space of $ \mathbf{A} $ that’s closest (in the Euclidean sense) to the vector $ \mathbf{b} $.

This completes the derivation.
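Nothing in the derivation depends on the size of $ \mathbf{A} $, so the normal equation can be solved the same way for any tall matrix. Below is a dependency-free sketch. `Matrix` and `Vector` are hypothetical aliases, with row-major storage. The sketch forms $ \mathbf{A}^\top \mathbf{A} $ and $ \mathbf{A}^\top \mathbf{b} $ explicitly and solves the result with Gaussian elimination.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // m rows of n entries
using Vector = std::vector<double>;

// Solves (A^T A) x = A^T b for a tall m x n matrix A.
Vector solveNormalEquation(const Matrix& A, const Vector& b)
{
    const std::size_t m = A.size(), n = A[0].size();

    // Augmented n x (n + 1) matrix [A^T A | A^T b].
    Matrix M(n, Vector(n + 1, 0.0));
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < m; ++k)
                M[i][j] += A[k][i] * A[k][j];
        for (std::size_t k = 0; k < m; ++k)
            M[i][n] += A[k][i] * b[k];
    }

    // Gaussian elimination with partial pivoting.
    for (std::size_t col = 0; col < n; ++col)
    {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::abs(M[r][col]) > std::abs(M[pivot][col]))
                pivot = r;
        // The absolute tolerance is an arbitrary choice for this sketch.
        if (std::abs(M[pivot][col]) < 1e-12)
            throw std::runtime_error("A^T A is singular: columns of A are dependent");
        std::swap(M[col], M[pivot]);
        for (std::size_t r = col + 1; r < n; ++r)
        {
            const double f = M[r][col] / M[col][col];
            for (std::size_t c = col; c <= n; ++c)
                M[r][c] -= f * M[col][c];
        }
    }

    // Back substitution.
    Vector x(n);
    for (std::size_t i = n; i-- > 0;)
    {
        double s = M[i][n];
        for (std::size_t j = i + 1; j < n; ++j)
            s -= M[i][j] * x[j];
        x[i] = s / M[i][i];
    }
    return x;
}
```

Forming $ \mathbf{A}^\top \mathbf{A} $ explicitly squares the condition number of the problem. That is fine for small, well-conditioned geometry problems like the ones in this article. For harder cases, libraries such as Eigen also offer QR-based least-squares solvers.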

Finding the Closest Points on Two Lines

Now, let’s apply our new knowledge to solve a different problem:

You have two lines in 3D. Let’s call them $ \mathbf{L_1} $ and $ \mathbf{L_2} $. Each line is defined by a point on the line and a vector of arbitrary (nonzero) length defining the line’s direction. So, we’ve got points $ \mathbf{p_1} $ and $ \mathbf{p_2} $ and vectors $ \mathbf{v_1} $ and $ \mathbf{v_2} $. Your task is to find a point on each line (let’s call them $ \mathbf{k_1} $ and $ \mathbf{k_2} $) such that the distance between those two points is minimized.

Let’s try to solve that problem with Linear Algebra. The key to the solution is to realize that the line connecting $ \mathbf{k_1} $ and $ \mathbf{k_2} $ is going to be orthogonal to both $ \mathbf{L_1} $ and $ \mathbf{L_2} $. Let’s give the vector from $ \mathbf{k_1} $ to $ \mathbf{k_2} $ a name. Let’s call it $ \mathbf{u} $. Now, we can write the following equation:

\[\mathbf{p_1} + x_1 \mathbf{v_1} + \mathbf{u} = \mathbf{p_2} + x_2 \mathbf{v_2}\]

Here, $ x_1 $ and $ x_2 $ are the unknown scalars that move us from $ \mathbf{p_i} $ along $ \mathbf{v_i} $.

Now, let’s introduce some auxiliary values:

\[\begin{align} \mathbf{A} &= \left[ \begin{array}{c|c} \mathbf{v_1} & -\mathbf{v_2} \end{array} \right] \notag \\ \mathbf{b} &= \mathbf{p_2} - \mathbf{p_1} \notag \\ \mathbf{x} &= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \notag \\ \end{align}\]

Those values allow us to rewrite the equation above in matrix form: $ \mathbf{Ax} + \mathbf{u} = \mathbf{b} $.

Now, let’s use the same trick we did before to get rid of $ \mathbf{u} $:

\[\mathbf{A}^\top \mathbf{Ax} + \mathbf{A}^\top \mathbf{u} = \mathbf{A}^\top \mathbf{b}\]

$ \mathbf{A}^\top \mathbf{u} $ is a vector of zeros because $ \mathbf{u} $ is orthogonal to all columns of $ \mathbf{A} $. That elimination produces the normal equation again:

\[\mathbf{A}^\top \mathbf{Ax} = \mathbf{A}^\top \mathbf{b}\]

So, this problem can also be solved by the method of least squares. Let’s just do it:

#include <glm/glm.hpp>

#include <utility>

std::pair<glm::vec3, glm::vec3> closestPointsOnTwoLines(
    glm::vec3 p1, glm::vec3 v1, glm::vec3 p2, glm::vec3 v2)
{
    // Note that in glm, a 2x3 matrix means a 2-column, 3-row matrix.
    // In Linear Algebra, we would have called such a matrix a 3x2 one.
    const glm::mat2x3 A(v1, -v2);

    const auto b = p2 - p1;
    const auto At(glm::transpose(A));
    const auto invAtA(glm::inverse(At * A));
    const auto x = invAtA * At * b;

    return { p1 + v1 * x[0], p2 + v2 * x[1] };
}

If the two lines are parallel, the code above is going to fail, because the $ \mathbf{A}^\top \mathbf{A} $ matrix won’t be invertible (and for nearly parallel lines it will be ill-conditioned). In such a case, perhaps you could take the midpoint between $ \mathbf{p_1} $ and $ \mathbf{p_2} $ and project it onto both $ \mathbf{L_1} $ and $ \mathbf{L_2} $, again using the method of least squares.
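Here is a dependency-free sketch that spells out the 2x2 inverse for this problem and implements the parallel-lines fallback suggested above. The `Vec3` alias, the helper functions and the relative tolerance `1e-12` are assumptions of this sketch, not part of glm.

```cpp
#include <array>
#include <utility>

using Vec3 = std::array<double, 3>;

Vec3 add(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
Vec3 scale(const Vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Closest point to q on the line through p with direction v (1x1 normal equation).
Vec3 projectPointOntoLine(const Vec3& p, const Vec3& v, const Vec3& q)
{
    return add(p, scale(v, dot(v, sub(q, p)) / dot(v, v)));
}

std::pair<Vec3, Vec3> closestPointsOnTwoLines(const Vec3& p1, const Vec3& v1,
                                              const Vec3& p2, const Vec3& v2)
{
    // A = [v1 | -v2] and b = p2 - p1, so A^T A = [[a, -c], [-c, d]]
    // and A^T b = [r1, -r2].
    const Vec3 b = sub(p2, p1);
    const double a = dot(v1, v1), c = dot(v1, v2), d = dot(v2, v2);
    const double r1 = dot(v1, b), r2 = dot(v2, b);

    // det(A^T A) = |v1 x v2|^2, which vanishes for parallel lines.
    // The relative tolerance is an arbitrary choice for this sketch.
    const double det = a * d - c * c;
    if (det <= 1e-12 * a * d)
    {
        // Fallback: project the midpoint of p1 and p2 onto both lines.
        const Vec3 mid = scale(add(p1, p2), 0.5);
        return {projectPointOntoLine(p1, v1, mid), projectPointOntoLine(p2, v2, mid)};
    }

    // x = (A^T A)^-1 A^T b, with inverse([[a, -c], [-c, d]]) = [[d, c], [c, a]] / det.
    const double x1 = (d * r1 - c * r2) / det;
    const double x2 = (c * r1 - a * r2) / det;
    return {add(p1, scale(v1, x1)), add(p2, scale(v2, x2))};
}
```

For the x-axis and the line through (3, -2, 5) with direction (0, 2, 0), this returns (3, 0, 0) and (3, 0, 5). The connecting vector (0, 0, 5) is orthogonal to both directions, as expected.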

Closing Words

The method of least squares seems to be little known among software developers, and yet it is quite useful in both 2D and 3D graphics and beyond. This article should have given you all the knowledge you need to apply it to your projects.

BTW, the author may be available for contracting work.

How to Implement Spline Fitting

2025-03-23T00:00:00+00:00

Introduction
Splines
Spline Fitting - First Steps
The Optimization Problem
Optimizing It
Show Me the Code!
Challenges and Improvements
- Avoiding Loops
- Adding Constraints
  - Lagrange Multipliers
Closing Words

Introduction

In this article, I’ll show you how I implemented spline fitting in Scan Tailor, which is a tool for post-processing scanned or photographed pages. I am the original author of that project and I was involved with it between 2007 and 2016.

Spline fitting is used there in the dewarping workflow (introduced around 2011), where a curved page is flattened. The picture below should give you the idea:

The better the grid follows the curvature of text lines, the better dewarping results you are going to get.

You can let Scan Tailor build the grid by itself or you can define it manually. You can also manually adjust the grid built by Scan Tailor. That’s where spline fitting comes into play. The whole grid is defined by two horizontal curves: the top and the bottom one. They don’t have to be the top-most and the bottom-most text lines, but accuracy is better when they are far apart. The other curves are automatically derived from those two. When Scan Tailor builds the grid by itself, the curves are represented by polylines. The polylines are produced by a custom image processing algorithm and look like this:

The screenshot above was taken from Scan Tailor’s debugging output (Menu -> Tools -> Debug Mode).

Now suppose the dewarping grid built for you by Scan Tailor is imperfect and you want to adjust it manually. How are you going to adjust the polylines? Certainly not point-by-point!

Splines

The standard way to manually adjust a curve is to represent that curve as a spline. A spline is a function of this form:

\[\mathbf{s}(t; \mathbf{X})\]

It takes a time / progress argument $ t $ and is parametrized by a set of control points $ \mathbf{X} $. Some types of splines may have additional parameters. For convenience, we make $ \mathbf{X} $ a matrix whose columns are the spline’s control points. The spline function returns a vector (a 2D one in our case), which is a position on the screen.

I like to think of a spline function as the trajectory of a particle: it takes time and produces the position of the particle at that time. Scan Tailor uses X-splines, whose parameter $ t $ goes from 0 to 1. Those splines are parametrized by a sequence of control points and a tension parameter for each control point. In the picture below, the bottom curve is a spline with red dots being its control points. The tension parameters are not mentioned in the formula above, because we simply hardcode them. Keep reading for more details.

The red dots (barely visible) are the control points. Their positions define the shape of the spline. When Scan Tailor fits splines to polylines, it uses a fixed number of control points - 5. The user can then add more of them or remove some.

Let’s zoom in a bit:

You’ll notice the spline doesn’t actually pass through its control points. That’s because there are two types of splines - the interpolating ones (those do pass through their control points) and the approximating ones (those are merely attracted to their control points). The X-Spline is a hybrid model where the tension parameter specifies whether the spline will pass through a particular control point or merely be attracted to it, and how much. Scan Tailor doesn’t expose the tension parameters to users. It just makes the splines go through the first and the last control points and be attracted to the rest of them. Why not make the spline pass through all control points? Because then more control points would be required to represent the same curve. That’s just my empirical observation.

Spline Fitting - First Steps

So, how do we fit a spline to a polyline? It’s going to be an iterative process. We start with a spline whose first and last control points correspond to the first and the last points of the polyline and the rest of the control points are placed at equal intervals:

The next step is to sample the spline (place some points on the spline at more or less equal intervals) and for each such point, find the closest point to it on the polyline:

If we were fitting a spline to a point cloud rather than to a polyline (the points in a point cloud are unordered), then we would do it slightly differently: for each point in the point cloud we would find the closest point on the spline (which is a challenging problem by itself). Anyway, what we really want is a set of points on the spline with known $ t $ values and a point of attraction for each of them.

The Optimization Problem

Now we are ready to formulate our optimization problem. It’s just our initial attempt - we are going to make improvements to it later in the article. However, before I write down the formula, let me explain in simple words what it does.

We are going to find a matrix $ \mathbf{X} $ (a collection of spline control points) that minimizes the sum of squared lengths of those green lines in the picture above. Why squared lengths? There is more than one reason, but the main one is that I’m trying to keep the objective function quadratic, as quadratic functions are easy to optimize. BTW, the term optimization in mathematics just means minimization or maximization (in this case, it’s minimization). Oh, and before I present our optimization problem in mathematical notation, bear in mind I am a software engineer, not a mathematician, so don’t judge my math too harshly, but do let me know if you spot an error!

OK, here it goes:

\[f(\mathbf{X}) = \sum_{i=1}^{N}{\|\mathbf{s}(t_i, \mathbf{X}) - \mathbf{q}_i\|^2} \tag{1}\] \[\text{argmin}_{\mathbf{X}} f(\mathbf{X})\]

Where:

$ \mathbf{X} $ is a matrix of spline’s control points.
$ N $ is the number of samples (the green lines on the picture above).
$ t_i $ is the spline’s parameter $ t $ for sample $ i $. It comes from the spline sampling process.
$ \mathbf{s}(t_i, \mathbf{X}) $ is our spline function (with tension parameters treated as constants).
$ \mathbf{q}_i $ is the attraction point for sample $ i $ (a point on the polyline closest to the given point on the spline).

OK, but $ \mathbf{s}(t; \mathbf{X}) $ is a rather complex function in case of X-splines! How do we deal with that? The thing is, it’s only complex when you treat $ t $ as a variable. In our case, $ t_i $’s are constants (and so are tension parameters) and the only real variable is $ \mathbf{X} $. Turns out, in such a setting, it reduces to:

\[\mathbf{s}(\mathbf{X}; t) = \mathbf{X}\mathbf{v}(t) \tag{2}\]

Bearing in mind that columns of $ \mathbf{X} $ are the spline’s control points, the function becomes a linear combination of those control points where $ \mathbf{v}(t) $ is a vector-valued function giving us the coefficients for that linear combination. Because $ t $ is a constant (a bunch of $ t_i $’s are produced by spline sampling we did above), we can pre-compute $ \mathbf{v}(t_i) $ for all samples.

To give you a better understanding, here is how $ \mathbf{v}(t) $ may be represented in C++ code:

#include <Eigen/Dense>

class FittableSpline
{
public:
    virtual ~FittableSpline() = default;

    // Number of control points (called by buildObjectiveFunction() below).
    virtual size_t numControlPoints() const = 0;

    /**
     * Computes the linear coefficients for control points for the given @p t.
     * Any extra parameters (such as tension values) a spline may have are
     * treated as constants. A linear combination of control points with
     * these coefficients produces the point on this spline at @p t.
     */
    virtual Eigen::VectorXd linearCombinationAt(double t) const = 0;
};
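X-spline coefficients are too involved to reproduce here. As a stand-in, here is what $ \mathbf{v}(t) $ looks like for a quadratic Bézier curve with 3 control points, whose coefficients are the Bernstein polynomials $ (1-t)^2 $, $ 2t(1-t) $ and $ t^2 $. The function names are hypothetical. The point is only that evaluating the curve is a linear combination of the control points with these weights, exactly as in (2).

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for FittableSpline::linearCombinationAt():
// the coefficients v(t) of a quadratic Bezier curve.
std::vector<double> bezierCoefficients(double t)
{
    const double u = 1.0 - t;
    return {u * u, 2.0 * u * t, t * t};
}

// Evaluates s(t; X) = X v(t), where controlPoints holds the columns of X.
// Assumes exactly 3 control points, matching bezierCoefficients().
std::array<double, 2> evaluate(const std::vector<std::array<double, 2>>& controlPoints, double t)
{
    const std::vector<double> v = bezierCoefficients(t);
    std::array<double, 2> point = {0.0, 0.0};
    for (std::size_t i = 0; i < v.size(); ++i)
    {
        point[0] += v[i] * controlPoints[i][0];
        point[1] += v[i] * controlPoints[i][1];
    }
    return point;
}
```

At t = 0.5 the weights are (0.25, 0.5, 0.25). For control points (0, 0), (1, 2) and (2, 0), the curve is therefore at (1, 1).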

Optimizing It

With (2) being a linear function of $ \mathbf{X} $ and (1) being a quadratic function of (2), (1) must be a quadratic function of $ \mathbf{X} $. To minimize (1), we just need to compute its derivatives with respect to $ \mathbf{X} $, set them to zero and solve the resulting linear system. Easy? Not exactly. How are we going to take derivatives with respect to a matrix? They don’t teach you that in your undergraduate math courses!

What we are going to do is to bring (1) to the canonical form of a scalar-valued, multivariable quadratic function, whose derivatives are well known:

\[\mathbf{x}^\top \mathbf{A} \mathbf{x} + \mathbf{b}^\top \mathbf{x} + c \tag{3}\]

Where $ \mathbf{x} $ is our matrix $ \mathbf{X} $ flattened into a vector, and $ \mathbf{A} $, $ \mathbf{b} $ and $ c $ are constant coefficients (a matrix, a vector and a scalar respectively).

It’s perhaps worth explaining what the first term of (3) expands to:

\[\mathbf{x}^\top \mathbf{A} \mathbf{x} = \sum_{i,j}{\mathbf{x}_i \mathbf{A}_{i,j} \mathbf{x}_j}\]

Basically, we take every element of $ \mathbf{x} $, multiply it by every other element (and also the same element) and by a corresponding coefficient from matrix $ \mathbf{A} $. Then we sum up the resulting terms.

(3) has a well-known gradient (the vector of first partial derivatives):

\[(\mathbf{A}^\top + \mathbf{A}) \mathbf{x} + \mathbf{b} \tag{4}\]
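Formula (4) is easy to sanity-check numerically. The sketch below evaluates (3) and (4) for dense matrices stored as nested `std::vector`s; the helper names are hypothetical. It then compares the gradient with a central finite difference. Note that $ \mathbf{A} $ doesn’t need to be symmetric, which is why both $ \mathbf{A} $ and its transpose appear in (4).

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major
using Vector = std::vector<double>;

// Evaluates the canonical quadratic form (3): x^T A x + b^T x + c.
double evalQuadratic(const Matrix& A, const Vector& b, double c, const Vector& x)
{
    double result = c;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        result += b[i] * x[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            result += x[i] * A[i][j] * x[j];
    }
    return result;
}

// Gradient (4): (A^T + A) x + b. Entry i is b_i + sum_j (A_ji + A_ij) x_j.
Vector quadraticGradient(const Matrix& A, const Vector& b, const Vector& x)
{
    Vector g(b);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            g[i] += (A[j][i] + A[i][j]) * x[j];
    return g;
}

// Central finite difference of (3) along coordinate i. For a quadratic the
// h^2 terms cancel, so this matches (4) up to rounding error.
double numericPartial(const Matrix& A, const Vector& b, double c, Vector x,
                      std::size_t i, double h = 1e-3)
{
    x[i] += h;
    const double fPlus = evalQuadratic(A, b, c, x);
    x[i] -= 2 * h;
    const double fMinus = evalQuadratic(A, b, c, x);
    return (fPlus - fMinus) / (2 * h);
}
```

For the non-symmetric $ \mathbf{A} = [[1, 2], [0, 3]] $, $ \mathbf{b} = (1, -1) $ and $ \mathbf{x} = (0.5, 2) $, formula (4) gives the gradient (6, 12).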

So how exactly are we going to bring (1) into the form of (3) in order to minimize it? It turns out it’s much easier done programmatically than mathematically (for me at least). So, let’s do that in C++. If you spot an error, please tell me.

Show Me the Code!

First, we’ll need a few structs to represent various functions we’ll be dealing with.

#include <Eigen/Dense>

/**
 * Represents a scalar-valued linear function of the form:
 *
 *     f(x) = a^T * x + b
 * 
 * where a is a vector and b is a scalar.
 */
struct ScalarLinearFunction
{
    Eigen::VectorXd a;
    double b = 0.0;

    /**
     * Sets this->a to the correct size and initializes
     * this->a and this->b with zeros.
     */
    explicit ScalarLinearFunction(size_t numVars);

    size_t numVars() const;

    ScalarLinearFunction& operator+=(ScalarLinearFunction const& other);
};

/**
 * Represents a vector-valued linear function of the form:
 *
 *     f(x) = Ax + b
 * 
 * where A is a matrix and b is a vector.
 */
struct VectorLinearFunction
{
    Eigen::MatrixXd A;
    Eigen::VectorXd b;
};

/**
 * Represents a scalar-valued quadratic function of the form:
 *
 *     f(x) = x^T * A * x + b^T * x + c
 * 
 * where A is a matrix, b is a vector and c is a scalar.
 */
struct ScalarQuadraticFunction
{
    Eigen::MatrixXd A;
    Eigen::VectorXd b;
    double c = 0.0;

    /**
     * Sets this->A and this->b to the correct sizes and initializes
     * this->A, this->b and this->c with zeros.
     */
    explicit ScalarQuadraticFunction(size_t numVars);

    size_t numVars() const;

    VectorLinearFunction gradient() const;

    ScalarQuadraticFunction& operator+=(ScalarQuadraticFunction const& other);
};

ScalarLinearFunction operator+(
    ScalarLinearFunction const& lhs, ScalarLinearFunction const& rhs);

ScalarQuadraticFunction operator+(
    ScalarQuadraticFunction const& lhs, ScalarQuadraticFunction const& rhs);

ScalarQuadraticFunction operator*(
    ScalarLinearFunction const& lhs, ScalarLinearFunction const& rhs);

Apart from the gradient computation (4), the only other non-trivial operation above is the multiplication of two ScalarLinearFunction objects, which produces a ScalarQuadraticFunction. It can be implemented using the following relationship:

\[(\mathbf{a}_1^\top \mathbf{x} + b_1)(\mathbf{a}_2^\top \mathbf{x} + b_2) = \mathbf{x}^\top (\mathbf{a}_1 \mathbf{a}_2^\top) \mathbf{x} + (\mathbf{a}_1 b_2 + \mathbf{a}_2 b_1)^\top \mathbf{x} + b_1 b_2\]
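As a sketch of that multiplication, here it is with plain `std::vector`s instead of Eigen types. `Linear` and `Quadratic` are hypothetical, simplified stand-ins for `ScalarLinearFunction` and `ScalarQuadraticFunction`. The `eval` functions exist only to check the identity numerically.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for ScalarLinearFunction: f(x) = a^T x + b.
struct Linear
{
    std::vector<double> a;
    double b = 0.0;
};

// Simplified stand-in for ScalarQuadraticFunction: f(x) = x^T A x + b^T x + c.
struct Quadratic
{
    std::vector<std::vector<double>> A;
    std::vector<double> b;
    double c = 0.0;
};

// (a1^T x + b1)(a2^T x + b2) = x^T (a1 a2^T) x + (a1 b2 + a2 b1)^T x + b1 b2
Quadratic operator*(const Linear& lhs, const Linear& rhs)
{
    const std::size_t n = lhs.a.size();
    Quadratic q;
    q.A.assign(n, std::vector<double>(n, 0.0));
    q.b.assign(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
            q.A[i][j] = lhs.a[i] * rhs.a[j];  // outer product a1 a2^T
        q.b[i] = lhs.a[i] * rhs.b + rhs.a[i] * lhs.b;
    }
    q.c = lhs.b * rhs.b;
    return q;
}

double eval(const Linear& f, const std::vector<double>& x)
{
    double result = f.b;
    for (std::size_t i = 0; i < x.size(); ++i)
        result += f.a[i] * x[i];
    return result;
}

double eval(const Quadratic& f, const std::vector<double>& x)
{
    double result = f.c;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
        result += f.b[i] * x[i];
        for (std::size_t j = 0; j < x.size(); ++j)
            result += x[i] * f.A[i][j] * x[j];
    }
    return result;
}
```

Note that the resulting matrix $ \mathbf{a}_1 \mathbf{a}_2^\top $ is generally not symmetric. That doesn’t matter, because the gradient (4) uses $ \mathbf{A}^\top + \mathbf{A} $.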

Assuming we’ve implemented all the methods mentioned above, we can now transform (1) to (3):

struct SplineSample
{
    /**
     * Spline parameter t.
     */
    double t;

    /**
     * The point on the polyline to which spline(t) shall be attracted.
     */
    Eigen::Vector2d attractionPoint;
};

ScalarQuadraticFunction buildObjectiveFunction(
    FittableSpline const& spline, std::vector<SplineSample> const& samples)
{
    size_t const numControlPoints = spline.numControlPoints();

    // Each control point carries 2 variables: its x and y coordinates.
    size_t const numVars = numControlPoints * 2;

    // Initialized with zeros.
    ScalarQuadraticFunction objectiveFunction(numVars);

    for (SplineSample const& sample : samples)
    {
        Eigen::VectorXd const controlPointWeights =
            spline.linearCombinationAt(sample.t);

        // We use two separate linear functions to represent the horizontal
        // and vertical components of `s(t_i, X) - q_i`.
        ScalarLinearFunction deltaX(numVars);  // Initialized with zeros.
        ScalarLinearFunction deltaY(numVars);  // Initialized with zeros.

        deltaX.b = -sample.attractionPoint(0);
        deltaY.b = -sample.attractionPoint(1);

        for (size_t i = 0; i < numControlPoints; ++i)
        {
            // Even indices of vector x correspond to the x coordinates
            // of control points.
            deltaX.a(i * 2) = controlPointWeights(i);

            // Odd indices of vector x correspond to the y coordinates
            // of control points.
            deltaY.a(i * 2 + 1) = controlPointWeights(i);
        }

        objectiveFunction += deltaX * deltaX + deltaY * deltaY;
    }

    return objectiveFunction;
}

Now, to solve for $ \mathbf{x} $ (the flattened collection of spline control points), we just have to set the gradient to zero and solve the resulting linear system:

ScalarQuadraticFunction const objectiveFunction =
    buildObjectiveFunction(spline, samplesOfSpline);

VectorLinearFunction const gradient = objectiveFunction.gradient();

auto qr = gradient.A.colPivHouseholderQr();
if (!qr.isInvertible())
{
    // Handle error.
}

Eigen::VectorXd const x = qr.solve(-gradient.b);

That gives us the new positions of the spline’s control points that minimize the sum of squared lengths of those green lines in the picture. Then, we repeat the whole process starting from spline sampling, and keep repeating it until the average squared length of a green line stops decreasing.
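With objective (1) alone, the x and y coordinates of the control points never interact. Each solve in that loop is therefore really two small least-squares problems. Let $ \mathbf{V} $ be the matrix whose rows are $ \mathbf{v}(t_i)^\top $. Setting the gradient to zero then yields the normal equations $ \mathbf{V}^\top \mathbf{V} \mathbf{c}_x = \mathbf{V}^\top \mathbf{q}_x $, and likewise for y. The sketch below shows one such solve. It uses a quadratic Bézier as a hypothetical stand-in for an X-spline and a small hand-written solver instead of Eigen.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::array<double, 2>;

// Hypothetical stand-in for linearCombinationAt(): quadratic Bezier weights.
std::array<double, 3> coefficients(double t)
{
    const double u = 1.0 - t;
    return {u * u, 2.0 * u * t, t * t};
}

// Solves M x = r by Gaussian elimination with partial pivoting.
// Assumes M is invertible (at least 3 distinct t values below).
std::array<double, 3> solve3(std::array<std::array<double, 3>, 3> M, std::array<double, 3> r)
{
    for (std::size_t col = 0; col < 3; ++col)
    {
        std::size_t pivot = col;
        for (std::size_t row = col + 1; row < 3; ++row)
            if (std::abs(M[row][col]) > std::abs(M[pivot][col]))
                pivot = row;
        std::swap(M[col], M[pivot]);
        std::swap(r[col], r[pivot]);
        for (std::size_t row = col + 1; row < 3; ++row)
        {
            const double f = M[row][col] / M[col][col];
            for (std::size_t k = col; k < 3; ++k)
                M[row][k] -= f * M[col][k];
            r[row] -= f * r[col];
        }
    }
    std::array<double, 3> x{};
    for (std::size_t i = 3; i-- > 0;)
    {
        double s = r[i];
        for (std::size_t j = i + 1; j < 3; ++j)
            s -= M[i][j] * x[j];
        x[i] = s / M[i][i];
    }
    return x;
}

// One solve of objective (1): accumulate V^T V, V^T q_x and V^T q_y over the
// samples, then solve the two decoupled normal equations.
std::array<Point, 3> fitControlPoints(const std::vector<double>& ts,
                                      const std::vector<Point>& targets)
{
    std::array<std::array<double, 3>, 3> VtV{};
    std::array<double, 3> Vtx{}, Vty{};
    for (std::size_t s = 0; s < ts.size(); ++s)
    {
        const std::array<double, 3> v = coefficients(ts[s]);
        for (std::size_t i = 0; i < 3; ++i)
        {
            for (std::size_t j = 0; j < 3; ++j)
                VtV[i][j] += v[i] * v[j];
            Vtx[i] += v[i] * targets[s][0];
            Vty[i] += v[i] * targets[s][1];
        }
    }
    const std::array<double, 3> cx = solve3(VtV, Vtx);
    const std::array<double, 3> cy = solve3(VtV, Vty);
    return {Point{cx[0], cy[0]}, Point{cx[1], cy[1]}, Point{cx[2], cy[2]}};
}
```

If the attraction points lie exactly on a curve with control points (0, 0), (1, 2) and (2, 0), this solve recovers those control points. The decoupling no longer holds once the penalty becomes direction-dependent, as described below, because that introduces terms mixing x and y.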

Challenges and Improvements

Unfortunately, the approach described above doesn’t work that well. You may well end up with a result like this:

The root cause of such a poor fit is that our objective function (1) discourages lateral movement of the spline control points, as such movement would increase the average length of a green line.

Basically, this:

would become this:

As can be seen, the green lines would get longer, which is something our objective function really fights against!

And yet, we do need more control points where the polyline curves more, in order to represent that curvature. What can we do about it? Well, we could improve the initial positioning of control points, so that more control points are placed in high curvature zones. However, that alone would probably not be enough. A better way is to change our objective function. Our initial objective function (1) penalizes displacements relative to the attraction point equally in all directions. See the isocontours of that penalty function:

What we need is a penalty function that penalizes displacements along the spline less than those in the orthogonal direction:

Each ellipse represents a constant level of penalty. So, on the same penalty budget we can now afford a larger displacement along the polyline than in the orthogonal direction.

How much do we stretch those isocontours? We stretch them more in low curvature areas and less in high curvature ones. This article is getting longer than I would like it to be, so I am not giving the full details here, but you can find them in these two papers (1, 2). Also take a look at the relevant code in Scan Tailor. What’s important is that the modified objective function is still quadratic, so we optimize it exactly the same way.
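The exact weighting is described in the papers mentioned above. As a rough illustration of the idea only, here is one simple way to get elliptical isocontours. It is an assumption of this sketch, not necessarily Scan Tailor’s formula. The idea is to weight the squared displacement separately along a unit tangent $ \mathbf{t} $ and the corresponding normal $ \mathbf{n} $, i.e. $ \mathbf{d}^\top (w_n \mathbf{n}\mathbf{n}^\top + w_t \mathbf{t}\mathbf{t}^\top) \mathbf{d} $. The displacement $ \mathbf{d} = \mathbf{s}(t_i, \mathbf{X}) - \mathbf{q}_i $ is linear in $ \mathbf{X} $, so this penalty is still quadratic in $ \mathbf{X} $. The weights `wTangent` and `wNormal` are hypothetical inputs.

```cpp
#include <array>
#include <cstddef>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Builds P = wNormal * n n^T + wTangent * t t^T for a unit tangent t and its
// normal n = (-t_y, t_x). The isocontours of d^T P d are ellipses with axes
// along t and n, stretched along t whenever wTangent < wNormal.
Mat2 anisotropicPenalty(const Vec2& unitTangent, double wTangent, double wNormal)
{
    const Vec2& t = unitTangent;
    const Vec2 n = {-t[1], t[0]};
    Mat2 P{};
    for (std::size_t i = 0; i < 2; ++i)
        for (std::size_t j = 0; j < 2; ++j)
            P[i][j] = wNormal * n[i] * n[j] + wTangent * t[i] * t[j];
    return P;
}

// Penalty d^T P d for a displacement d = s(t_i, X) - q_i.
double penalty(const Mat2& P, const Vec2& d)
{
    return d[0] * (P[0][0] * d[0] + P[0][1] * d[1]) + d[1] * (P[1][0] * d[0] + P[1][1] * d[1]);
}
```

With a horizontal tangent, `wTangent = 0.1` and `wNormal = 1`, a unit displacement along the tangent costs 0.1, while the same displacement along the normal costs 1.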

Does the above help? It does, but it solves one problem and creates two more:

When trying to fit a spline to a more or less flat polyline, loops are occasionally created:
The spline endpoints may end up far from the polyline endpoints.

Let’s solve each of the above problems.

Avoiding Loops

The 1st problem is solved by adding another penalty term to (1) that penalizes the sum of squared distances between adjacent control points or, alternatively, between points on the spline corresponding to adjacent control points. The 1st option is going to look like this:

\[\alpha \sum_{i=1}^{K-1}{\|\mathbf{X}_{i+1} - \mathbf{X}_i\|^2}\]

Where:

$ \alpha $ is the importance factor of this penalty term. We start with a higher value and gradually reduce it in subsequent iterations.
$ K $ is the number of control points in a spline.
$ \mathbf{X}_i $ is the $ i $’th column of matrix $ \mathbf{X} $, that is the $ i $’th control point of the spline.
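This penalty is a sum of squared differences of variables, so it only adds to the quadratic term $ \mathbf{A} $ of (3). The sketch below assumes the same interleaved layout of $ \mathbf{x} $ as in `buildObjectiveFunction()` (even indices for x coordinates, odd ones for y) and a hypothetical dense `Matrix` type.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major

// Adds alpha * sum_i ||X_{i+1} - X_i||^2 to the quadratic term A of
// x^T A x + b^T x + c, with x laid out as [x0, y0, x1, y1, ...].
void addSmoothnessPenalty(Matrix& A, std::size_t numControlPoints, double alpha)
{
    for (std::size_t i = 0; i + 1 < numControlPoints; ++i)
    {
        for (std::size_t coord = 0; coord < 2; ++coord)
        {
            const std::size_t a = i * 2 + coord;        // component of X_i
            const std::size_t b = (i + 1) * 2 + coord;  // same component of X_{i+1}
            // alpha * (x_b - x_a)^2 = alpha * (x_a^2 - 2 x_a x_b + x_b^2)
            A[a][a] += alpha;
            A[b][b] += alpha;
            A[a][b] -= alpha;  // together with A[b][a], gives -2 alpha x_a x_b
            A[b][a] -= alpha;
        }
    }
}

// Evaluates x^T A x, to check the penalty numerically.
double quadraticTerm(const Matrix& A, const std::vector<double>& x)
{
    double result = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            result += x[i] * A[i][j] * x[j];
    return result;
}
```

For control points (0, 0), (1, 0) and (3, 0) with $ \alpha = 2 $, the penalty is $ 2 \cdot (1^2 + 2^2) = 10 $.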

Adding Constraints

The 2nd problem is solved by adding constraints to our optimization problem. We need two such constraints: $ \mathbf{X}\mathbf{v}(0) = \mathbf{q}_{first} $ and $ \mathbf{X}\mathbf{v}(1) = \mathbf{q}_{last} $.

Where:

$ \mathbf{v}(0) $ and $ \mathbf{v}(1) $ are the coefficients of linear combinations of spline’s control points that produce the start and the end point of the spline respectively.
$ \mathbf{q}_{first} $ and $ \mathbf{q}_{last} $ are the first and the last points of the target polyline respectively.

Lagrange Multipliers

Whenever you hear about optimization under equality constraints, Lagrange Multipliers should come to mind. In fact, this is the perfect use case for them: a quadratic objective function with linear equality constraints is as easy to optimize as a quadratic function without constraints - the linear system just gets a bit larger.

Let me remind you how to use the method of Lagrange multipliers to add constraints to an optimization problem. Suppose you need to optimize (maximize or minimize) a general, scalar-valued function $ g(\mathbf{x}) $ (where $ \mathbf{x} $ is a vector) subject to $ M $ constraints of the form $ h_i(\mathbf{x}) = c_i $ (where $ c_i $ are constants). The candidates for the optimum are found by solving the following system of equations for $ \mathbf{x} $ and $ \mathbf{\lambda} $:

\[\begin{align} \nabla g(\mathbf{x}) &= \sum_{i=1}^M{\mathbf{\lambda}_i \nabla h_i(\mathbf{x})} \notag \\ h_1(\mathbf{x}) &= c_1 \notag \\ \vdots \notag \\ h_M(\mathbf{x}) &= c_M \notag \\ \end{align}\]

In plain English: you solve for the constraints with the additional condition that the gradient of the objective function is a linear combination of the gradients of the constraints. The coefficients ($ \mathbf{\lambda}_i $) in that linear combination are called Lagrange multipliers.
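Consider the quadratic objective (3) with linear constraints $ \mathbf{C}\mathbf{x} = \mathbf{d} $, where each row of $ \mathbf{C} $ is the gradient of one constraint. The gradient condition becomes $ (\mathbf{A}^\top + \mathbf{A})\mathbf{x} + \mathbf{b} = \mathbf{C}^\top \mathbf{\lambda} $, so the whole system is linear. In the spline case, $ \mathbf{C} $ would have 4 rows: the x and y components of both endpoint constraints. Here is a dense, dependency-free sketch with hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;  // dense, row-major
using Vector = std::vector<double>;

// Solves M z = r by Gaussian elimination with partial pivoting.
Vector solveLinearSystem(Matrix M, Vector r)
{
    const std::size_t n = r.size();
    for (std::size_t col = 0; col < n; ++col)
    {
        std::size_t pivot = col;
        for (std::size_t row = col + 1; row < n; ++row)
            if (std::abs(M[row][col]) > std::abs(M[pivot][col]))
                pivot = row;
        if (M[pivot][col] == 0.0)
            throw std::runtime_error("singular system");
        std::swap(M[col], M[pivot]);
        std::swap(r[col], r[pivot]);
        for (std::size_t row = col + 1; row < n; ++row)
        {
            const double f = M[row][col] / M[col][col];
            for (std::size_t k = col; k < n; ++k)
                M[row][k] -= f * M[col][k];
            r[row] -= f * r[col];
        }
    }
    Vector z(n);
    for (std::size_t i = n; i-- > 0;)
    {
        double s = r[i];
        for (std::size_t j = i + 1; j < n; ++j)
            s -= M[i][j] * z[j];
        z[i] = s / M[i][i];
    }
    return z;
}

// Minimizes x^T A x + b^T x + c subject to C x = d. The stationarity condition
// (A + A^T) x + b = C^T lambda together with C x = d gives the linear system
//   [A + A^T  -C^T] [x     ]   [-b]
//   [C         0  ] [lambda] = [ d]
// Returns x followed by lambda.
Vector minimizeWithConstraints(const Matrix& A, const Vector& b, const Matrix& C, const Vector& d)
{
    const std::size_t n = b.size(), m = d.size();
    Matrix K(n + m, Vector(n + m, 0.0));
    Vector rhs(n + m, 0.0);
    for (std::size_t i = 0; i < n; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
            K[i][j] = A[i][j] + A[j][i];
        rhs[i] = -b[i];
    }
    for (std::size_t k = 0; k < m; ++k)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            K[n + k][j] = C[k][j];   // constraint row: C x = d
            K[j][n + k] = -C[k][j];  // -C^T lambda in the gradient rows
        }
        rhs[n + k] = d[k];
    }
    return solveLinearSystem(K, rhs);
}
```

For example, minimizing $ x_1^2 + x_2^2 $ subject to $ x_1 + x_2 = 2 $ gives $ x_1 = x_2 = 1 $ with $ \lambda = 2 $. The system grows only by one row and one column per constraint, which is what makes this approach so cheap.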

Closing Words

That’s more or less it. I had to skip some details in order to keep the article’s length reasonable, but if you still remember some Linear Algebra and some Calculus, it should be quite possible for you to fill those gaps and implement spline fitting for your project. For reference, check out Scan Tailor’s implementation.
