Integer Programming on adventures in optimization

🐏 RAMS Reboot

Wed, 25 Jun 2025 00:00:00 +0000

Some years ago, I worked on real-time meal delivery at Zoomer, a YC startup based out of Philadelphia. Zoomer’s production tech stack was primarily Ruby. As it grew we moved from using heuristics for things like routing and scheduling to open source optimization solvers.

Like most languages that aren’t Python, Ruby doesn’t have an especially mature ecosystem for optimization (or data science, or machine learning, for that matter). For some use cases that didn’t matter. When we upgraded the routing engine, we built a model in C++ using Gecode and wrapped a Ruby gem around a SWIG wrapper. But when we wanted to use integer programming to build schedules, the lack of solver APIs proved inconvenient.¹

At the time, PuLP was probably the most commonly used open source multi-solver Python library for linear and integer programming.² This gave me the opportunity to develop RAMS, a PuLP-inspired library for basic MILP modeling in Ruby.

Then the Zoomer team became part of Grubhub. We moved to a Java stack and a commercial optimization solver. Improvements to the RAMS project languished on my todo list. It lagged behind major versions of Ruby, optimization solvers, and dependencies, and ended up painfully out of date and unmaintained.

Then, last month, GitHub released its Copilot agent. Unlike vibe coding directly in the editor, which sounds like speeding maniacally through a bad acid trip, the idea here is more like running a project: create issues, receive and comment on pull requests, iterate.

I figured the grunt work of library upgrades should be perfect fodder to try out an AI developer assistant. RAMS is already well structured and tested. The upgrade is well defined. No creativity required.

A RAMS modeling example

This post is meandering through two topics: solving optimization models with Ruby and RAMS, and my experiences maintaining that library using Copilot. I could have split this into two posts, but that didn’t feel right. So let’s show what building a model in RAMS looks like first.

I don’t use Ruby with any regularity these days³, but modeling with RAMS reminded me how elegant Ruby DSLs can be. Here’s a simple example of a binary integer program.

#!/usr/bin/env ruby

require 'rams'

m = RAMS::Model.new

x1 = m.variable type: :binary
x2 = m.variable type: :binary
x3 = m.variable type: :binary

m.constrain(x1 + x2 + x3 <= 2)
m.constrain(x2 + x3 <= 1)

m.sense = :max
m.objective = 1 * x1 + 2 * x2 + 3 * x3

solution = m.solve
puts <<-HERE
objective: #{solution.objective}
x1 = #{solution[x1]}
x2 = #{solution[x2]}
x3 = #{solution[x3]}
HERE

I think that’s rather nice, and very clean.
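A model this small is easy to sanity check by enumeration. The following is a minimal Python sketch (not RAMS code; the helper names `feasible`, `objective`, and `best` are made up for illustration) that brute-forces the same binary program over all eight assignments.

```python
from itertools import product


def feasible(x1, x2, x3):
    # Same two constraints as the RAMS model.
    return x1 + x2 + x3 <= 2 and x2 + x3 <= 1


def objective(x1, x2, x3):
    # Same objective as the RAMS model, to be maximized.
    return 1 * x1 + 2 * x2 + 3 * x3


# Enumerate every 0/1 assignment and keep the best feasible one.
candidates = [x for x in product((0, 1), repeat=3) if feasible(*x)]
best = max(candidates, key=lambda x: objective(*x))

print(f"objective: {objective(*best)}")
print(f"x1, x2, x3 = {best}")
```

Enumeration gives an objective of 4 with x1 = x3 = 1 and x2 = 0, which is what any MILP solver should report for this model.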

RAMS enhancements

The biggest change in RAMS is that it now supports the HiGHS optimization solver. Prior to v0.2.0, GLPK was the default solver; now the default is HiGHS. There are a number of smaller changes as well.

RAMS requires Ruby v3.1.
CPLEX support was removed since I can’t test it.⁴
One can set solver paths using environment variables (e.g. RAMS_SOLVER_PATH_CBC).
Improved documentation and a logo!

The Copilot agent as coding companion

While I tend to err on the side of LLM skepticism, working with the Copilot agent for this upgrade was generally positive. It was a bit like working with a fast, responsive, and inexperienced developer. The issues it ran into were pretty much the same, but the time scale was compressed.

I had it open three pull requests for me.

🤨 PR 29: Upgrade Ruby and dependencies

Performance here was middling. Copilot got through some of the task without assistance. It also made a number of changes that were unhelpful and irrelevant to the request.

On a positive note, I forgot to ask it to change from CircleCI to GitHub Actions for testing. This gave me the opportunity to test its response to feature creep. It responded with a partially working GitHub Actions workflow (and no grumbling!).

Copilot made a number of errors and wasn’t able to finish the upgrade on its own.

It decided to build the optimizers from source instead of simply installing binary packages using apt or dnf. Not only is this wasteful and overly complicated, it ultimately wasn’t able to build and install them from source.
Once I told it to use a Fedora 42 base image, this improved, but it couldn’t figure out what package to use for the CBC solver. It switched back and forth without prompting between cbc (incorrect) and coin-or-Cbc (correct).
It inexplicably couldn’t figure out the latest stable version of Ruby.
It added a bunch of architecture-specific package definitions to the build, unprompted. This was unnecessary given that RAMS is a vanilla Ruby project.
I had to help it figure out that the CBC binary is now called coin.cbc on Fedora. This wasn’t entirely surprising.

🤩 PR 32: Add environment variables for solver paths

Copilot did a great job on this task. I had no issue with the code it wrote. It followed the style of the rest of the package nicely. It added appropriate documentation and unit tests.

👌 PR 34: Support the HiGHS optimization solver

Copilot did pretty well here, even though it didn’t get the feature working. It was able to create a new solver interface and get most of the logic for solution parsing right. I was a little surprised that it forgot to test the new solver integration in GitHub Actions. The biggest issue it needed my help on was solution status parsing, where it didn’t realize that the second condition here will never trigger, since any status that matches /infeasible/i also matches /feasible/i.

return :feasible if status =~ /feasible/i
return :infeasible if status =~ /infeasible/i

This should have been the following (note the ^).

return :feasible if status =~ /^feasible/i
return :infeasible if status =~ /infeasible/i
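The same pitfall is easy to reproduce outside of Ruby. Here is a small Python sketch of both versions; the function names and the `"unknown"` fallback are assumptions for illustration, not part of RAMS.

```python
import re


def parse_status_buggy(status):
    # "infeasible" contains "feasible", so the first test always wins.
    if re.search(r"feasible", status, re.IGNORECASE):
        return "feasible"
    if re.search(r"infeasible", status, re.IGNORECASE):
        return "infeasible"
    return "unknown"


def parse_status_fixed(status):
    # Anchoring at the start keeps "Infeasible" from matching the first test.
    if re.search(r"^feasible", status, re.IGNORECASE):
        return "feasible"
    if re.search(r"infeasible", status, re.IGNORECASE):
        return "infeasible"
    return "unknown"


for status in ("Feasible", "Infeasible"):
    print(status, parse_status_buggy(status), parse_status_fixed(status))
```

Swapping the order of the two checks would also work. Anchoring just makes the intent explicit.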

I don’t remember finding any MILP modeling interfaces for Ruby like PuLP in 2016-17. More recently, Rulp and Opt have been developed. ↩︎
PuLP is still heavily used and developed. ↩︎
Once upon a time I was a Perl programmer. Ruby was originally written to be a better Perl. I’ve long since given up the old ways. ↩︎
For now, RAMS is focussing on open source solvers. Maintaining commercial solver licenses can be challenging when you’re not part of academia. PRs welcome. ↩︎

👔 Hierarchical Optimization with HiGHS

Mon, 11 Nov 2024 00:00:00 +0000

In the last post, we used Gurobi’s hierarchical optimization features to compute the Pareto front for primary and secondary objectives in an assignment problem. This relied on Gurobi’s setObjectiveN method and its internal code for managing hierarchical problems.

Some practitioners may need to do this without access to a commercial license. This post adapts the previous example to use HiGHS and its native Python interface, highspy. It’s also useful to see what the procedure is in order to understand it better. This isn’t exactly what I’d call hard, but it is easy to mess up.¹

Code

The mathematical models are available in the last post, so I won’t restate them here. We start in roughly the same manner as before²: create a binary variable for each worker-patient pair, add assignment problem constraints, and state the primary objective.

from itertools import product
import highspy

n = len(data["cost"])
workers = range(n)
patients = range(n)
workers_patients = list(product(workers, patients))

h = highspy.Highs()

# x[w,p] = 1 if worker w is assigned to patient p.
x = {(w, p): h.addBinary(obj=data["cost"][w][p]) for w, p in workers_patients}

# Each worker is assigned to one patient.
h.addConstrs(sum(x[w, p] for p in patients) == 1 for w in workers)

# Each patient is assigned one worker.
h.addConstrs(sum(x[w, p] for w in workers) == 1 for p in patients)

# Primary objective: minimize cost.
h.setMinimize()
h.solve()
cost = h.getObjectiveValue()

Note that if the costs and affinities were lists instead of matrices, we could have used h.addBinaries instead of h.addBinary.

From here we’ll be solving the model twice for every value of alpha. These expressions for total cost and affinity will make the code a little cleaner.

cost_expr = sum(data["cost"][w][p] * x[w, p] for w, p in workers_patients)
affinity_expr = sum(data["affinity"][w][p] * x[w, p] for w, p in workers_patients)

Now comes the hierarchical optimization logic. For every value of alpha, we find the best affinity possible while keeping cost within alpha of its best possible value.

Update the objective function to maximize affinity (see the calls to h.changeColCost and h.setMaximize).
Constrain the cost to be within alpha of the original optimal cost (see cost_cons).
Re-optimize and save the maximal affinity.

Now we constrain the affinity and re-optimize cost.³

Update the objective function to minimize cost again.
Constrain the affinity.

Once that’s done, we remove the additional constraints and repeat for a new value of alpha.

for alpha in alphas:
    # Secondary objective: maximize affinity.
    for (w, p), x_wp in x.items():
        h.changeColCost(x_wp.index, data["affinity"][w][p])

    # Constrain cost to be within alpha of its optimal (minimum) value.
    cost_cons = h.addConstr(cost_expr <= (1 + alpha) * cost)

    h.setMaximize()
    h.solve()
    affinity = h.getObjectiveValue()

    # Re-optimize with original cost objective, constraining affinity.
    for (w, p), x_wp in x.items():
        h.changeColCost(x_wp.index, data["cost"][w][p])
    affinity_cons = h.addConstr(affinity_expr >= affinity)

    h.setMinimize()
    h.solve()

    yield alpha, h.getObjectiveValue(), affinity

    # Remove cost and affinity constraints for the next value of alpha.
    h.removeConstr(cost_cons)
    h.removeConstr(affinity_cons)

Encouragingly, running this using the model.py linked below gives the same values as the Gurobi model, albeit not as quickly. Floating point values are rounded for readability.

| alpha | cost     | affinity |
| ----- | -------- | -------- |
| 0.0   | 11212.0  | 53816.0  |
| 0.05  | 11761.0  | 74001.0  |
| 0.1   | 12332.0  | 79981.0  |
| 0.15  | 12886.0  | 83103.0  |
| 0.2   | 13454.0  | 85394.0  |
| 0.25  | 13996.0  | 87136.0  |
| 0.3   | 14557.0  | 88546.0  |
| 0.35  | 15125.0  | 89751.0  |
| 0.4   | 15670.0  | 90664.0  |
| 0.45  | 16255.0  | 91345.0  |
| 0.5   | 16816.0  | 91997.0  |
| 0.55  | 17370.0  | 92537.0  |
| 0.6   | 17924.0  | 93012.0  |
| 0.65  | 18495.0  | 93491.0  |
| 0.7   | 19055.0  | 93829.0  |
| 0.75  | 19591.0  | 94228.0  |
| 0.8   | 20167.0  | 94530.0  |
| 0.85  | 20737.0  | 94833.0  |
| 0.9   | 21295.0  | 95114.0  |
| 0.95  | 21812.0  | 95361.0  |
| 1.0   | 22402.0  | 95613.0  |
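The alpha column runs from 0 to 1 in steps of 0.05. As a rough sketch, assuming that is how the `alphas` sequence is built (the linked model.py may do it differently), the grid and the cost budget implied by each alpha could be computed like this. The value of `z_star` is the optimal cost from the first table row, and `table_cost` copies three rows from the table.

```python
z_star = 11212.0  # optimal cost at alpha = 0, from the table

# Round to avoid floating point drift such as 0.15000000000000002.
alphas = [round(0.05 * i, 2) for i in range(21)]

# Cost may not exceed (1 + alpha) * z_star at each step.
budgets = {alpha: (1 + alpha) * z_star for alpha in alphas}

# Spot check a few rows of the table against their budgets.
table_cost = {0.05: 11761.0, 0.5: 16816.0, 1.0: 22402.0}
for alpha, cost in table_cost.items():
    print(alpha, cost, budgets[alpha], cost <= budgets[alpha])
```

Every reported cost sits at or just under its budget, which is what we would expect when extra affinity is still available for purchase.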

Resources

model.py hierarchical objectives HiGHS model

It gets even easier to mess up with more than two objectives. ↩︎
Isn’t it nice that MIP modeling is similar across different APIs? ↩︎
Exercise for the reader: why do we need to re-optimize cost? ↩︎

👔 Hierarchical Optimization with Gurobi

Fri, 08 Nov 2024 00:00:00 +0000

One of the first technology choices to make when setting up an optimization stack is which modeling interface to use. Even if we restrict our choices to Python interfaces for MIP modeling, there are lots of options to consider.

If you use a specific solver, you can opt for its native Python interface. Examples include libraries like gurobipy, Fusion, highspy, or PySCIPOpt. This approach provides access to important solver-specific features such as lazy constraints, heuristics, and various solver settings. However, it can also lock you into a solver before you’re ready for that.

You can also choose a modeling API that targets multiple solvers. In the Python ecosystem, these are libraries like amplpy, Pyomo, PyOptInterface, and linopy. These interfaces target multiple solver backends (both open source and commercial) and provide a subset of the functionality of each. Since they make it easy to switch between solvers, this is usually where I start.¹

Hierarchical assignment

However, there are plenty of times when solver-specific APIs are useful, or even critical. One example is hierarchical optimization. This is a simple technique for managing trade-offs between multiple objectives in a problem. Let’s look at an example.

Imagine we are assigning in-home health care workers ($w \in W$) to patients ($p \in P$). For simplicity, let’s say we have $n$ workers and $n$ patients, and we are assigning them one-to-one. Each worker has a given cost ($c_{wp}$) of assignment to each patient, which may reflect something like the travel time to get to them. We want to assign each worker to exactly one patient while minimizing the overall cost.

Model

So far, what we have is a simple linear sum assignment problem.

$$ \begin{align*} & \text{min} && z = \sum_{wp} c_{wp} x_{wp} \\ & \text{s.t.} && \sum_w x_{wp} = 1 && \forall \quad p \in P \\ & && \sum_p x_{wp} = 1 && \forall \quad w \in W \\ & && x \in \{0,1\}^{|W \times P|} \end{align*} $$

Solving this model gives us the minimum cost assignment. That’s all well and good, but now say we have a secondary objective of maximizing affinity of workers to patients ($a_{wp}$). That is, we want to prefer assignments that increase overall affinity while still minimizing cost. This is actually a common goal in health care scheduling: if possible, send the same worker to a given patient that you usually send.

Hierarchical optimization gives us a simple way to solve this problem. First, we optimize the model as stated above. This gives us an optimal objective value $z^*$. Then we re-solve the same optimization model, while constraining the cost to be $z^*$ and using the secondary objective function. This says to the optimizer, “improve the affinity as much as you can, but keep the cost optimal.”

$$ \begin{align*} & \text{max} && w = \sum_{wp} a_{wp} x_{wp} \\ & \text{s.t.} && \sum_{wp} c_{wp} x_{wp} \le z^* \\ & && \sum_w x_{wp} = 1 && \forall \quad p \in P \\ & && \sum_p x_{wp} = 1 && \forall \quad w \in W \\ & && x \in \{0,1\}^{|W \times P|} \end{align*} $$

From here, the natural question becomes: what if we trade off some cost for affinity? If we’re willing to increase cost by some percentage, how much more affinity do we get? We can do this by setting a constant $\alpha \ge 0$ and solving the model a number of times.²

$$ \begin{align*} & \text{max} && w = \sum_{wp} a_{wp} x_{wp} \\ & \text{s.t.} && \sum_{wp} c_{wp} x_{wp} \le (1 + \alpha) z^* \\ & && \sum_w x_{wp} = 1 && \forall \quad p \in P \\ & && \sum_p x_{wp} = 1 && \forall \quad w \in W \\ & && x \in \{0,1\}^{|W \times P|} \end{align*} $$

For example, if $\alpha = 0.05$, then we’re willing to accept a 5% increase in overall cost to improve affinity. Setting different values of $\alpha$ lets us explore the space of that trade-off and its impact on cost and affinity.

Once we solve this and get the optimal affinity ($w^*$), we should re-optimize for the primary objective again while constraining the secondary one.

$$ \begin{align*} & \text{min} && \sum_{wp} c_{wp} x_{wp} \\ & \text{s.t.} && \sum_{wp} a_{wp} x_{wp} \ge w^* \\ & && \sum_w x_{wp} = 1 && \forall \quad p \in P \\ & && \sum_p x_{wp} = 1 && \forall \quad w \in W \\ & && x \in \{0,1\}^{|W \times P|} \end{align*} $$
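To make the three steps concrete, here is a minimal Python sketch that follows them by brute force on a tiny instance. The 3x3 cost and affinity matrices are made up for illustration, and enumerating permutations stands in for a MIP solver. It only works because the instance is so small.

```python
from itertools import permutations

# Hypothetical data: cost[w][p] and affinity[w][p] for 3 workers and 3 patients.
cost = [[10, 20, 30], [30, 40, 10], [20, 10, 40]]
affinity = [[5, 15, 25], [35, 25, 5], [10, 30, 20]]

# Each permutation s is a one-to-one assignment: worker w sees patient s[w].
assignments = list(permutations(range(len(cost))))


def total(matrix, s):
    return sum(matrix[w][p] for w, p in enumerate(s))


# Step 1: minimize cost to get z*.
z_star = min(total(cost, s) for s in assignments)


def hierarchical(alpha):
    # Step 2: maximize affinity while keeping cost within alpha of z*.
    w_star = max(
        total(affinity, s)
        for s in assignments
        if total(cost, s) <= (1 + alpha) * z_star
    )
    # Step 3: minimize cost again while holding affinity at w*.
    z = min(total(cost, s) for s in assignments if total(affinity, s) >= w_star)
    return z, w_star


for alpha in (0.0, 0.5, 1.0, 1.5):
    print(alpha, hierarchical(alpha))
```

On this toy data nothing changes until the budget is loose enough to admit a higher-affinity assignment, at which point both cost and affinity jump together.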

Code

So the math looks reasonable. How do we implement it? If we have a Gurobi license, we can use its built-in facilities for multiobjective optimization. This means that, instead of solving a model multiple times and adding constraints to keep cost within $\alpha$ of its optimal value, we can create a single model that does all of this for us.

Assume we have input data which looks like this.

{
    "cost": [
        [10, 20, ...],
        [30, 40, ...],
        ...
    ],
    "affinity": [
        [25, 15, ...],
        [35, 25, ...],
        ...
    ]
}

We start with a simple assignment problem formulation.

import gurobipy as gp

n = len(data["cost"])
workers = range(n)
patients = range(n)

m = gp.Model()
m.ModelSense = gp.GRB.MINIMIZE

# x[w,p] = 1 if worker w is assigned to patient p.
x = m.addVars(n, n, vtype=gp.GRB.BINARY)

for i in range(n):
    # Each worker is assigned to one patient.
    m.addConstr(gp.quicksum(x[i, p] for p in patients) == 1)

    # Each patient is assigned one worker.
    m.addConstr(gp.quicksum(x[w, i] for w in workers) == 1)

We add primary and secondary objectives, and call optimize. The objectives are solved in descending order of the priority flag for Model.setObjectiveN. reltol allows us to degrade the primary objective by some amount (e.g. 5%) to improve the secondary objective.

One catch is that the model only has one objective sense. Since we are minimizing the primary objective, we give the secondary objective a weight of -1 in order to maximize it.

from itertools import product

# Primary objective: minimize cost.
z = (data["cost"][w][p] * x[w, p] for w, p in product(workers, patients))
m.setObjectiveN(expr=gp.quicksum(z), index=0, name="cost", priority=1, reltol=alpha)

# Secondary objective: maximize affinity. Since the model sense is minimize,
# we negate the secondary objective in order to maximize it.
w = (data["affinity"][w][p] * x[w, p] for w, p in product(workers, patients))
m.setObjectiveN(
    expr=gp.quicksum(w), index=1, name="affinity", priority=0, weight=-1
)

m.optimize()

Then we use this magic syntax to pull out the optimal cost and affinity.

m.params.ObjNumber = 0
cost = m.ObjNVal

m.params.ObjNumber = 1
affinity = m.ObjNVal

Results

If we solve this in a loop with alpha values from 0 to 1 in increments of 0.05, we can plot the trade-off between cost and affinity. Going from $\alpha = 0$ to $\alpha = 0.05$ or $\alpha = 0.1$ gives a pretty sizable improvement in affinity. After that, the return starts to gradually level off. This allows us to make a more informed choice about these two objectives.

Resources

generate.py generates input data
input-100x100.json contains input data
model.py hierarchical objectives Gurobi model

While commercial libraries like AMPL have always focussed on modeling performance, some of the open source options targeting multiple solvers come with significant performance penalties during formulation and model handoff to the solver. Newer options like linopy (benchmarks) and PyOptInterface (benchmarks) don’t have that issue. ↩︎
This gives us a Pareto front, which explores the trade-offs between different objectives. ↩︎

📅 Reducing Overscheduling

Sun, 26 Nov 2023 00:00:00 +0000

At a Nextmv tech talk a couple weeks ago, I showed a least absolute deviations (LAD) regression model using OR-Tools. This isn’t new – I pulled the formulation from Rob Vanderbei’s “Local Warming” paper, and I’ve shown similar models at conference talks in the past using other modeling APIs and solvers.

There are a couple reasons I keep coming back to this problem. One is that it’s a great example of how to build a machine learning model using an optimization solver. Unless you have an optimization background, it’s probably not obvious you can do this. Building a regression or classification model with a solver directly is a great way to understand the model better. And you can customize it in interesting ways, like adding epsilon insensitivity.

Another is that least squares, while the most commonly used form of regression, has a fatal flaw: it isn’t robust to outliers in the input data. This is because least squares minimizes the sum of squared residuals, as shown in the formulation below. Here, $A$ is an $m \times n$ matrix of feature data, $b$ is a vector of observations to fit, and $x$ is a vector of coefficients the optimizer must find.

$$ \min f(x) = \Vert Ax-b \Vert^2 $$

Since the objective function minimizes squared residuals, outliers have a much bigger impact than other data. LAD regression addresses this by summing the absolute values of the residuals instead of their squares.

$$ \min f(x) = \Vert Ax-b \Vert_1 $$
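A quick way to see the difference is the simplest possible regression, where $A$ is a single column of ones and the model is just one constant $x$. In that special case least squares is minimized by the mean of $b$ and LAD by its median. The Python sketch below uses made-up observations with one outlier to show how far apart the two fits land.

```python
from statistics import mean, median

# Hypothetical observations; 100.0 is the outlier.
observations = [1.0, 2.0, 3.0, 4.0, 100.0]


def squared_loss(x):
    return sum((x - b) ** 2 for b in observations)


def absolute_loss(x):
    return sum(abs(x - b) for b in observations)


least_squares_fit = mean(observations)  # minimizes squared_loss
lad_fit = median(observations)  # minimizes absolute_loss

print(f"least squares fit: {least_squares_fit}")
print(f"LAD fit:           {lad_fit}")
print(f"absolute loss at each: {absolute_loss(least_squares_fit)}, {absolute_loss(lad_fit)}")
```

The least squares fit gets dragged to 22 by the single outlier, while the LAD fit stays at 3 with the bulk of the data.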

So why isn’t this used more? Simple – least squares has a convenient analytical solution, while LAD requires an algorithm to solve. For instance, you can formulate LAD regression as a linear program, but now you need a solver.

$$ \begin{align*} \min \quad & \mathbf{1}^\intercal z \\ \text{s.t.}\ \quad & z \ge Ax - b \\ & z \ge b - Ax \end{align*} $$

While I like using this example, it paints a rather negative picture of squaring. If it does funny things to solvers, is there any good reason to square? Thus I’ve been on the lookout for a practical example where squaring a variable or expression makes a model more useful.

Luckily for me, Erwin Kalvelagen recently posted about using optimization to schedule team meetings. This is an application where minimizing squared values of overbooking can be beneficial – it may be worse to be triple booked than double booked.

I won’t recreate the reasoning behind Erwin’s post here. You can read his blog for that. What we’ll do is look at both the formulations in his post, along with a couple extras using Julia for code, JuMP for modeling, SCIP for optimization, and Gadfly for visualization. All model code and data are linked in the resources section at the end.

Maximize attendance

To start off, I built a new data set, which you can find in the resources section. This differentiates team membership between two types of employees: individual contributors (starting with ic in the data), who attend meetings for 1 or 2 teams, and managers (prefixed with mgr), who attend meetings to coordinate across multiple teams. We schedule meetings for 10 teams (prefix t) into 3 time slots (s).

The first model in Erwin’s post maximizes attendance. This means it tries to schedule team members for as many unique time slots as possible. It doesn’t consider overbooking.

$$ \begin{align*} \max\quad & \sum_{i,s} y_{i,s} \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & y_{i,s} \le \sum_{t} m_{i,t}\ x_{t,s} &\quad\forall&\ i,s & \text{individuals attend team meetings}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & y_{i,s} \in \{0,1\} &\quad\forall&\ i,s \end{align*} $$

This yields the following team schedule, with red representing a scheduled team meeting.

If we look at the manager schedules, we’ll see that every manager is completely booked. This makes sense. That’s what managers do, right? Go to meetings?

Minimize overbooking

The model gets more interesting once we account for overbooking. Erwin’s post has a model that minimizes overbooking, where overbooking is the number of additional meetings in a time slot. If a team member is double booked, that’s 1 overbooking. If they are triple booked, that’s 2 overbookings.

Sum of overbooking

The second model in Erwin’s post minimizes the sum of all overbookings. He does this by adding a continuous c vector that only incurs value once a team member goes over a single meeting in a given time slot.

$$ \begin{align*} \min\quad & \sum_{i,s} c_{i,s} \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & c_{i,s} \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{measure overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & c_{i,s} \ge 0 &\quad\forall&\ i,s \end{align*} $$

Given our data this results in the following team schedule, which is probably not all that interesting. I’ll leave this visualization out from now on.

Where it gets interesting is plotting overbookings for the managers. Here we see that 3 manager time slots are triple booked (red), while 8 are double booked (gray).

Sum of squared overbooking

Let’s say it’s worse to triple book (or, gasp, quadruple book) than to double book. How can the model account for this? One answer, if you have a MIQP-enabled solver, is to simply square the c values.

$$ \begin{align*} \min\quad & \sum_{i,s} c_{i,s}^2 \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & c_{i,s} \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{measure overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s\\ & c_{i,s} \ge 0 &\quad\forall&\ i,s \end{align*} $$

This completely eliminates triple booking, as shown below. No manager is worse off than being double booked, which seems normal given my experiences.

The problem with this is that the solver now takes a lot longer. It’s not bad for the data in this example, but if you try it with something larger you’ll see what I mean. You can find the data generator code in the resources section.
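To see why squaring changes the solver’s preferences, consider one hypothetical manager with four meetings and two ways of placing them. The Python sketch below is not taken from the linked models; the `overbookings` helper and both schedules are invented to illustrate the two objectives.

```python
from collections import Counter


def overbookings(slots_for_meetings):
    """Overbooking per time slot: meetings in the slot minus one, floored at zero."""
    counts = Counter(slots_for_meetings)
    return [max(0, count - 1) for count in counts.values()]


# Each list gives the time slot of each of the manager's four meetings.
triple_booked = [0, 0, 0, 1]  # three meetings collide in slot 0
double_booked = [0, 0, 1, 1]  # two collisions of two meetings each

for name, schedule in (("triple", triple_booked), ("double", double_booked)):
    c = overbookings(schedule)
    print(name, "sum:", sum(c), "sum of squares:", sum(v * v for v in c))
```

Both schedules have a total overbooking of 2, so the linear objective can’t tell them apart. The squared objective scores them 4 and 2, so it prefers spreading the pain.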

Constrained bottleneck

So how can we do something similar without the computational cost? One option is to continue using MILP formulations, but in the context of hierarchical optimization. This means splitting the model into two. First, we try to minimize the maximum overbookings for any team member (the bottleneck, if you will). This involves adding a variable $b$ representing that maximum.

$$ b = \max\Bigl\{\sum_{t} m_{i,t}\ x_{t,s} - 1 : i \in I, s \in S \Bigr\} $$

Now we can simply minimize $b$ using a MILP instead of a MIQP.

$$ \begin{align*} \min\quad & b \\ \text{s.t.}\quad& \sum_{s} x_{t,s} = 1 &\quad\forall&\ t & \text{schedule each team meeting once}\\ & b \ge \sum_{t} m_{i,t}\ x_{t,s} - 1 &\quad\forall&\ i,s & \text{maximum overbooking}\\ & x_{t,s} \in \{0,1\} &\quad\forall&\ t,s \end{align*} $$

Once we solve the first model, we get the minimal value of $b$, which we call $b^*$. We can then use $b^*$ as an upper bound on every overbooking variable $c_{i,s}$ in the sum of overbooking model and solve that.

As we see below, this model also eliminates triple bookings, and it’s quite a bit faster to solve than the MIQP.
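The two-stage procedure can be sketched in a few lines of Python by enumerating schedules for a tiny instance. Everything here is hypothetical (three people, four teams, two time slots), and brute force stands in for the MILP solves.

```python
from itertools import product

# membership[person] = set of teams whose meetings that person attends.
membership = {
    "mgr1": {0, 1, 2, 3},
    "ic1": {0, 1},
    "ic2": {2, 3},
}
teams = range(4)
slots = range(2)


def overbookings(schedule):
    """schedule[t] is the slot of team t; returns overbooking per (person, slot)."""
    return [
        max(0, sum(1 for t in person_teams if schedule[t] == s) - 1)
        for person_teams in membership.values()
        for s in slots
    ]


schedules = list(product(slots, repeat=len(teams)))

# Stage 1: minimize the bottleneck (the largest overbooking anywhere).
b_star = min(max(overbookings(s)) for s in schedules)

# Stage 2: minimize total overbooking with every overbooking capped at b*.
capped = [s for s in schedules if max(overbookings(s)) <= b_star]
best = min(capped, key=lambda s: sum(overbookings(s)))

print("b* =", b_star)
print("schedule:", best, "total overbooking:", sum(overbookings(best)))
```

With four manager meetings in two slots, some double booking is unavoidable, so the best bottleneck is 1 and the best total under that cap is 2.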

Resources

main.go generates input data
membership.csv contains input data
maximize-attendance.jl MILP model
minimize-overbooking.jl MILP model
minimize-overbooking-squared.jl MIQP model
minimize-bottleneck.jl hierarchical MILP models

⭕ Chebyshev Centers of Polygons with Gurobi

Mon, 03 Feb 2014 00:00:00 +0000

Note: This post was written before Gurobi supported nonlinear optimization. It has been updated to work with Python 3.

A common problem in handling geometric data is determining the center of a given polygon. This is not quite so easy as it sounds as there is not a single definition of center that makes sense in all cases. For instance, sometimes computing the center of a polygon’s bounding box may be sufficient. In some instances this may give a point on an edge (consider a right triangle). If the given polygon is non-convex, that point may not even be inside or on its boundary.

This post looks at computing Chebyshev centers for arbitrary convex polygons. We employ essentially the same model as in Boyd & Vandenberghe’s Convex Optimization text, but using Gurobi instead of CVXOPT.

Consider a polygon defined by the intersection of a finite number of half-spaces, $Au \le b$. We assume we are given the set of vertices, $V$, in clockwise order around the polygon. $E$ is the set of edges connecting these vertices. Each edge in $E$ defines a boundary of the half-space $a_i^\intercal u \le b_i$.

$$ V = \{(1,1), (2,5), (5,4), (6,2), (4,1)\}\\ E = \{((1,1),(2,5)), ((2,5),(5,4)), ((5,4),(6,2)), ((6,2),(4,1)), ((4,1),(1,1))\} $$

The Chebyshev center of this polygon is the center point $(x, y)$ of the maximum radius inscribed circle. That is, if we can find the largest circle that will fit inside our polygon without going outside its boundary, its center is the point we are looking for. Our decision variables are the center $(x, y)$ and the maximum inscribed radius, $r$.

In order to do this, we consider the edges independently. The long line segment below shows an arbitrary edge, $a_i^\intercal u \le b_i$. The short line connected to it is orthogonal in the direction $a$. $(x, y)$ satisfies the inequality.

The shortest distance from $(x, y)$ to the edge will be in the direction of $a_i$. We’ll call this distance $r$. If we were to move the edge so it had the same slope but went through $(x, y)$, its right-hand side would differ from $b_i$ by $r \Vert a_i \Vert_2$. Thus we can add a constraint of the form $a_i^\intercal u + r \Vert a_i \Vert_2 \le b_i$ for each edge and maximize the value of $r$ as our objective function.

$$ \begin{align*} & \text{max} && r \\ & \text{s.t.} && (y_i-y_j)x + (x_j-x_i)y + r\sqrt{(x_j-x_i)^2 + (y_j-y_i)^2} \le (y_i-y_j)x_i + (x_j-x_i)y_i \\ & && \quad \forall \quad ((x_i,y_i), (x_j,y_j)) \in E \\ \end{align*} $$

As this is linear, we can solve it using any LP solver. The following code does so with Gurobi.

#!/usr/bin/env python3
from gurobipy import Model, GRB
from math import sqrt

vertices = [(1,1), (2,5), (5,4), (6,2), (4,1)]
edges = zip(vertices, vertices[1:] + [vertices[0]])

m = Model()
r = m.addVar()
x = m.addVar(lb=-GRB.INFINITY)
y = m.addVar(lb=-GRB.INFINITY)
m.update()

for (x1, y1), (x2, y2) in edges:
    dx = x2 - x1
    dy = y2 - y1
    m.addConstr((dx*y - dy*x) + (r * sqrt(dx**2 + dy**2)) <= dx*y1 - dy*x1)

m.setObjective(r, GRB.MAXIMIZE)
m.optimize()

print('r = %.04f' % r.x)
print('(x, y) = (%.04f, %.04f)' % (x.x, y.x))

The model output shows our center and its maximum inscribed radius.

$$ r = 1.7466\\ (x, y) = (3.2370, 2.7466) $$
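The reported solution can be checked without a solver. This Python sketch (the `distance_to_edges` helper is made up for the check) plugs the rounded center into each edge constraint and converts the slack into a distance by dividing by $\Vert a_i \Vert_2$.

```python
from math import hypot

vertices = [(1, 1), (2, 5), (5, 4), (6, 2), (4, 1)]
edges = list(zip(vertices, vertices[1:] + [vertices[0]]))


def distance_to_edges(x, y):
    """Distance from (x, y) to the line through each edge, positive if inside."""
    distances = []
    for (x1, y1), (x2, y2) in edges:
        dx, dy = x2 - x1, y2 - y1
        # Slack of (dx*y - dy*x) <= dx*y1 - dy*x1, scaled by the norm of a_i.
        slack = (dx * y1 - dy * x1) - (dx * y - dy * x)
        distances.append(slack / hypot(dx, dy))
    return distances


# Center as printed by the model, rounded to four decimals.
distances = distance_to_edges(3.2370, 2.7466)
for edge, d in zip(edges, distances):
    print(edge, round(d, 4))
```

The smallest distance matches the reported radius up to rounding, and three of the five edges are tight. That is typical of a Chebyshev center in two dimensions: the inscribed circle is pinned by three edges.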

Question for the reader: in certain circumstances, such as rectangles, the Chebyshev center is ambiguous. How might one get around this ambiguity?

🏖️ Lagrangian Relaxation with Gurobi

Sat, 22 Sep 2012 00:00:00 +0000

Note: This post was updated to work with Python 3 and the 2nd edition of “Integer Programming” by Laurence Wolsey.

We’ve been studying Lagrangian Relaxation (LR) in the Advanced Topics in Combinatorial Optimization course I’m taking this term, and I had some difficulty finding a simple example covering its application. In case anyone else finds it useful, I’m posting a Python version for solving the Generalized Assignment Problem (GAP). This won’t discuss the theory of LR at all, just give example code using Gurobi.

Generalized assignment

The GAP as defined by Wolsey consists of a maximization problem subject to a set of set packing constraints followed by a set of knapsack constraints.

$$ \begin{align*} & \text{max} && \sum_i \sum_j c_{ij} x_{ij} \\ & \text{s.t.} && \sum_j x_{ij} \leq 1 && \forall i \\ & && \sum_i a_{ij} x_{ij} \leq b_j && \forall j \\ & && x_{ij} \in \{0, 1\} \end{align*} $$

Naive model

A naive version of this model using Gurobi might look like the following.

#!/usr/bin/env python

# This is the GAP per Wolsey, pg 208.
from gurobipy import Model, GRB, quicksum as qsum

m = Model("GAP per Wolsey")
m.modelSense = GRB.MAXIMIZE
m.setParam("OutputFlag", False)  # turns off solver chatter

b = [15, 15, 15]
c = [
    [6, 10, 1],
    [12, 12, 5],
    [15, 4, 3],
    [10, 3, 9],
    [8, 9, 5],
]
a = [
    [5, 7, 2],
    [14, 8, 7],
    [10, 6, 12],
    [8, 4, 15],
    [6, 12, 5],
]

# x[i][j] = 1 if i is assigned to j
x = [[m.addVar(vtype=GRB.BINARY) for _ in row] for row in c]

# sum j: x_ij <= 1 for all i
for x_i in x:
    m.addConstr(sum(x_i) <= 1)

# sum i: a_ij * x_ij <= b[j] for all j
for j, b_j in enumerate(b):
    m.addConstr(qsum(a[i][j] * x_i[j] for i, x_i in enumerate(x)) <= b_j)

# max sum i,j: c_ij * x_ij
m.setObjective(
    qsum(qsum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i)) for c_i, x_i in zip(c, x))
)
m.optimize()

# Pull solution out of m.
print(f"z = {m.objVal}")
print("x = [")
for x_i in x:
    print(f"  {[1 if x_ij.x >= 0.5 else 0 for x_ij in x_i]}")
print("]")

The solver quickly finds the following optimal solution of this toy problem.

z = 46.0
x = [
  [0, 1, 0]
  [0, 1, 0]
  [1, 0, 0]
  [0, 0, 1]
  [0, 0, 0]
]
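Since the toy problem has only 5 items and 3 knapsacks, the optimal value can be double-checked by enumerating all $4^5 = 1024$ ways to leave each item out or place it in one knapsack. This is a brute-force Python sketch for verification only; the `solve_by_enumeration` name is made up, and the approach doesn’t scale beyond toy sizes.

```python
from itertools import product

b = [15, 15, 15]
c = [[6, 10, 1], [12, 12, 5], [15, 4, 3], [10, 3, 9], [8, 9, 5]]
a = [[5, 7, 2], [14, 8, 7], [10, 6, 12], [8, 4, 15], [6, 12, 5]]


def solve_by_enumeration():
    best_value, best_assignment = 0, None
    # assignment[i] is the knapsack for item i, or None if i is left out.
    for assignment in product([None, 0, 1, 2], repeat=len(c)):
        load = [0] * len(b)
        value = 0
        for i, j in enumerate(assignment):
            if j is not None:
                load[j] += a[i][j]
                value += c[i][j]
        # Keep the assignment if it respects every knapsack capacity.
        if all(l <= cap for l, cap in zip(load, b)) and value > best_value:
            best_value, best_assignment = value, assignment
    return best_value, best_assignment


best_value, best_assignment = solve_by_enumeration()
print(f"z = {best_value}")
print(f"assignment = {best_assignment}")
```

The enumeration agrees with the solver: the best achievable value is 46.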

Lagrangian model

There are two sets of constraints we can dualize. It can be beneficial to apply Lagrangian Relaxation against problems composed of knapsack constraints, so we will dualize the set packing ones.

# sum j: x_ij <= 1 for all i
for x_i in x:
    m.addConstr(sum(x_i) <= 1)

We replace these with a new set of variables, penalties, which take the values of the slacks on the set packing constraints. We then modify the objective function, adding Lagrangian multipliers times these penalties.

Instead of optimizing once, we do so iteratively. An important consideration is that we may get nothing more than a dual bound from this process. An integer solution is not guaranteed to be primal feasible unless it satisfies complementary slackness conditions: for each dualized constraint, either its multiplier or its penalty must be zero.

We then set the initial multiplier values to 2 and use sub-gradient optimization with a step size of 1 / (iteration #) to adjust them.

#!/usr/bin/env python

# This is the GAP per Wolsey, pg 208, using Lagrangian Relaxation.
from gurobipy import Model, GRB, quicksum as qsum

m = Model("GAP per Wolsey with Lagrangian Relaxation")
m.modelSense = GRB.MAXIMIZE
m.setParam("OutputFlag", False)  # turns off solver chatter

b = [15, 15, 15]
c = [
    [6, 10, 1],
    [12, 12, 5],
    [15, 4, 3],
    [10, 3, 9],
    [8, 9, 5],
]
a = [
    [5, 7, 2],
    [14, 8, 7],
    [10, 6, 12],
    [8, 4, 15],
    [6, 12, 5],
]

# x[i][j] = 1 if i is assigned to j
x = [[m.addVar(vtype=GRB.BINARY) for _ in row] for row in c]

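# The original GAP enforces the set packing constraints below directly. Here we
# dualize them into penalties, using variables so we can easily extract their
# values. Note that addVar() defaults to a lower bound of 0, so each penalty is
# a nonnegative slack and sum j: x_ij <= 1 still holds in every subproblem.
penalties = [m.addVar() for _ in x]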

# Dualized constraints: sum j: x_ij <= 1 for all i
for p, x_i in zip(penalties, x):
    m.addConstr(p == 1 - sum(x_i))

# sum i: a_ij * x_ij <= b[j] for all j
for j, b_j in enumerate(b):
    m.addConstr(qsum(a[i][j] * x_i[j] for i, x_i in enumerate(x)) <= b_j)

# u[i] = Lagrangian Multiplier for the set packing constraint i
u = [2.0] * len(x)

# Re-optimize until either we have run a certain number of iterations
# or complementary slackness conditions apply.
for k in range(1, 101):
    # max sum i,j: c_ij * x_ij + sum i: u_i * p_i
    m.setObjective(
        qsum(
            # Original objective function
            sum(c_ij * x_ij for c_ij, x_ij in zip(c_i, x_i))
            for c_i, x_i in zip(c, x)
        )
        + qsum(
            # Penalties for dualized constraints
            u_i * p_i
            for u_i, p_i in zip(u, penalties)
        )
    )
    m.optimize()

    print(
        f"iteration {k}: z = {m.objVal}, u = {u}, penalties = {[p.x for p in penalties]}"
    )

    # Test for complementary slackness
    stop = True
    eps = 10e-6
    for u_i, p_i in zip(u, penalties):
        if abs(u_i) > eps and abs(p_i.x) > eps:
            stop = False
            break

    if stop:
        print("primal feasible & optimal")
        break

    else:
        s = 1.0 / k
        for i in range(len(x)):
            u[i] = max(u[i] - s * (penalties[i].x), 0.0)

# Pull solution out of m.
print(f"z = {m.objVal}")
print("x = [")
for x_i in x:
    print(f"  {[1 if x_ij.x >= 0.5 else 0 for x_ij in x_i]}")
print("]")

Again, the example converges very quickly to an optimal solution.

iteration 1: z = 48.0, u = [2.0, 2.0, 2.0, 2.0, 2.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 2: z = 47.0, u = [2.0, 2.0, 2.0, 2.0, 1.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 3: z = 46.5, u = [2.0, 2.0, 2.0, 2.0, 0.500], penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 4: z = 46.2, u = [2.0, 2.0, 2.0, 2.0, 0.167], penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
iteration 5: z = 46.0, u = [2.0, 2.0, 2.0, 2.0, 0.000], penalties = [0.0, 0.0, 0.0, 0.0, 1.0]
primal feasible & optimal
z = 46.0
x = [
  [0, 1, 0]
  [0, 1, 0]
  [1, 0, 0]
  [0, 0, 1]
  [0, 0, 0]
]
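
The multiplier trajectory in the log above can be reproduced without a solver. The sketch below isolates the update rule from the script, $u_i \leftarrow \max(u_i - p_i / k, 0)$, and applies it to the fifth multiplier, whose penalty stays at 1 in every iteration. The function name `subgradient_step` is illustrative.

```python
def subgradient_step(u, penalties, k):
    """Apply one sub-gradient update with step size 1 / k, keeping u >= 0."""
    s = 1.0 / k
    return [max(u_i - s * p_i, 0.0) for u_i, p_i in zip(u, penalties)]


u = [2.0] * 5
penalties = [0.0, 0.0, 0.0, 0.0, 1.0]  # values reported in the log above

trajectory = [u[4]]
for k in range(1, 5):
    u = subgradient_step(u, penalties, k)
    trajectory.append(u[4])

print([round(u_5, 3) for u_5 in trajectory])
```

Because the step size shrinks as $1/k$, the multiplier moves by 1, 1/2, 1/3, and so on, and is clipped at zero on the fourth update. The fifth iteration then finds that complementary slackness holds and stops.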

Exercise for the reader: change the script to dualize the knapsack constraints instead of the set packing constraints. What is the result of this change in terms of convergence?


🔲 Normal Magic Squares

Fri, 13 Jan 2012 00:00:00 +0000

Note: This post was updated to work with Python 3 and PySCIPOpt. The original version used Python 2 and python-zibopt. It has also been edited for clarity.

As a followup to the last post, I created another SCIP example for finding Normal Magic Squares. This is similar to solving a Sudoku problem, except that here the number of binary variables depends on the square size. In the case of Sudoku, each cell has 9 binary variables – one for each potential value it might take. For a normal magic square, there are $n^2$ possible values for each cell, $n^2$ cells, and one variable representing the row, column, and diagonal sums. This makes a total of $n^4$ binary variables and one continuous variable in the model.

However, there are no big-Ms.

I think the neat part of this code is in this section:

# Construct an expression for each cell that is the sum of
# its binary variables with their associated coefficients.
sums = []
for row in matrix:
    sums_row = []
    for cell in row:
        sums_row.append(sum((i + 1) * x for i, x in enumerate(cell)))
    sums.append(sums_row)

It creates sums of the $n^2$ variables for each cell with their appropriate coefficients ($1$ to $n^2$) and stores those expressions to make the subsequent constraint creation simpler.
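
To see what these expressions compute, the same sum can be evaluated on plain 0/1 values instead of solver variables. This sketch assumes the classic 3×3 Lo Shu square as a known solution and a one-hot `matrix` layout (`matrix[r][c][i] == 1` when the cell holds value `i + 1`), which is an assumption about the model's layout rather than code taken from it.

```python
n = 3
square = [
    [2, 7, 6],
    [9, 5, 1],
    [4, 3, 8],
]

# One-hot encode each cell: n^2 binary values, exactly one of which is 1.
matrix = [
    [[1 if value == i + 1 else 0 for i in range(n * n)] for value in row]
    for row in square
]
num_binaries = sum(len(cell) for row in matrix for cell in row)

# Same expression as in the model, applied to numbers instead of variables.
sums = [
    [sum((i + 1) * x for i, x in enumerate(cell)) for cell in row]
    for row in matrix
]

row_sums = [sum(row) for row in sums]
col_sums = [sum(row[j] for row in sums) for j in range(n)]
diag_sums = [
    sum(sums[i][i] for i in range(n)),
    sum(sums[i][n - 1 - i] for i in range(n)),
]

print(num_binaries, sums == square, row_sums, col_sums, diag_sums)
```

The weighted sum recovers each cell's value from its one-hot encoding, so the row, column, and diagonal constraints can be written over `sums` as if it held ordinary integer variables.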

Another interesting exercise for the reader: Change the code to minimize the sum of each column. How does that impact the solution time?

🔲 Magic Squares and Big-Ms

Thu, 12 Jan 2012 00:00:00 +0000

Note: This post was updated to work with Python 3 and PySCIPOpt. The original version used Python 2 and python-zibopt. It has also been edited for clarity.

Back in October of 2011, I started toying with a model for finding magic squares using SCIP. This is a fun modeling exercise and a challenging problem. First one constructs a square matrix of integer-valued variables.

from pyscipopt import Model

# [...snip...]

m = Model()

matrix = []
for i in range(size):
    row = [m.addVar(vtype="I", lb=1) for _ in range(size)]
    for x in row:
        m.addCons(x <= M)
    matrix.append(row)

Then one adds the following constraints:

All variables ≥ 1.
All rows, columns, and the diagonal sum to the same value.
All variables take different values.

The first two constraints are trivial to implement, and relatively easy for the solver. I add a single extra variable and set it equal to the sum of each row, each column, and the diagonal.

sum_val = m.addVar(vtype="M")
for i in range(size):
    m.addCons(sum(matrix[i]) == sum_val)
    m.addCons(sum(matrix[j][i] for j in range(size)) == sum_val)

m.addCons(sum(matrix[i][i] for i in range(size)) == sum_val)

It’s the third that messes things up. You can think of this as saying, for every possible pair of integer-valued variables $x$ and $y$:

$$ x \ge y + 1 \quad \text{or} \quad x \le y - 1 $$

Why is this hard? Because we can’t add both constraints to the model. That would make it infeasible. Instead, we write them in such a way that exactly one will be active for any given solution. This requires, for each pair of variables, an additional binary variable $z$ and a (possibly big) constant $M$. Thus we reformulate the above as:

$$ \begin{aligned} x &\ge (y + 1) - M z \\ x &\le (y - 1) + M (1-z) \\ z &\in \{0,1\} \end{aligned} $$

In code this looks like:

from itertools import chain

all_vars = list(chain(*matrix))
for i, x in enumerate(all_vars):
    for y in all_vars[i+1:]:
        z = m.addVar(vtype="B")
        m.addCons(x >= y + 1 - M*z)
        m.addCons(x <= y - 1 + M*(1-z))

However, here be dragons. We may not know how big (or small) to make $M$. Generally we want it as small as possible to make the LP relaxation of our integer programming model tighter. Different values of $M$ have unpredictable effects on solution time.
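
To make the role of $M$ concrete, the disjunction can be checked by brute force without a solver. The sketch below assumes both values lie in $1..U$ and asks, for each pair, whether some $z \in \{0,1\}$ satisfies both constraints. The helper name `separable` and the bound `U` are illustrative.

```python
from itertools import product


def separable(x, y, M):
    """True if some z in {0, 1} satisfies both big-M constraints."""
    return any(
        x >= y + 1 - M * z and x <= y - 1 + M * (1 - z)
        for z in (0, 1)
    )


U = 9  # largest value a cell may take, e.g. size * size for size = 3
pairs = list(product(range(1, U + 1), repeat=2))

# With M = U the constraints admit exactly the pairs with x != y.
ok_with_big_enough_m = all(separable(x, y, U) == (x != y) for x, y in pairs)

# With M = U - 1 some distinct pairs are wrongly cut off.
cut_off = [(x, y) for x, y in pairs if x != y and not separable(x, y, U - 1)]

print(ok_with_big_enough_m, cut_off)
```

In the model above each cell is bounded by $M$ itself, so the constant is always large enough for the disjunction to be valid. The cost of a larger $M$ is a looser LP relaxation.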

Which brings us to an interesting idea:

SCIP now supports bilinear constraints. This means that I can make $M$ a variable in the above model.

import sys

try:
    M = int(sys.argv[2])
except IndexError:
    M = m.addVar(vtype="M", lb=size * size)
else:
    assert M >= size * size

The magic square model linked to in this post provides both options. The first command line argument it requires is the matrix size. The second one, $M$, is optional. If not given, it leaves $M$ up to the solver.

An interesting exercise for the reader: Change the code to search for a minimal magic square, which minimizes either the value of $M$ or the sums of the columns, rows, and diagonal.