3
$\begingroup$

I’m working on an optimization problem that I believe can be framed as a graph or flow problem, but I haven’t found a good existing formulation that scales efficiently. I’d like advice on how to model or solve it optimally.

Problem description

I have a 2D grid (matrix) of integer values in the range [-128, 127], typically around 1500×2000 in size. Each value represents a pixel intensity from an ultrasound image. I've added a synthetic example:

import numpy as np
import matplotlib.pyplot as plt

# Example synthetic test matrix
np.random.seed(0)
rows, cols = 60, 100
img = np.random.normal(0, 5, (rows, cols))  # background noise

# Create smooth wiggly centerlines
y1 = rows // 6 + 2 * np.sin(np.linspace(0, 3*np.pi, cols))
y2 = rows // 2 + 3 * np.sin(np.linspace(0, 2.5*np.pi, cols) + 1)
y3 = (rows * 4) // 5 + 2 * np.cos(np.linspace(0, 2*np.pi, cols) + 0.5)

# Draw a line with intensity profile
def draw_line(img, y_center, thickness, peak_intensity):
    half = thickness // 2
    weights = np.exp(-0.5 * (np.arange(-half, half + 1) / (thickness / 3))**2)
    weights /= weights.max()  # normalize so center = 1
    for x_idx, y_c in enumerate(y_center.astype(int)):
        for offset, w in zip(range(-half, half + 1), weights):
            y = int(y_c + offset)
            if 0 <= y < img.shape[0]:
                img[y, x_idx] += peak_intensity * w

# Draw three bright non-crossing lines (3–5 px thick)
draw_line(img, y1, thickness=3, peak_intensity=120)
draw_line(img, y2, thickness=4, peak_intensity=90)
draw_line(img, y3, thickness=5, peak_intensity=70)

# Display
plt.figure(figsize=(8, 5))
plt.imshow(img, cmap='gray', origin='upper', aspect='auto')
plt.tight_layout()
plt.show()

Example of synthetic 2D array containing lines to be detected

The goal is to detect multiple continuous lines (or paths) that run left to right across the entire width of the grid, maximizing the sum of the pixel values along the lines, while satisfying the following constraints:

  1. Connectivity

    Each line must start in the first column and end in the last column. From any pixel (r, c), the next step can only go to (r-1, c+1), (r, c+1), or (r+1, c+1) (up-right, right, or down-right).

  2. Minimum vertical spacing

    Between any two lines, a minimum vertical distance d_min (e.g., 5 pixels) must be maintained in every column. Therefore lines can never cross or share a pixel. Each pixel can be part of at most one line.

  3. Objective

    Maximize the total sum of pixel values along all selected lines (equivalently, minimize the negative sum as a cost).

  4. Performance requirement

    The algorithm must scale to matrices up to ~1500×2000 and solve within seconds (not minutes), so exponential or brute-force approaches are not practical.

Notes

  • The matrix is quite sparse in terms of meaningful structure — most pixels are near zero, while the desired lines are made of high positive values often surrounded by slightly negative ones.
  • Lines can occasionally pass through small noisy or zero-value regions, as long as the overall solution is globally optimal.
  • The lines are always one pixel thick and must span the full image width.
  • Ideally the number of lines does not have to be fixed beforehand — the solver should determine how many are beneficial based on the grid values and constraints. However consider this as a nice-to-have

What I’ve tried

  • A shortest path approach using Dijkstra’s algorithm from left to right works well for detecting a single line.
  • Running it iteratively and penalizing previously found paths can detect multiple lines, but that’s not globally optimal — later lines may block globally better configurations.
  • I experimented with linear programming and min-cost flow formulations, but the constraints for non-crossing and minimum spacing explode combinatorially and do not scale.

My question

Is there a known optimization model or algorithm that can find the global optimum set of multiple, non-crossing, spaced paths across a grid like this?

I suspect this is related to:

  • multi-commodity or disjoint path problems,
  • layered graph flow models,
  • or dynamic programming with separation constraints.

But I haven’t found a formulation that handles an unknown number of lines and minimum spacing efficiently.

Any advice on a suitable mathematical formulation, relaxation, or practical solver approach would be greatly appreciated.

$\endgroup$
16
  • $\begingroup$ To clarify: does 3 not imply 2? This can be modeled as a MIP, I'll try it out later, and if performance works, I'll write it out $\endgroup$ Commented Nov 3 at 12:09
  • $\begingroup$ @JoãoDionísio yes, the distance between lines also implies non-crossing. I merged those 2 points for clarity. I'd really love to see the results of modeling it as a MIP! $\endgroup$ Commented Nov 3 at 12:56
  • $\begingroup$ Yeah, it doesn't scale at all, at least not without doing some specialized presolving. Interesting problem, though! $\endgroup$ Commented Nov 3 at 13:18
  • $\begingroup$ Just to be clear, you want each of your lines/paths to be one pixel wide (vertically), correct? $\endgroup$ Commented Nov 3 at 20:56
  • 1
    $\begingroup$ @Teun Generate the k-shortest paths, where k is a parameter. Then define a binary variable for each of these paths. And enforce that a pixel can be covered by at most one path to prevent crossings. If you generate all the paths, this is optimal. If you can't, you can generate only the "interesting" ones dynamically in a branch-and-price fashion, also leading to optimal results. It is different from the iterative approach because all paths are selected in one iteration. $\endgroup$ Commented Nov 7 at 14:45

2 Answers 2

3
$\begingroup$

Let $R$ be the set of rows, $C$ be the set of columns and let $G=(V,A)$ be the directed (acyclic) graph where each cell $(r,c)$ defines a node, and arc $((a,b),(c,d))$ exists if and only if $d=b+1$ and $c\in \{a-1,a,a+1 \}$. The weight $\omega_{a,b}^{c,d}$ of arc $((a,b),(c,d))$ is set to the color intensity of cell $(a,b)$. Also, a source node is linked to all nodes of the first column with weight $0$, and all nodes of the last column are linked to a sink node with weight color intensity of the node.

Generate a pool $\Omega$ of paths from nodes $(r,1)$ to nodes $(r,|C|)$, and let $\omega_p$ be the weighted length of the path. Now, define binary variable $\lambda_p \in \{0,1\}$ that takes value $1$ if and only if path $p \in \Omega$ is selected. The problem is to maximize $\sum_{p \in \Omega} \omega_p\cdot \lambda_p$ subject to $$ \sum_{k=0}^5\sum_{p\in \Omega |(r+k,c) \in p}\lambda_p \le 1\quad \forall (r,c) \tag{1} $$ If the pool is large enough, the solution may be optimal. In practice, it will be hard to generate all of the good candidates, but since shortest paths are computationally friendly (the graph is acyclic!), you can generate many. Otherwise, you can generate the interesting ones dynamically with a column generation approach, and embed this in a branch-and-price scheme if you want optimality. Column generation would be "fairly" easy: if $\pi_{r,c}$ denotes the dual variable of constraint $(1)$, the reduced cost of variable $\lambda_p$ is $\omega_p - \sum_{r,c} \pi_{r,c}$, so candidate columns/paths could be generated by computing paths where the weight on each arc is simply adjusted by subtracting $\pi_{r,c}$ from the weights of arcs outgoing from node $(r,c)$.

With just a few paths nicely chosen, I am able to match @RobPratt's solution up to $b=3$ in a few seconds using Gurobi. I have not managed to match his solution with $b$ free, but am probably not generating the right paths.

enter image description here


Addendum

I believe that the efficiency of this method relies on how good the candidate paths are, so indeed, it is worth putting some effort into this pre-processing phase. Some ideas:

  1. Add some randomness in the paths, for example, generate paths from column 1 to a random node somewhere in the matrix, and then from this node to the last column.
  2. Generate paths from every cell of the first column and to every cell of the last column.
  3. If you shift the weights such that they are only positive, you can use the k-shortest-paths algorithm.
$\endgroup$
10
  • $\begingroup$ For column generation, be sure to omit the redundant upper bound on $\lambda$: or.stackexchange.com/q/608/500 $\endgroup$ Commented Nov 8 at 14:35
  • $\begingroup$ I really like this idea. Let me test it. Any suggestions on the algorithm for finding k candidate paths? $\endgroup$ Commented Nov 10 at 8:09
  • $\begingroup$ @Teun My suggestions are too long for a comment, so I have added them to my answer. $\endgroup$ Commented Nov 10 at 9:10
  • $\begingroup$ If doing things heuristically, perhaps generating paths stochastically can also help. Using the objective of the pixels to get a probability of which of the 3 (top, middle, bottom) should continue the path $\endgroup$ Commented Nov 10 at 10:27
  • $\begingroup$ @JoãoDionísio I am not sure I understand what you mean. Since the pixels are the weights of the arcs, what do you mean by "Using the objective of the pixels to get a probability"? $\endgroup$ Commented Nov 10 at 10:52
4
$\begingroup$

Here's a simple MILP formulation that you can use as a baseline to compare with other approaches. Let $R$ be the set of rows and let $C$ be the set of columns. Let $b$ be the desired number of lines. For pixel $(r,c)\in R \times C$, let $w_{rc}$ be the weight (intensity) and let binary decision variable $x_{rc}$ indicate whether the pixel appears in some path. The problem is to maximize $$\sum_{r\in R} \sum_{c\in C} w_{rc} x_{rc}$$ subject to linear constraints \begin{align} \sum_{r\in R} x_{rc} &= b &&\text{for $c\in C$} \\ x_{rc} &\le \sum_{s\in \{r-1,r,r+1\} \cap R} x_{s,c+1} &&\text{for $(r,c)\in R \times C$ such that $c+1\in C$} \\ \sum_{s \in \{r,\dots,r+5\} \cap R} x_{sc} &\le 1 &&\text{for $(r,c)\in R \times C$} \end{align} Note that this formulation does not use arc variables or flow balance constraints.


Here are optimal results for various $b$.

$b=1$, maximum 12066.215332 : enter image description here

$b=2$, maximum 21051.21997: enter image description here

$b=3$, maximum 28128.103126: enter image description here

$b=4$, maximum 28641.357616: enter image description here

If you instead let $b$ be a decision variable, $b=7$, maximum 29960.713961: enter image description here

$\endgroup$
11
  • $\begingroup$ Shouldn't the last part of the second linear constraint be formulated as x_s(c+1) for all rows and all columns except for last column? Furthermore, what programming language and module(s) did you use for this? Does it scale for larger matrices? $\endgroup$ Commented Nov 5 at 18:51
  • $\begingroup$ Yes, I kept the notation simple but will make it more explicit now. I use the MILP solver via PROC OPTMODEL in SAS. If you provide larger sample data, I"ll try it out. $\endgroup$ Commented Nov 5 at 19:48
  • $\begingroup$ Constraints 2 enforce connectivity by enforcing a selected pixel in the next column. Would it help / make sense to add similar ones to enforce a selected pixel in the previous column? $\endgroup$ Commented Nov 6 at 10:47
  • $\begingroup$ @Kuifje I suspect that those additional constraints, which are valid, would slow things down. But it is worth trying. $\endgroup$ Commented Nov 6 at 13:45
  • $\begingroup$ @RobPratt I modified the example code so that you can simply change rows and cols. As mentioned, I'm looking for an implementation that works for ~1500 rows by ~2000 columns. Also, I've implemented this using pulp in python and found that it didn't scale for me beyond ~100x100. And the additional constraints mentioned by @Kuifje in some cases make it faster on mostly noisy images without clear lines, but in most realistic scenario's it slows it down. $\endgroup$ Commented Nov 7 at 10:31

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.