Skip to content

polygonize: _visvalingam_whyatt is O(n^2) due to linear min-area scan #2539

@brendancol

Description

@brendancol

What's happening

_visvalingam_whyatt in xrspatial/polygonize.py (L1217-1316) is O(n²)
in the number of input vertices. The main loop at L1262 does a linear
scan over all vertices to find the minimum-area one on every iteration:

while remaining > 2:
    # Find vertex with minimum area.
    min_area = np.inf
    min_idx = -1
    for i in range(1, n - 1):
        if not removed[i] and areas[i] < min_area:
            min_area = areas[i]
            min_idx = i
    ...

For each removed vertex we walk all n entries again, so total work is
O(n²). On a ring with a few thousand vertices this dominates the
simplification step.

Fix

Use a min-heap keyed on triangle area. Each removal is O(log n), and
the two affected neighbours get re-pushed with updated areas. Lazy
deletion via a removed[] flag (already present) handles stale heap
entries:

import heapq
heap = [(areas[i], i) for i in range(1, n - 1)]
heapq.heapify(heap)
while remaining > 2:
    while heap:
        area, idx = heapq.heappop(heap)
        if not removed[idx] and area == areas[idx]:
            break
    else:
        break
    if area >= tolerance:
        break
    removed[idx] = True
    remaining -= 1
    # Recompute neighbour areas and push.
    ...

Numba supports heapq only with the numba-scipy extension, so the
heap version may need to drop @ngjit or move the simplification to
plain Python with a numba inner kernel for the area math. Worth
checking whether the constant-factor loss outweighs the asymptotic win
on typical inputs (rings with <100 vertices).

Severity

MEDIUM. Only hits the visvalingam-whyatt code path
(simplify_method="visvalingam-whyatt"), not the default
douglas-peucker. Surfaces on long rings, e.g. the exterior of a large
connected region after polygonize.

Metadata

Metadata

Assignees

No one assigned

    Labels

    performancePR touches performance-sensitive code

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions