perf(autograd): optimize grey_dilation with striding #2589

yaugenst-flex · 2025-06-20T11:39:51Z

The previous implementation of grey_dilation was based on convolution, which was slow for both the forward and backward passes.

This PR replaces it with a high-performance implementation that uses NumPy's sliding_window_view to build strided window views of the input array without copying it. I also wrote a custom VJP that uses the same striding technique, so the backward pass is fast as well.
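A minimal sketch of a strided forward pass like the one described above, assuming a 2D input, an odd-sized non-flat structuring element applied as `out[i, j] = max over (a, b) of padded[i + a, j + b] + structure[a, b]`, and `-inf` padding at the borders. The name `grey_dilation_strided` is hypothetical and this is not the PR's code; tidy3d's version may use other padding modes and index conventions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def grey_dilation_strided(array, structure):
    """Hypothetical sketch of grey dilation via a sliding-window view.

    out[i, j] = max_{a, b} padded[i + a, j + b] + structure[a, b], where the
    input is padded with -inf so out-of-bounds positions never win the max.
    """
    array = np.asarray(array, dtype=float)
    structure = np.asarray(structure, dtype=float)
    kh, kw = structure.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(array, ((ph, ph), (pw, pw)), mode="constant", constant_values=-np.inf)
    # Read-only view of shape (H, W, kh, kw); extracting the windows copies nothing.
    windows = sliding_window_view(padded, (kh, kw))
    # Broadcasting adds the structure to every window (this step allocates
    # an (H, W, kh, kw) array); the max then reduces over the window axes.
    return (windows + structure).max(axis=(-2, -1))


# Flat structuring element: dilation reduces to a 3x3 maximum filter.
x = np.array([[1.0, 5.0, 2.0, 0.0, 3.0]])
print(grey_dilation_strided(x, np.zeros((3, 3))))  # [[5. 5. 5. 3. 3.]]
```

In this sketch the only per-pixel work is one vectorized add and one reduction, with no Python loop over the array; memory for the intermediate grows with the kernel area.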

I also simplified the implementation of grey_erosion so that grey_dilation is now the only function that does the heavy lifting.
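The duality that lets grey_erosion reuse grey_dilation can be checked numerically. In the window-indexed convention of this hypothetical sketch (the structuring element is not reflected), erosion `min(x - s)` equals `-dilation(-x, s)`, and the `-inf` border padding applied to `-x` acts as the `+inf` padding erosion needs. Under the textbook convention, where dilation reflects the structuring element, the dual form reflects it too; for symmetric elements the two coincide. All function names here are illustrative, not tidy3d's API.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def grey_dilation_strided(array, structure):
    # Hypothetical convention: out = max over window of (padded + structure), -inf borders.
    array = np.asarray(array, dtype=float)
    structure = np.asarray(structure, dtype=float)
    kh, kw = structure.shape
    pads = ((kh // 2, kh // 2), (kw // 2, kw // 2))
    padded = np.pad(array, pads, constant_values=-np.inf)
    return (sliding_window_view(padded, (kh, kw)) + structure).max(axis=(-2, -1))


def grey_erosion_via_duality(array, structure):
    # min(x - s) == -max(-x + s): erosion is just a negated dilation of -x.
    return -grey_dilation_strided(-np.asarray(array, dtype=float), structure)


def grey_erosion_direct(array, structure):
    # Reference: erosion computed explicitly as a windowed min with +inf borders.
    array = np.asarray(array, dtype=float)
    structure = np.asarray(structure, dtype=float)
    kh, kw = structure.shape
    pads = ((kh // 2, kh // 2), (kw // 2, kw // 2))
    padded = np.pad(array, pads, constant_values=np.inf)
    return (sliding_window_view(padded, (kh, kw)) - structure).min(axis=(-2, -1))


x = np.array([[4.0, 7.0, 5.0, 1.0, 6.0]])
print(grey_erosion_via_duality(x, np.zeros((3, 3))))  # [[4. 4. 1. 1. 1.]]
```

With this design only the dilation path needs to be optimized and differentiated; erosion inherits both the speed and the gradient.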

Benchmarks show speedups of 10-100x depending on the array and kernel size.

This should make these ops much more usable in topology optimization (topopt). @groberts-flex

Greptile Summary

Significant performance optimization of the grey_dilation morphological operation, replacing the convolution-based implementation with one that uses NumPy's sliding_window_view for strided array operations.

- Replaced the convolution-based implementation with a strided array approach in tidy3d/plugins/autograd/functions.py, achieving a 10-100x speedup
- Added a custom VJP (vector-Jacobian product) for efficient backpropagation using the same striding technique
- Simplified grey_erosion by expressing it through duality with grey_dilation
- Updated morphology test cases to use first-order gradients and include kernel structure testing
- Added comprehensive benchmarks showing performance improvements that scale with array and kernel size
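One way to write a VJP for the dilation that reuses the same window view is sketched below. It assumes the same hypothetical convention (max over windows of padded input plus structure, `-inf` borders) and splits the incoming gradient equally among all window entries that tie for the maximum, which is one valid subgradient choice and not necessarily what tidy3d does. The scatter back uses one shifted slice per kernel offset, so the Python loop runs `kh * kw` times regardless of array size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def dilation_forward_and_vjp(array, structure, g):
    """Hypothetical sketch: grey dilation and its VJP with respect to `array`.

    Forward: out[i, j] = max_{a, b} padded[i + a, j + b] + structure[a, b].
    Backward: g[i, j] is split equally among all window entries attaining the max.
    """
    array = np.asarray(array, dtype=float)
    structure = np.asarray(structure, dtype=float)
    g = np.asarray(g, dtype=float)
    H, W = array.shape
    kh, kw = structure.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(array, ((ph, ph), (pw, pw)), constant_values=-np.inf)
    vals = sliding_window_view(padded, (kh, kw)) + structure  # (H, W, kh, kw)
    out = vals.max(axis=(-2, -1))

    # Multiplicity: count how many window entries tie for the max at each pixel.
    is_max = vals == out[..., None, None]
    weights = is_max / is_max.sum(axis=(-2, -1), keepdims=True)
    contrib = g[..., None, None] * weights  # (H, W, kh, kw)

    # Window entry (i, j, a, b) came from padded[i + a, j + b]; add it back
    # with one strided slice per kernel offset.
    grad_padded = np.zeros_like(padded)
    for a in range(kh):
        for b in range(kw):
            grad_padded[a:a + H, b:b + W] += contrib[:, :, a, b]
    # Gradient that landed on the padding is discarded.
    return out, grad_padded[ph:ph + H, pw:pw + W]


# Tie example: x0 and x1 share the max in two windows, so each gets half there.
out, grad = dilation_forward_and_vjp(np.array([[2.0, 2.0, 0.0]]), np.zeros((1, 3)), np.ones((1, 3)))
print(grad)  # [[1. 2. 0.]]
```

Splitting ties keeps the gradient summing to the upstream value for each output pixel, which avoids double counting when several inputs share the maximum.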

greptile-apps

LGTM

3 files reviewed, no comments

github-actions · 2025-06-20T12:05:08Z

Diff Coverage

Diff: origin/develop...HEAD, staged and unstaged changes

tidy3d/plugins/autograd/functions.py (98.8%): Missing lines 67

Summary

Total: 83 lines
Missing: 1 line
Coverage: 98%

tidy3d/plugins/autograd/functions.py

```
  63         The indices for padding along the axis.
  64     """
  65     total_pad = sum(pad_width)
  66     if n == 0:
! 67         return numpy_module.zeros(total_pad, dtype=int)
  68 
  69     idx = numpy_module.arange(-pad_width[0], n + pad_width[1])
  70 
  71     if mode == "constant":
```
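The excerpt comes from a helper that builds index arrays for padding along one axis. Below is a hypothetical sketch of that idea; the name `pad_indices` and its mode handling are assumptions, not the tidy3d function. It returns indices such that `x[idx]` reproduces `np.pad(x, pad_width, mode=mode)` for index-based modes. Expressing padding as a gather is convenient under autograd, since fancy indexing has a well-defined VJP. The constant mode needs a fill value rather than an index, so it is left out.

```python
import numpy as np


def pad_indices(n, pad_width, mode):
    """Hypothetical helper: indices idx with x[idx] == np.pad(x, pad_width, mode=mode).

    Covers only index-based modes; "constant" needs a fill value and is not handled.
    """
    total_pad = sum(pad_width)
    if n == 0:
        # Empty axis: return placeholder indices of the padded length
        # (callers must not gather from an empty array).
        return np.zeros(total_pad, dtype=int)
    idx = np.arange(-pad_width[0], n + pad_width[1])
    if mode == "edge":
        # Repeat the first/last element.
        return np.clip(idx, 0, n - 1)
    if mode == "wrap":
        # Periodic continuation.
        return idx % n
    if mode == "symmetric":
        # Mirror including the edge element: period 2n.
        m = idx % (2 * n)
        return np.where(m >= n, 2 * n - 1 - m, m)
    if mode == "reflect":
        # Mirror excluding the edge element: period 2(n - 1).
        if n == 1:
            return np.zeros_like(idx)
        period = 2 * (n - 1)
        m = idx % period
        return np.where(m >= n, period - m, m)
    raise ValueError(f"unsupported mode: {mode!r}")


x = np.arange(3.0)
print(pad_indices(3, (2, 2), "reflect"))  # [2 1 0 1 2 1 0]
print(x[pad_indices(3, (2, 2), "reflect")])  # [2. 1. 0. 1. 2. 1. 0.]
```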


groberts-flex

thanks for this implementation, the speedup looks awesome, especially for a function that will be used in a lot of robust optimizations!

left some comments/questions, some just for my own understanding!

The previous implementation of `grey_dilation` was based on convolution, which was slow for both the forward and backward passes. This commit replaces it with a high-performance implementation that uses NumPy's `as_strided` to create sliding window views of the input array. This avoids redundant computations and memory allocations, leading to significant speedups. The VJP (gradient) for the primitive is also updated to use the same striding technique, ensuring the backward pass is also much faster. Benchmarks show speedups of 10-100x depending on the array and kernel size.
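For the `as_strided` wording in this message: a read-only 2D window view can be built directly with `np.lib.stride_tricks.as_strided` by repeating the array's own strides for the window axes. This is a sketch, not the PR's code, and `window_view` is a hypothetical name. `sliding_window_view` is a checked wrapper around the same mechanism; `as_strided` itself does no bounds checking, so a wrong shape or stride silently reads out-of-bounds memory.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view


def window_view(padded, kh, kw):
    """Hypothetical 2D sliding-window view built with as_strided (read-only, no copy)."""
    H, W = padded.shape
    s0, s1 = padded.strides
    # Moving one window down/right and moving one element inside a window
    # both advance by the array's own row/column strides.
    return as_strided(
        padded,
        shape=(H - kh + 1, W - kw + 1, kh, kw),
        strides=(s0, s1, s0, s1),
        writeable=False,
    )


x = np.arange(20.0).reshape(4, 5)
v = window_view(x, 3, 3)
print(v.shape)  # (2, 3, 3, 3)
print(np.array_equal(v, sliding_window_view(x, (3, 3))))  # True
```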

groberts-flex

thanks @yaugenst-flex, this is great! After reading through the updated comment, I understand the multiplicity/multiple-maxima part much better. Thanks for the clarifications and changes.

yaugenst-flex requested a review from groberts-flex June 20, 2025 11:39

yaugenst-flex self-assigned this Jun 20, 2025

greptile-apps bot reviewed Jun 20, 2025


yaugenst-flex force-pushed the yaugenst-flex/faster-morphology branch 2 times, most recently from 266d6a0 to be0b9b0 on June 20, 2025 15:08