feature_steered_convolution impls and benchmarks #101
Closed
This PR is the result of some work I've been doing on variations of feature steered convolutions. Each variation has its own strengths and weaknesses, so I'm unsure if graphics would like to accept any/all of the options, or how to package them.
## Algebraic manipulations
The idea behind the changes is to avoid storing temporary feature values at each edge (caused by gather -> reduce) by using `tf.sparse.sparse_dense_matmul`. In order to do this, we perform the following manipulations to the $m$ terms from the paper ($m$ == `w` in code):

$$\sum_{j \in \mathcal{N}(i)} n_{ij}\, q_m(x_i, x_j)\, x_j W_m = \sum_{j \in \mathcal{N}(i)} \frac{n_{ij}}{Z_{ij}}\, e^{u_m^\top x_i + c_m}\, e^{v_m^\top x_j}\, x_j W_m$$

$$= e^{u_m^\top x_i + c_m} \Big[ \sum_{j \in \mathcal{N}(i)} \widetilde{n}_{ij}\, e^{v_m^\top x_j}\, x_j \Big] W_m$$

where $\widetilde{n}_{ij}$ (`weighted_neighbors_ij`) $= n_{ij} / Z_{ij}$ is the original neighborhood weighting value divided by the softmax normalization factor $Z_{ij} = \sum_{m} e^{u_m^\top x_i + v_m^\top x_j + c_m}$. This inner summation can then be implemented using `tf.sparse.sparse_dense_matmul`. All implementations are algebraically equivalent and yield similar results (to within 1e-5).
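As a rough illustration, one $m$ term under this rearrangement might be computed as below. This is a minimal sketch, not the PR's code; `weighted_neighbors` and the per-`m` parameter names are assumptions based on the description above:

```python
import tensorflow as tf


def m_term_via_sparse_matmul(weighted_neighbors, x, u_m, v_m, c_m, w_m):
  """One m term, rearranged to avoid per-edge feature temporaries.

  Args:
    weighted_neighbors: float SparseTensor of shape [V, V]; the original
      neighbor weights divided by the softmax normalization factor Z_ij.
    x: dense [V, C] vertex features.
    u_m, v_m: [C] attention parameters for this m.
    c_m: scalar attention bias for this m.
    w_m: [C, D] weight matrix for this m.

  Returns:
    [V, D] contribution of this m term.
  """
  # exp(v_m^T x_j) scales each neighbor's features.
  scaled_x = tf.exp(tf.linalg.matvec(x, v_m))[:, None] * x             # [V, C]
  # Inner summation over j via sparse matmul: no per-edge temporaries.
  summed = tf.sparse.sparse_dense_matmul(weighted_neighbors, scaled_x)  # [V, C]
  # exp(u_m^T x_i + c_m) scales the result at each output vertex i.
  scale = tf.exp(tf.linalg.matvec(x, u_m) + c_m)[:, None]              # [V, 1]
  # `transform_data_first` would instead apply w_m before the sparse
  # matmul; the two orders are equal by associativity.
  return scale * (summed @ w_m)                                        # [V, D]
```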
## Main Changes
- Renamed `feature_steered_convolution` to `feature_steered_convolution_v1` and added a new `feature_steered_convolution` which redirects to this or other implementations.
- `feature_steered_convolution_v2`, which is based on the sparse multiplication above. This implementation computes all `m` terms in a single vectorized block, which is fastest, though it requires a feature tensor of shape `[V, D, W]` before the final dimension is reduced, so requires a lot of memory. The default is, as far as I can tell, always optimal, so this could easily be dropped (included to make testing/verifying easier, but happy to remove if approved).
- `feature_steered_convolution_v3`, which addresses the memory issues in `v2` by computing the last dimension of the conceptual `[V, D, W]` tensor sequentially.
- `transform_data_first` option for all implementations. This allows transforming `x_flat` via `var_w` before or after the other multiplications (before is more efficient if the number of features is decreasing). This is equivalent to taking advantage of the associativity of matrix multiplication in the second equation above.
- `memory_efficient` option for `v1` and `v3` that uses `foldl` for sequential additions over the `W` dimension (see the sketch after this list).
- Variants of `v1`, similar to this PR, based on `tf.segment_sum` and `tf.unsorted_segment_sum`.
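The `memory_efficient` idea might look roughly like the sketch below: accumulating `m` terms one at a time rather than materializing the full `[V, D, W]` block, so peak memory is `O(V * D)` instead of `O(V * D * W)`. This is a sketch under assumed names; `m_term` stands in for whatever computes a single `[V, D]` contribution (e.g. the sparse matmul sketch above):

```python
import tensorflow as tf


def sum_m_terms_memory_efficient(m_term, num_weights, num_vertices, num_out):
  """Accumulates m terms sequentially instead of materializing [V, D, W].

  Args:
    m_term: hypothetical callable; m_term(m) returns the [V, D]
      contribution of weight matrix m.
    num_weights: W, the number of weight matrices.
    num_vertices: V.
    num_out: D, the number of output features.
  """
  return tf.foldl(
      lambda acc, m: acc + m_term(m),  # add one m term per step
      tf.range(num_weights),
      initializer=tf.zeros((num_vertices, num_out)))
```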
## Not included

Tests. It would be straightforward enough to add a version of each implementation to the existing test suite, but how thorough do we want to be? Basic checks in `feature_steered_conv_benchmark.py` confirm the implementations give values consistent with the original implementation.
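The kind of consistency check described might look like the sketch below. The `v1`/`v2`/`v3` imports are hypothetical (they are this PR's names) and are assumed to share the signature of the existing `feature_steered_convolution`:

```python
import numpy as np
# Hypothetical imports: the variants added in this PR.
from tensorflow_graphics.geometry.convolution.graph_convolution import (
    feature_steered_convolution_v1,
    feature_steered_convolution_v2,
    feature_steered_convolution_v3,
)


def check_implementations_agree(data, neighbors, sizes,
                                var_u, var_v, var_c, var_w, var_b):
  """Asserts all variants match v1 to within 1e-5 (eager mode assumed)."""
  baseline = feature_steered_convolution_v1(
      data, neighbors, sizes, var_u, var_v, var_c, var_w, var_b)
  for variant in (feature_steered_convolution_v2,
                  feature_steered_convolution_v3):
    result = variant(data, neighbors, sizes,
                     var_u, var_v, var_c, var_w, var_b)
    np.testing.assert_allclose(baseline.numpy(), result.numpy(), atol=1e-5)
```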
## Benchmark summary

- `v2` is consistently fastest, though it uses up to 70% more memory for very sparse neighborhoods. Memory efficiency scales well with less sparse neighborhoods, but poorly with `m`.
- `v3` avoids the memory scaling with `m` at a small performance penalty (~10%).
- `sorted` segment sum is always faster and generally consumes less memory (or roughly the same) compared to the custom method implemented in this package (see the sketch after this list).
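For reference, the difference between the `sorted` and `unsorted` variants comes down to which segment-sum op reduces per-edge values onto vertices. A minimal illustration:

```python
import tensorflow as tf

# Per-edge values grouped by destination vertex; ids [0, 0, 1] are sorted.
edge_values = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
segment_ids = tf.constant([0, 0, 1])

# Requires sorted segment ids, but infers the number of segments and is
# generally the faster of the two.
sorted_result = tf.math.segment_sum(edge_values, segment_ids)

# Accepts ids in any order at the cost of an explicit num_segments.
unsorted_result = tf.math.unsorted_segment_sum(
    edge_values, segment_ids, num_segments=2)

# Both yield [[4., 6.], [5., 6.]].
```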
## Sample benchmark results

Name keys:
- `p2d`: uses the `partition_sums_2d` implementation
- `sorted`: uses `tf.math.segment_sum`
- `unsorted`: uses `tf.math.unsorted_segment_sum`
- `bad`: uses the non-default `transform_data_first` argument
- `mem`: uses the memory-efficient implementation
*(Benchmark result tables not recoverable here: single convolution at lower sparsity, single convolution at higher sparsity, demo model with filters = 8, and demo model with filters = 64.)*