optimized partition_sums_2d
#73
Closed
The current `partition_sums_2d`, from what I can tell, is almost always slower and less memory efficient than `tf.math.unsorted_segment_sum` (see the example below, which exhibits a 2x speed-up and a 10x memory reduction). This PR:

- uses `tf.math.unsorted_segment_sum` in place of `partition_sums_2d`, to make the `max`/`weighted` reduction code branches more similar (`segment_max` vs `segment_sum` respectively); a minimal sketch of the equivalence follows this list;
- uses `tf.segment_max` (and adds tests demonstrating possible errors);
- removes a use of `dimension.value` that was annoying me (I've taken to using `tf.enable_v2_tensorshape`, which wasn't compatible with this code). Maybe this shouldn't be a part of this PR... happy to take it out.
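For context, here is a minimal sketch (not the library's code; the shapes and partition count are made up) of the equivalence being relied on: a dense one-hot matmul stands in for a `partition_sums_2d`-style computation and is checked against `tf.math.unsorted_segment_sum`. It runs eagerly under TF 2.x.

```python
import numpy as np
import tensorflow as tf

n, d, num_partitions = 1000, 8, 50                                    # illustrative sizes
values = tf.constant(np.random.rand(n, d), dtype=tf.float32)          # [n, d]
indices = tf.constant(np.random.randint(0, num_partitions, size=n))   # [n]

# Dense route (stand-in for a partition_sums_2d-style implementation):
# materializes an [n, num_partitions] intermediate.
one_hot = tf.one_hot(indices, depth=num_partitions, dtype=values.dtype)
sums_dense = tf.matmul(one_hot, values, transpose_a=True)             # [num_partitions, d]

# Segment route: a single op, no dense intermediate.
sums_segment = tf.math.unsorted_segment_sum(
    values, indices, num_segments=num_partitions)                     # [num_partitions, d]

np.testing.assert_allclose(sums_dense.numpy(), sums_segment.numpy(), rtol=1e-4)
```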
An `order` kwarg has been added to certain methods for potential optimization, though from what I can tell it makes minimal difference to performance. Strictly speaking this is a breaking change, since it adds a non-final kwarg (it could be put after `name` to make it less breaking, but there seems to be a convention that `name` always goes last); a toy illustration of the breakage follows.
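To see why the kwarg position matters, here is a toy example with hypothetical signatures (not the library's actual API): a caller that passed `name` positionally now has that argument silently captured by the new kwarg.

```python
# Hypothetical signatures, for illustration only.
def reduce_old(values, indices, name=None):
    return ("old", values, indices, name)

def reduce_new(values, indices, order=None, name=None):
    return ("new", values, indices, order, name)

# A caller that passed `name` positionally...
print(reduce_old([1, 2], [0, 1], "my_op"))   # name == "my_op"
# ...now has that argument land in `order` instead of `name`.
print(reduce_new([1, 2], [0, 1], "my_op"))   # order == "my_op", name is None
```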
Changes that I haven't made, to keep the PR minimal/mostly non-breaking, but that I'll float anyway:

- `reduction in ('max', 'weighted')` seems clunky. Could the weighting be considered separately from the reduction, and the reduction just be one of `('max', 'sum')`? Better yet, could `reduction` be one of `(tf.math.unsorted_segment_sum, tf.math.segment_sum, tf.unsorted_segment_max, tf.segment_max)` and do away with the need for the `sorted` kwarg introduced in this PR? A sketch of this idea follows the list.
- `partition_sums_2d` is just a thin wrapper; should it be deprecated?
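A sketch of the op-as-reduction idea, using a hypothetical helper (not the library's API) and TF 2.x names:

```python
import tensorflow as tf

def reduce_by_partition(values, indices, num_partitions,
                        reduction=tf.math.unsorted_segment_sum):
    """Reduces rows of `values` grouped by `indices` using `reduction`.

    The unsorted variants take (data, segment_ids, num_segments); the sorted
    variants (tf.math.segment_sum / tf.math.segment_max) take only
    (data, segment_ids) and require sorted ids, so no `sorted` kwarg is needed.
    """
    if reduction in (tf.math.segment_sum, tf.math.segment_max):
        return reduction(values, indices)
    return reduction(values, indices, num_segments=num_partitions)

values = tf.constant([[1.0], [2.0], [3.0]])
ids = tf.constant([1, 0, 1])
sums = reduce_by_partition(values, ids, num_partitions=2)
maxes = reduce_by_partition(values, ids, num_partitions=2,
                            reduction=tf.math.unsorted_segment_max)
```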
Benchmark demonstrating the performance improvement:
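The actual benchmark script isn't reproduced here; the sketch below only illustrates the kind of `tf.test.Benchmark.run_op_benchmark` comparison that produces output like the one that follows. The shapes, the partition count, and the one-hot stand-in for the original `partition_sums_2d` path are assumptions.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

n, d, num_partitions = 100000, 1, 50            # illustrative sizes
values_np = np.random.rand(n, d).astype(np.float32)
indices_np = np.random.randint(0, num_partitions, size=n).astype(np.int32)

graph = tf.Graph()
with graph.as_default():
    values = tf.constant(values_np)
    indices = tf.constant(indices_np)
    # Stand-in for the original partition_sums_2d path: dense one-hot matmul.
    one_hot = tf.one_hot(indices, depth=num_partitions, dtype=values.dtype)
    original = tf.matmul(one_hot, values, transpose_a=True)
    # New path: a single segment-sum op.
    new = tf.math.unsorted_segment_sum(values, indices,
                                       num_segments=num_partitions)

bench = tf.test.Benchmark()
with tf.Session(graph=graph) as sess:
    bench.run_op_benchmark(sess, original, min_iters=10, name="original")
    bench.run_op_benchmark(sess, new, min_iters=10, name="new")
```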
Output:

```
--------------------------
original
entry {
  name: "TensorFlowBenchmark.run_op_benchmark"
  iters: 10
  wall_time: 0.0004891157150268555
  extras {
    key: "allocator_maximum_num_bytes_GPU_0_bfc"
    value {
      double_value: 4600016.0
    }
  }
}
--------------------------
new
entry {
  name: "TensorFlowBenchmark.run_op_benchmark"
  iters: 10
  wall_time: 0.0002224445343017578
  extras {
    key: "allocator_maximum_num_bytes_GPU_0_bfc"
    value {
      double_value: 400000.0
    }
  }
}
```