
@jackd jackd commented Jan 21, 2020

This PR is mostly focused on adding a 'sparse_matmul' implementation of feature_steered_convolution, which is faster than the base implementation ('gather_sum') in certain circumstances.

Differences can be seen in the benchmarking script.

cd benchmarks
python graph_convolution_benchmark.py --backward --sort
# Wall time (ms): 1173.0027198791504
# Memory (Mb):  5350.4373207092285
python graph_convolution_benchmark.py --backward --sort --sparse
# Wall time (ms): 885.279655456543
# Memory (Mb):  4426.739643096924

Benefits are similar, though reduced, when using the --jit option.

Note these benefits are not across-the-board, hence the choice of implementation is left to the user. For example, smaller networks are faster using gather_sum. Verify with --num_layers=2 or similar.
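For intuition, the two aggregation strategies can be contrasted on a toy graph. This is a plain NumPy stand-in with hypothetical names, not the code in this PR; the real implementations operate on TF sparse neighborhood tensors, and the dense matrix below stands in for a true sparse matmul:

```python
import numpy as np

def gather_sum(x, edges, weights, num_vertices):
    """Aggregate neighbor features by gathering source rows, then
    scatter-adding them into target rows (gather + segment-sum style).

    x: (V, F) vertex features; edges: (E, 2) (target, source) pairs;
    weights: (E,) edge weights.
    """
    out = np.zeros((num_vertices, x.shape[1]))
    gathered = weights[:, None] * x[edges[:, 1]]  # (E, F) weighted source rows
    np.add.at(out, edges[:, 0], gathered)         # scatter-add into targets
    return out

def sparse_matmul(x, edges, weights, num_vertices):
    """Same aggregation expressed as a matrix product A @ x, where A holds
    the edge weights. (Dense here for clarity; assumes no duplicate edges.)"""
    a = np.zeros((num_vertices, num_vertices))
    a[edges[:, 0], edges[:, 1]] = weights
    return a @ x

x = np.arange(8.0).reshape(4, 2)
edges = np.array([[0, 1], [1, 0], [2, 3], [3, 2]])
w = np.ones(4)
assert np.allclose(gather_sum(x, edges, w, 4), sparse_matmul(x, edges, w, 4))
```

Both routes compute the same result; which is faster depends on graph size, sparsity, and whether the work can be fused (e.g. under jit), which is why the PR leaves the choice to the user.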

Other minor changes:

  • numerically stable softmax implementation (this is faster than tf.nn.softmax when using JIT)
  • documentation: updated to reflect that feature_steered_convolution can take non-square neighborhood tensors.
  • layers: added get_config implementations and tf.keras.*.get calls in constructors, allowing Layer.from_config to work as intended.
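The softmax change is the standard max-subtraction trick; a minimal NumPy sketch (hypothetical `stable_softmax` name, not the code in this PR):

```python
import numpy as np

def stable_softmax(logits, axis=-1):
    # Subtracting the per-row max leaves the result unchanged (softmax is
    # shift-invariant) but keeps exp() from overflowing on large logits.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# A naive exp(logits) would overflow to inf here; the shifted version is finite.
p = stable_softmax(np.array([1000.0, 1001.0, 1002.0]))
assert np.all(np.isfinite(p))
```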

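The get_config/from_config round trip that the layer changes enable looks roughly like this. This is a framework-free sketch with hypothetical names; tf.keras uses the same pattern, with tf.keras.activations.get and friends resolving string names to callables in the constructor:

```python
# Toy registry standing in for tf.keras.activations.get (hypothetical).
ACTIVATIONS = {"relu": lambda x: max(x, 0.0), "linear": lambda x: x}

class ToyLayer:
    def __init__(self, units, activation="linear"):
        self.units = units
        self.activation_name = activation            # keep the serializable name
        self.activation = ACTIVATIONS[activation]    # 'get'-style lookup

    def get_config(self):
        # Only JSON-serializable values, so the layer can be rebuilt later.
        return {"units": self.units, "activation": self.activation_name}

    @classmethod
    def from_config(cls, config):
        return cls(**config)

layer = ToyLayer(8, activation="relu")
clone = ToyLayer.from_config(layer.get_config())
assert clone.units == layer.units
```

The point of the PR change is that accepting string names in the constructor (and resolving them with the `get` helpers) makes the config round trip lossless.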
jackd commented Jan 21, 2020

@julienvalentin @amakadia this supersedes PR 101.

@jackd jackd requested a review from amakadia January 21, 2020 07:46
@amakadia

Thanks for making this change. I'm testing it now and should be able to respond soon.

@amakadia

Hi Jack,
This change looks great. Can you share some benchmark results from your trials? I'm specifically interested in timings and memory allocations on a GPU, as I want to check them against some of the numbers I'm seeing on my end.
Thanks!

@avneesh-sud

Hi Jack,

Could you also test this change on the mesh-segmentation model shared in the mesh-segmentation-demo.ipynb colab? We want to verify consistency with the existing implementation as a baseline.

jackd commented Feb 28, 2020

@avneesh-sud sorry about the delay on this - ECCV deadline is taking priority right now.

@avneesh-sud

> @avneesh-sud sorry about the delay on this - ECCV deadline is taking priority right now.

@jackd - hope you are safe and recovered from ECCV deadline. Checking in if you have bandwidth to benchmark the changes and test with the segmentation model now?

jackd commented Apr 19, 2020

@avneesh-sud - sorry about the delay, rolled straight into a new role after ECCV. I've had another look and... things aren't as straightforward as I remember. I'm running experiments on my laptop now with a 1050 Ti (previous results were on a desktop / 1070), and having done some more digging on experiments with a full forward/backward pass and sorted inputs, I'm observing:

  • sparse implementation is much better than standard implementation without jit - faster and consumes less memory
  • sparse impl. doesn't see much improvement with jit
  • standard impl. gets much better with jit
  • standard impl. is slightly faster than sparse impl. with jit and advertises significantly lower memory usage
  • with jit, standard impl. still OOMs earlier than sparse impl., despite advertising significantly lower memory usage for runs without OOMing.

Standard implementation OOMs with default num_vertices, so maybe I'm just not hitting the cross-over point I advertised above?

I've never really trusted the memory usage values that come out of jitted code, and this experiment hasn't given me any more faith. I don't know of any other profiling tools for this - happy to take suggestions though.

Will hopefully have a chance to play with the model tomorrow :)

jackd commented Apr 20, 2020

For the models from the notebook, the sparse implementation (proposed in this PR) takes ~170ms per step, vs 215ms for the original implementation on my laptop with 1050Ti. Not much observed difference in memory usage based on random polling of nvidia-smi output.

In terms of model accuracy, there are tests in place showing the results are the same, but I wanted to do up a keras model anyway, so I put together this gist. I got a bit carried away changing stuff and it's entirely possible I stuffed something up, but overall loss numbers look roughly equivalent to the notebook file. Unsurprisingly, the results using the different layer implementations are very similar. Somewhat surprising was that my sparse keras implementation in tf 2.1 was ~10% slower than the notebook version in tf 1.15, whereas my keras implementation using the default layer implementation was ~30% slower than the notebook equivalent. Again, not really an apples-to-apples comparison because I changed a fair bit (though I'd hoped my changes would make things faster if anything). Maybe I screwed something up...

[image: tb-sparse (TensorBoard screenshot)]
