From https://groups.google.com/a/tensorflow.org/forum/#!topic/tfjs/_IDVt3wQFXA: What are the best practices for timing operations that involve data transfer between the CPU and GPU, in order to identify bottlenecks?