
[Intel MKL] Use Shard function instead of Eigen device to parallelize Adam kernel. #26424

Commits on Mar 7, 2019

  1. Use Shard function instead of Eigen device to parallelize Adam kernel.

    This could reduce memory accesses and improve cache locality on the CPU.
    
    modified:
    - tensorflow/core/kernels/training_ops.cc
    - tensorflow/core/kernels/training_ops.h
    - tensorflow/core/kernels/training_ops_gpu.cu.cc
    
    Signed-off-by: Lu Teng <teng.lu@intel.com>
    Zantares committed Mar 7, 2019
    Commit: 777498b
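
As a rough illustration of the change this commit describes, the sketch below parallelizes the element-wise Adam update with TensorFlow's `Shard` helper (`tensorflow/core/util/work_sharder.h`) instead of a single Eigen device expression. The function name `ApplyAdamSharded`, the scalar inner loop, and the placeholder cost constant are assumptions for illustration, not the PR's actual code.

```cpp
// Illustrative sketch only: split the flat tensors into contiguous shards and
// update each shard on the CPU thread pool, instead of handing one large
// Eigen expression to the device. Each shard touches only its slice of
// var/m/v/grad, which keeps that slice warm in cache across the three passes.
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/util/work_sharder.h"

namespace tensorflow {

template <typename T>
void ApplyAdamSharded(OpKernelContext* ctx, typename TTypes<T>::Flat var,
                      typename TTypes<T>::Flat m, typename TTypes<T>::Flat v,
                      typename TTypes<T>::ConstFlat grad, T beta1_power,
                      T beta2_power, T lr, T beta1, T beta2, T epsilon) {
  const int64 length = var.size();
  // Effective step size, same formula as the standard Adam update.
  const T alpha =
      lr * Eigen::numext::sqrt(T(1) - beta2_power) / (T(1) - beta1_power);

  // Work function: update one contiguous [begin, end) slice.
  auto shard_fn = [&](int64 begin, int64 end) {
    T* var_p = var.data() + begin;
    T* m_p = m.data() + begin;
    T* v_p = v.data() + begin;
    const T* g_p = grad.data() + begin;
    for (int64 i = 0; i < end - begin; ++i) {
      m_p[i] += (g_p[i] - m_p[i]) * (T(1) - beta1);
      v_p[i] += (g_p[i] * g_p[i] - v_p[i]) * (T(1) - beta2);
      var_p[i] -= (m_p[i] * alpha) / (Eigen::numext::sqrt(v_p[i]) + epsilon);
    }
  };

  // Let the work sharder decide how many shards to run on the CPU pool.
  auto* worker_threads = ctx->device()->tensorflow_cpu_worker_threads();
  const int64 cost_per_unit = 1000;  // rough per-element cost placeholder
  Shard(worker_threads->num_threads, worker_threads->workers, length,
        cost_per_unit, shard_fn);
}

}  // namespace tensorflow
```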

Commits on Mar 13, 2019

  1. Add comment for Shard function

    To get better cache locality, use Shard instead of the Eigen expression.
    Zantares committed Mar 13, 2019
    Commit: 52f2440

Commits on Mar 14, 2019

  1. Refine code with a simple Tensor vectorization form.

    Also added a benchmark to test Adam performance.
    Zantares committed Mar 14, 2019
    Commit: 06dd621
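
A hedged guess at what a "simple Tensor vectorization form" inside each shard could look like: map each [begin, end) slice as 1-D Eigen arrays so the element-wise update is expressed, and vectorized, by Eigen rather than by a hand-written scalar loop. The helper name `AdamShardBody` and its signature are illustrative, not taken from the PR.

```cpp
// Illustrative only: express the per-shard Adam update as Eigen array
// expressions over the [begin, end) slice, letting Eigen vectorize the
// element-wise math instead of relying on a scalar loop.
#include <cstddef>
#include <Eigen/Core>

template <typename T>
void AdamShardBody(T* var, T* m, T* v, const T* grad, std::ptrdiff_t begin,
                   std::ptrdiff_t end, T alpha, T beta1, T beta2, T epsilon) {
  const std::ptrdiff_t len = end - begin;
  using Arr = Eigen::Array<T, Eigen::Dynamic, 1>;
  Eigen::Map<Arr> var_a(var + begin, len);
  Eigen::Map<Arr> m_a(m + begin, len);
  Eigen::Map<Arr> v_a(v + begin, len);
  Eigen::Map<const Arr> g_a(grad + begin, len);

  // Same element-wise Adam update as before, written on whole slices.
  m_a += (g_a - m_a) * (T(1) - beta1);
  v_a += (g_a.square() - v_a) * (T(1) - beta2);
  var_a -= (m_a * alpha) / (v_a.sqrt() + epsilon);
}
```

Writing the slice update as array expressions keeps the kernel readable while still letting Eigen emit SIMD code per shard; the benchmark this commit mentions is not sketched here.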

Commits on Mar 15, 2019

  1. Commit: e4dae32

Commits on Mar 19, 2019

  1. Fix shard cost and var name.

    Zantares committed Mar 19, 2019
    Commit: 2160c84
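
For context on "shard cost": `Shard` takes a `cost_per_unit` estimate that the work sharder uses to decide how finely to split the work. The sketch below shows one way such an estimate could be built from Eigen's cost model; the operation counts and the use of `TensorOpCost` are assumptions for illustration, not the values fixed in this commit.

```cpp
// Illustrative cost estimate only. Counting the element-wise operations in
// the Adam update (roughly six add/subtract-class ops, four multiplies, one
// divide, and one sqrt counted here as a divide) gives Shard a per-element
// cost so it can pick a sensible shard size.
#include <unsupported/Eigen/CXX11/Tensor>

template <typename T>
long long AdamCostPerElement() {
  return 6 * Eigen::TensorOpCost::AddCost<T>() +
         4 * Eigen::TensorOpCost::MulCost<T>() +
         2 * Eigen::TensorOpCost::DivCost<T>();
}
```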