
[Intel MKL] Use Shard function instead of Eigen device to parallelize Adam kernel. #26424

Commits on Mar 7, 2019

  1. Use Shard function instead of Eigen device to parallelize Adam kernel.

    This could reduce memory accesses and improve cache locality on the CPU.
    
    modified:
    - tensorflow/core/kernels/training_ops.cc
    - tensorflow/core/kernels/training_ops.h
    - tensorflow/core/kernels/training_ops_gpu.cu.cc
    
    Signed-off-by: Lu Teng <teng.lu@intel.com>
    Zantares committed Mar 7, 2019
    Commit: 777498b
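
As a rough illustration of the change this commit describes, the sketch below parallelizes the element-wise Adam update with TensorFlow's `Shard` helper (`tensorflow/core/util/work_sharder.h`) instead of a single Eigen device expression. The function name `ApplyAdamSharded`, the scalar inner loop, and the placeholder cost constant are assumptions for illustration, not the PR's actual code.

```cpp
// Illustrative sketch only: split the flat tensors into contiguous shards and
// update each shard on the CPU thread pool, instead of handing one large
// Eigen expression to the device. Each shard touches only its slice of
// var/m/v/grad, which keeps that slice warm in cache across the three passes.
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/util/work_sharder.h"

namespace tensorflow {

template <typename T>
void ApplyAdamSharded(OpKernelContext* ctx, typename TTypes<T>::Flat var,
                      typename TTypes<T>::Flat m, typename TTypes<T>::Flat v,
                      typename TTypes<T>::ConstFlat grad, T beta1_power,
                      T beta2_power, T lr, T beta1, T beta2, T epsilon) {
  const int64 length = var.size();
  // Effective step size, same formula as the standard Adam update.
  const T alpha =
      lr * Eigen::numext::sqrt(T(1) - beta2_power) / (T(1) - beta1_power);

  // Work function: update one contiguous [begin, end) slice.
  auto shard_fn = [&](int64 begin, int64 end) {
    T* var_p = var.data() + begin;
    T* m_p = m.data() + begin;
    T* v_p = v.data() + begin;
    const T* g_p = grad.data() + begin;
    for (int64 i = 0; i < end - begin; ++i) {
      m_p[i] += (g_p[i] - m_p[i]) * (T(1) - beta1);
      v_p[i] += (g_p[i] * g_p[i] - v_p[i]) * (T(1) - beta2);
      var_p[i] -= (m_p[i] * alpha) / (Eigen::numext::sqrt(v_p[i]) + epsilon);
    }
  };

  // Let the work sharder decide how many shards to run on the CPU pool.
  auto* worker_threads = ctx->device()->tensorflow_cpu_worker_threads();
  const int64 cost_per_unit = 1000;  // rough per-element cost placeholder
  Shard(worker_threads->num_threads, worker_threads->workers, length,
        cost_per_unit, shard_fn);
}

}  // namespace tensorflow
```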

Commits on Mar 13, 2019

  1. Add comment for Shard function

    To get better cache locality, use Shard instead of the Eigen expression.
    Zantares committed Mar 13, 2019
    Commit: 52f2440

Commits on Mar 14, 2019

  1. Refine code with a simple Tensor vectorization form.

    Also added a benchmark to test Adam performance.
    Zantares committed Mar 14, 2019
    Commit: 06dd621
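
A hedged guess at what a "simple Tensor vectorization form" inside each shard could look like: map each [begin, end) slice as 1-D Eigen arrays so the element-wise update is expressed, and vectorized, by Eigen rather than by a hand-written scalar loop. The helper name `AdamShardBody` and its signature are illustrative, not taken from the PR.

```cpp
// Illustrative only: express the per-shard Adam update as Eigen array
// expressions over the [begin, end) slice, letting Eigen vectorize the
// element-wise math instead of relying on a scalar loop.
#include <cstddef>
#include <Eigen/Core>

template <typename T>
void AdamShardBody(T* var, T* m, T* v, const T* grad, std::ptrdiff_t begin,
                   std::ptrdiff_t end, T alpha, T beta1, T beta2, T epsilon) {
  const std::ptrdiff_t len = end - begin;
  using Arr = Eigen::Array<T, Eigen::Dynamic, 1>;
  Eigen::Map<Arr> var_a(var + begin, len);
  Eigen::Map<Arr> m_a(m + begin, len);
  Eigen::Map<Arr> v_a(v + begin, len);
  Eigen::Map<const Arr> g_a(grad + begin, len);

  // Same element-wise Adam update as before, written on whole slices.
  m_a += (g_a - m_a) * (T(1) - beta1);
  v_a += (g_a.square() - v_a) * (T(1) - beta2);
  var_a -= (m_a * alpha) / (v_a.sqrt() + epsilon);
}
```

Writing the slice update as array expressions keeps the kernel readable while still letting Eigen emit SIMD code per shard; the benchmark this commit mentions is not sketched here.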

Commits on Mar 15, 2019

  1. Commit: e4dae32

Commits on Mar 19, 2019

  1. Fix shard cost and var name.

    Zantares committed Mar 19, 2019
    Commit: 2160c84
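
For context on "shard cost": `Shard` takes a `cost_per_unit` estimate that the work sharder uses to decide how finely to split the work. The sketch below shows one way such an estimate could be built from Eigen's cost model; the operation counts and the use of `TensorOpCost` are assumptions for illustration, not the values fixed in this commit.

```cpp
// Illustrative cost estimate only. Counting the element-wise operations in
// the Adam update (roughly six add/subtract-class ops, four multiplies, one
// divide, and one sqrt counted here as a divide) gives Shard a per-element
// cost so it can pick a sensible shard size.
#include <unsupported/Eigen/CXX11/Tensor>

template <typename T>
long long AdamCostPerElement() {
  return 6 * Eigen::TensorOpCost::AddCost<T>() +
         4 * Eigen::TensorOpCost::MulCost<T>() +
         2 * Eigen::TensorOpCost::DivCost<T>();
}
```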