Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenCL] Implementation improvements #9117

Merged
merged 70 commits into from May 31, 2017

Conversation

lukeiwanski
Copy link

@lukeiwanski lukeiwanski commented Apr 10, 2017

OpenCL implementation improvements (#22)

  • [Build] Use gcc/g++ as a host compiler to avoid Protobufs are into multiple shared libraries loaded from python #8394 (Support cuda 7.5 and cudnn 7.0 #54)
  • [Eigen] Version bump
  • [OpenCL] Implementation improvements
    • Register SYCL implementations for random ops
      • Simplify by using Eigen math functions
    • Registers Scatter and ScatterNd Ops for SYCL
    • Registers Stack op for SYCL
    • Fixes No sycl buffer found error for debug ops
    • Registers MatMul and Transpose Ops to SYCL device for double
    • Extends analyzer_cli_test.py test to cover SYCL
    • Fixes Transpose Op for double when on SYCL
    • Bumps Eigen version to fix double precision issue on SYCL
    • Extends SessionDebugTestBase to cover SYCL
    • Bumps Eigen Version
    • Refactors Ops registration
    • Introduces workaround for Const Op related to the difference between
      CUDA which uses pointers and OpenCL that uses buffers/accessors
    • Extends memory types to cover DEVICE_SYCL as well
    • Introduces GetSYCLDevice() method that returns list of supported devices
      with GPU device having the highest priority ( doesn't include blacklisted devices )
    • ::internal::Transpose -> tensorflow::internal::Transpose in order to
      avoid compilation reported error
    • Adds sycl_runtime to bazels ARRAY_DEPS
    • Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
    • Fixes an issue caused by switch to aligned allocator for sycl buffer (Can't install on ubuntu 12.04.5 LTS #53)
    • Fix testSimple and testConst in stack_op_test (JVM, .NET Language Support #3)
    • RandomGamma has no GPU friendly implementation (virtualenv python 2.7.6 import tensorflow. TypeError: __init__() got an unexpected keyword argument 'syntax' #57)
    • Register batch normalization kernels for OpenCL (ImportError: No module named core.framework.graph_pb2 #61)
    • Compatibility fixes for TensorFlow 1.1.0-rc1
    • Fixes Scatter Op
    • Implements BatchMatmul Op for SYCL
    • Lowercase the device name when GPU or SYCL returned
    • kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point
      representation on the device

@tensorflow-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@lukeiwanski
Copy link
Author

@ville-k, @Zakor94 can you confirm?

@lukeiwanski lukeiwanski changed the title [OpenCL] Implementation improvements (#22) [OpenCL] Implementation improvements Apr 10, 2017
@drpngx
Copy link
Contributor

drpngx commented Apr 10, 2017

Sometimes googlebot gets confused, could you git pull --rebase and push again?

ville-k and others added 15 commits April 11, 2017 00:54
* Avoid functions that might not be defined on SYCL device

* Simplify by using Eigen math functions
* Registers Scatter and ScatterNd Ops for SYCL

* Registers Stack op for SYCL

* Fixes No sycl buffer found error for debug ops

* Registers MatMul and Transpose Ops to SYCL device for double

* Extends analyzer_cli_test.py test to cover SYCL

* Fixes Transpose Op for double when on SYCL

* Bumps Eigen version to fix double precision issue on SYCL

* Extends SessionDebugTestBase to cover SYCL
 - Bumps Eigen Version
 - Refactors Ops registration
 - Introduces workaround for Const Op related to the difference between
   CUDA which uses pointers and OpenCL that uses buffers/accessors
 - Extends memory types to cover DEVICE_SYCL as well
 - Introduces  GetSYCLDevice() method that returns list of supported devices
   with GPU device having the highest priority ( doesn't include blacklisted devices )
 - ::internal::Transpose -> tensorflow::internal::Transpose in order to
   avoid compilation reported error
 - re-introduces fix for bugged string replacement causing a lot of compilation
   warnings -c -> --include
 - Adds sycl_runtime to bazels ARRAY_DEPS
 - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
* Fix testSimple and testConst in stack_op_test

* Create a specialisation of DoParallelConcatUpdate for SyclDevice and
register it

* Guard all code in TENSORFLOW_USE_SYCL

* Do not use sycl device for int32

* Registration of the Sycl version is now looking like the one for the GPU

* Remove added empty line
… to floating point representation on the device
@ville-k
Copy link
Contributor

ville-k commented Apr 11, 2017

@lukeiwanski @googlebot ok with me.

@Zakor94
Copy link

Zakor94 commented Apr 11, 2017

@lukeiwanski @googlebot yes ok with me too.

@yifeif yifeif added cla: yes and removed cla: no labels Apr 11, 2017
@tensorflow-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

@rmlarsen
Copy link
Member

@benoitsteiner could you please take another look?

@rmlarsen rmlarsen added stat:awaiting response Status - Awaiting response from author stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels May 16, 2017
  Fixes bazels missing undeclared inclusions that appeared after
  merge with TensorFlow upstream
@rmlarsen
Copy link
Member

@tensorflow-jenkins test this please

@lukeiwanski
Copy link
Author

lukeiwanski commented May 17, 2017

The MacOs CPU Tests seems to be failing due to a missing dependency in gif_archive..

ERROR: /private/var/tmp/_bazel_jenkins/13e370a18c169b19baeafefb05212b85/external/gif_archive/BUILD.bazel:8:1: undeclared inclusion(s) in rule '@gif_archive//:gif':
this rule is missing dependency declarations for the following files included by 'external/gif_archive/lib/openbsd-reallocarray.c':

Is that caused by our changeset or is it present in the upstream? I cannot see how we could cause it.

@rmlarsen
Copy link
Member

@lukeiwanski no, the build error on MacOS is unrelated to your changes.

configure Outdated
@@ -683,7 +683,7 @@ if [ "$TF_NEED_OPENCL" == "1" ]; then
while true; do
fromuser=""
if [ -z "$HOST_CXX_COMPILER" ]; then
default_cxx_host_compiler=$(which clang++-3.6 || true)
default_cxx_host_compiler=$(which g++-4.8 || true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to pin this to a specific gcc/g++ release?
Why not just "which g++"?
This can make configuration more difficult on different machines

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We locked to GCC-4.8 as it was default installation on Ubuntu 14.04.
This has been changed in 50197b9 and d777320
However, on ubuntu 16.04 user will have to add "-D_GLIBCXX_USE_CXX11_ABI=0" manually to the command line if there is custom compiler installed on the machine.

configure Outdated
@@ -707,7 +707,7 @@ done
while true; do
fromuser=""
if [ -z "$HOST_C_COMPILER" ]; then
default_c_host_compiler=$(which clang-3.6 || true)
default_c_host_compiler=$(which gcc-4.8 || true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto above

@@ -92,7 +92,7 @@ def testLinearlySeparableBinaryDataNoKernels(self):
# Since the data is linearly separable, the classifier should have small
# loss and perfect accuracy.
self.assertLess(metrics['loss'], 0.1)
self.assertEqual(metrics['accuracy'], 1.0)
self.assertAlmostEqual(metrics['accuracy'], 1.0, 7)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment above, I am worried about almost equal here. I know 7 digits is very very precise, but still we are making the test looser.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for that was the fact that sometimes OpenCL represent this value as 0.99999996
Reverting in 1a60452

@gunan
Copy link
Contributor

gunan commented May 26, 2017

Jenkins, test this please.

@benoitsteiner
Copy link
Contributor

Jenkins, test this please.

@benoitsteiner benoitsteiner merged commit fe589d9 into tensorflow:master May 31, 2017
@lukeiwanski lukeiwanski deleted the opencl_improvements branch August 3, 2017 10:40
lngart pushed a commit to lngart/GitSubSep that referenced this pull request Oct 5, 2021
* OpenCL Improvements

* Registers Scatter and ScatterNd Ops for SYCL

* Registers Stack op for SYCL

* Fixes No sycl buffer found error for debug ops

* Registers MatMul and Transpose Ops to SYCL device for double

* Extends analyzer_cli_test.py test to cover SYCL

* Fixes Transpose Op for double when on SYCL

* Bumps Eigen version to fix double precision issue on SYCL

* Extends SessionDebugTestBase to cover SYCL

* Register SYCL implementations for random ops

* Avoid functions that might not be defined on SYCL device (#51)

* Avoid functions that might not be defined on SYCL device

* Simplify by using Eigen math functions

* OpenCL improvements

 - Bumps Eigen Version
 - Refactors Ops registration
 - Introduces workaround for Const Op related to the difference between
   CUDA which uses pointers and OpenCL that uses buffers/accessors
 - Extends memory types to cover DEVICE_SYCL as well
 - Introduces  GetSYCLDevice() method that returns list of supported devices
   with GPU device having the highest priority ( doesn't include blacklisted devices )
 - ::internal::Transpose -> tensorflow::internal::Transpose in order to
   avoid compilation reported error
 - re-introduces fix for bugged string replacement causing a lot of compilation
   warnings -c -> --include
 - Adds sycl_runtime to bazels ARRAY_DEPS
 - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL

* [OpenCL] Fixes an issue caused by switch to aligned allocator for sycl buffer (#53)

* [Build] Use gcc/g++ as a host compiler to avoid tensorflow/tensorflow#8394 (#54)

* [OpenCL] Fixes Scatter Op

* Fix testSimple and testConst in stack_op_test (#3)

* Fix testSimple and testConst in stack_op_test

* Create a specialisation of DoParallelConcatUpdate for SyclDevice and
register it

* Guard all code in TENSORFLOW_USE_SYCL

* Do not use sycl device for int32

* Registration of the Sycl version is now looking like the one for the GPU

* Remove added empty line

* Register batch normalization kernels for OpenCL (#61)

* [OpenCL] RandomGamma has no GPU friendly implementation (#57)

* [OpenCL] Compatibility fixes for TensorFlow 1.1.0-rc1

* [OpenCL] Implements BatchMatmul Op for SYCL

* Lowercase the device name when GPU or SYCL returned

* [OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device

* [Eigen] Version bump

* GPU device name string manipulation is not needed anymore

* [OpenCL] Adds SYCL to device backwards compatibility

* [OpenCL] Extends core_rnn_test.py to run for SYCL device

* [OpenCL] Minor optimizations for build script

* [OpenCL] Enables skip folder list in build script

* [OpenCL] Fixes ApplyAdamOp for Sycl device

* [OpenCL] SYCL device improvements

* [OpenCL] Fixes debug_ops's SEGFAULT for SYCL device

* [Build] Adds hexagon to skipped folders list

* [OpenCL] Removes EnterLameDuckMode from SYCL device and allocator

* [OpenCL] Registers Unique Op for SYCL device

* [OpenCL][Temporary] Disables tests for SYCL target due to features not being implemented yet

  Tests affected:
    - tensorflow/contrib/memory_stats/python/kernel_tests/memory_stats_ops_test.py
    - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_test.py
    - tensorflow/python/kernel_tests/conv_ops_test.py
    - tensorflow/python/kernel_tests/depthwise_conv_op_test.py
    - tensorflow/python/kernel_tests/pooling_ops_3d_test.py
    - tensorflow/python/kernel_tests/pooling_ops_test.py
    - tensorflow/python/kernel_tests/scatter_nd_ops_test.py
    - tensorflow/python/training/adam_test.py
    - tensorflow/python/training/localhost_cluster_performance_test.py
    - tensorflow/python/training/training_ops_test.py

* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline

  Tests affected:
    - tensorflow/python/debug/cli/analyzer_cli_test.py
    - tensorflow/python/debug/lib/session_debug_testlib.py
    - tensorflow/python/debug/lib/stepper_test.py
    - tensorflow/python/kernel_tests/unstack_op_test.py
    - tensorflow/python/ops/image_ops_test.py

* [OpenCL] Take options.config.device_count() into consideration

* [OpenCL] Fixes compilation warning

* [OpenCL] device:SYCL:0 -> sycl:0

* [OpenCL] Removes unwanted flags in building script

Removes flags given to computecpp that enable SIMD instructions
Removes duplicate flags

* bool -> const bool

* [OpenCL] sycl in test_util.gpu_device_name() -> is_sycl_enabled()

* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline

  Test affected:
    - tensorflow/contrib/stateless/python/kernel_tests/stateless_random_ops_test.py

* Imports test_util from tensorflow.python.framework

* [OpenCL] Fixes formatting in Python code

* [OpenCL] Extends session_test.py to cover SYCL device

* [OpenCL] Cleans singleton class

* [OpenCL] Keeping CUDA happy

* [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline

  Test affected:
   - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_cell_test.py
   - tensorflow/contrib/seq2seq/python/kernel_tests/beam_search_ops_test.py

* Added support for building with SYCL on ARM.

* Acts on the review feedback from:
 - tensorflow/tensorflow#9117 (comment)
 - tensorflow/tensorflow#9117 (comment)

* [OpenCL] Fixes scatter_nd_op_test

* Fixes auto-merge mistake

* [OpenCL] struct SyclDevice -> class SyclDevice

* Revert "[OpenCL] struct SyclDevice -> class SyclDevice"

This reverts commit addd43348c374a5379f67bb1e5ad084715722fc2.

* [OpenCL] Reverting refactoring commit.

  As requested in the review tensorflow/tensorflow#9117 (comment)
  This change set will be re-introduced in smaller chunks.

* Revert "[OpenCL] device:SYCL:0 -> sycl:0"

This reverts commit cf16e60340b62d16c3764d71b716fe03d35f87a9.

* Revert "[OpenCL] Adds SYCL to device backwards compatibility"

This reverts commit b8401b5164199b7a169be1c1d8dea5001195c390.

* Acts on the feedback from tensorflow/tensorflow#9117 (comment)

* control_flow_ops_py_test.py expects device name to be lower cased

* Acts on the feedback from tensorflow/tensorflow#9117 (comment)

* Removes debug print

* Removes not needed partial specialisation

* [OpenCL] Registers ScatterNdFunctor for SYCL device

* [OpenCL] Make it compile

* [OpenCL] Follow gpu_device changes

* [OpenCL] Adds cxx_builtin_include_directory for python lib

  Fixes bazels missing undeclared inclusions that appeared after
  merge with TensorFlow upstream

* [OpenCL] Fixes Constant Op

* [OpenCL] gXX-4.8 -> gXX

* [OpenCL] Removes -D_GLIBCXX_USE_CXX11_ABI=0 as it breaks default compiler setup for Ubuntu 16.04

* Revert "[OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device"

This reverts commit 06c50c0a485f40c30a436f02c3fa7794e370c49d.

* [OpenCL] CPU allocator is a singleton we should not delete it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: no stat:awaiting tensorflower Status - Awaiting response from tensorflower
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet