New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenCL] Implementation improvements #9117
[OpenCL] Implementation improvements #9117
Conversation
|
Can one of the admins verify this patch? |
|
So there's good news and bad news. Note to project maintainer: This is a terminal state, meaning the |
|
Sometimes googlebot gets confused, could you |
* Avoid functions that might not be defined on SYCL device * Simplify by using Eigen math functions
* Registers Scatter and ScatterNd Ops for SYCL * Registers Stack op for SYCL * Fixes No sycl buffer found error for debug ops * Registers MatMul and Transpose Ops to SYCL device for double * Extends analyzer_cli_test.py test to cover SYCL * Fixes Transpose Op for double when on SYCL * Bumps Eigen version to fix double precision issue on SYCL * Extends SessionDebugTestBase to cover SYCL
- Bumps Eigen Version - Refactors Ops registration - Introduces workaround for Const Op related to the difference between CUDA which uses pointers and OpenCL that uses buffers/accessors - Extends memory types to cover DEVICE_SYCL as well - Introduces GetSYCLDevice() method that returns list of supported devices with GPU device having the highest priority ( doesn't include blacklisted devices ) - ::internal::Transpose -> tensorflow::internal::Transpose in order to avoid compilation reported error - re-introduces fix for bugged string replacement causing a lot of compilation warnings -c -> --include - Adds sycl_runtime to bazels ARRAY_DEPS - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL
* Fix testSimple and testConst in stack_op_test * Create a specialisation of DoParallelConcatUpdate for SyclDevice and register it * Guard all code in TENSORFLOW_USE_SYCL * Do not use sycl device for int32 * Registration of the Sycl version is now looking like the one for the GPU * Remove added empty line
… to floating point representation on the device
d00cb70
to
e093efa
Compare
|
@lukeiwanski @googlebot ok with me. |
|
@lukeiwanski @googlebot yes ok with me too. |
53d50fe
to
7f24669
Compare
|
Can one of the admins verify this patch? |
|
@benoitsteiner could you please take another look? |
Fixes bazels missing undeclared inclusions that appeared after merge with TensorFlow upstream
|
@tensorflow-jenkins test this please |
|
The MacOs CPU Tests seems to be failing due to a missing dependency in gif_archive.. Is that caused by our changeset or is it present in the upstream? I cannot see how we could cause it. |
|
@lukeiwanski no, the build error on MacOS is unrelated to your changes. |
configure
Outdated
| @@ -683,7 +683,7 @@ if [ "$TF_NEED_OPENCL" == "1" ]; then | |||
| while true; do | |||
| fromuser="" | |||
| if [ -z "$HOST_CXX_COMPILER" ]; then | |||
| default_cxx_host_compiler=$(which clang++-3.6 || true) | |||
| default_cxx_host_compiler=$(which g++-4.8 || true) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to pin this to a specific gcc/g++ release?
Why not just "which g++"?
This can make configuration more difficult on different machines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
configure
Outdated
| @@ -707,7 +707,7 @@ done | |||
| while true; do | |||
| fromuser="" | |||
| if [ -z "$HOST_C_COMPILER" ]; then | |||
| default_c_host_compiler=$(which clang-3.6 || true) | |||
| default_c_host_compiler=$(which gcc-4.8 || true) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto above
| @@ -92,7 +92,7 @@ def testLinearlySeparableBinaryDataNoKernels(self): | |||
| # Since the data is linearly separable, the classifier should have small | |||
| # loss and perfect accuracy. | |||
| self.assertLess(metrics['loss'], 0.1) | |||
| self.assertEqual(metrics['accuracy'], 1.0) | |||
| self.assertAlmostEqual(metrics['accuracy'], 1.0, 7) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment above, I am worried about almost equal here. I know 7 digits is very very precise, but still we are making the test looser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for that was the fact that sometimes OpenCL represent this value as 0.99999996
Reverting in 1a60452
|
Jenkins, test this please. |
…iler setup for Ubuntu 16.04
…qual due to floating point representation on the device" This reverts commit 06c50c0.
|
Jenkins, test this please. |
* OpenCL Improvements * Registers Scatter and ScatterNd Ops for SYCL * Registers Stack op for SYCL * Fixes No sycl buffer found error for debug ops * Registers MatMul and Transpose Ops to SYCL device for double * Extends analyzer_cli_test.py test to cover SYCL * Fixes Transpose Op for double when on SYCL * Bumps Eigen version to fix double precision issue on SYCL * Extends SessionDebugTestBase to cover SYCL * Register SYCL implementations for random ops * Avoid functions that might not be defined on SYCL device (#51) * Avoid functions that might not be defined on SYCL device * Simplify by using Eigen math functions * OpenCL improvements - Bumps Eigen Version - Refactors Ops registration - Introduces workaround for Const Op related to the difference between CUDA which uses pointers and OpenCL that uses buffers/accessors - Extends memory types to cover DEVICE_SYCL as well - Introduces GetSYCLDevice() method that returns list of supported devices with GPU device having the highest priority ( doesn't include blacklisted devices ) - ::internal::Transpose -> tensorflow::internal::Transpose in order to avoid compilation reported error - re-introduces fix for bugged string replacement causing a lot of compilation warnings -c -> --include - Adds sycl_runtime to bazels ARRAY_DEPS - Replicates TF_CALL_GPU_PROXY_TYPES for SYCL * [OpenCL] Fixes an issue caused by switch to aligned allocator for sycl buffer (#53) * [Build] Use gcc/g++ as a host compiler to avoid tensorflow/tensorflow#8394 (#54) * [OpenCL] Fixes Scatter Op * Fix testSimple and testConst in stack_op_test (#3) * Fix testSimple and testConst in stack_op_test * Create a specialisation of DoParallelConcatUpdate for SyclDevice and register it * Guard all code in TENSORFLOW_USE_SYCL * Do not use sycl device for int32 * Registration of the Sycl version is now looking like the one for the GPU * Remove added empty line * Register batch normalization kernels for OpenCL (#61) * [OpenCL] RandomGamma has no GPU friendly implementation (#57) * [OpenCL] Compatibility fixes for TensorFlow 1.1.0-rc1 * [OpenCL] Implements BatchMatmul Op for SYCL * Lowercase the device name when GPU or SYCL returned * [OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device * [Eigen] Version bump * GPU device name string manipulation is not needed anymore * [OpenCL] Adds SYCL to device backwards compatibility * [OpenCL] Extends core_rnn_test.py to run for SYCL device * [OpenCL] Minor optimizations for build script * [OpenCL] Enables skip folder list in build script * [OpenCL] Fixes ApplyAdamOp for Sycl device * [OpenCL] SYCL device improvements * [OpenCL] Fixes debug_ops's SEGFAULT for SYCL device * [Build] Adds hexagon to skipped folders list * [OpenCL] Removes EnterLameDuckMode from SYCL device and allocator * [OpenCL] Registers Unique Op for SYCL device * [OpenCL][Temporary] Disables tests for SYCL target due to features not being implemented yet Tests affected: - tensorflow/contrib/memory_stats/python/kernel_tests/memory_stats_ops_test.py - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_test.py - tensorflow/python/kernel_tests/conv_ops_test.py - tensorflow/python/kernel_tests/depthwise_conv_op_test.py - tensorflow/python/kernel_tests/pooling_ops_3d_test.py - tensorflow/python/kernel_tests/pooling_ops_test.py - tensorflow/python/kernel_tests/scatter_nd_ops_test.py - tensorflow/python/training/adam_test.py - tensorflow/python/training/localhost_cluster_performance_test.py - tensorflow/python/training/training_ops_test.py * [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline Tests affected: - tensorflow/python/debug/cli/analyzer_cli_test.py - tensorflow/python/debug/lib/session_debug_testlib.py - tensorflow/python/debug/lib/stepper_test.py - tensorflow/python/kernel_tests/unstack_op_test.py - tensorflow/python/ops/image_ops_test.py * [OpenCL] Take options.config.device_count() into consideration * [OpenCL] Fixes compilation warning * [OpenCL] device:SYCL:0 -> sycl:0 * [OpenCL] Removes unwanted flags in building script Removes flags given to computecpp that enable SIMD instructions Removes duplicate flags * bool -> const bool * [OpenCL] sycl in test_util.gpu_device_name() -> is_sycl_enabled() * [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline Test affected: - tensorflow/contrib/stateless/python/kernel_tests/stateless_random_ops_test.py * Imports test_util from tensorflow.python.framework * [OpenCL] Fixes formatting in Python code * [OpenCL] Extends session_test.py to cover SYCL device * [OpenCL] Cleans singleton class * [OpenCL] Keeping CUDA happy * [OpenCL][Temporary] Disables failing tests for SYCL in order to establish regression baseline Test affected: - tensorflow/contrib/rnn/python/kernel_tests/core_rnn_cell_test.py - tensorflow/contrib/seq2seq/python/kernel_tests/beam_search_ops_test.py * Added support for building with SYCL on ARM. * Acts on the review feedback from: - tensorflow/tensorflow#9117 (comment) - tensorflow/tensorflow#9117 (comment) * [OpenCL] Fixes scatter_nd_op_test * Fixes auto-merge mistake * [OpenCL] struct SyclDevice -> class SyclDevice * Revert "[OpenCL] struct SyclDevice -> class SyclDevice" This reverts commit addd43348c374a5379f67bb1e5ad084715722fc2. * [OpenCL] Reverting refactoring commit. As requested in the review tensorflow/tensorflow#9117 (comment) This change set will be re-introduced in smaller chunks. * Revert "[OpenCL] device:SYCL:0 -> sycl:0" This reverts commit cf16e60340b62d16c3764d71b716fe03d35f87a9. * Revert "[OpenCL] Adds SYCL to device backwards compatibility" This reverts commit b8401b5164199b7a169be1c1d8dea5001195c390. * Acts on the feedback from tensorflow/tensorflow#9117 (comment) * control_flow_ops_py_test.py expects device name to be lower cased * Acts on the feedback from tensorflow/tensorflow#9117 (comment) * Removes debug print * Removes not needed partial specialisation * [OpenCL] Registers ScatterNdFunctor for SYCL device * [OpenCL] Make it compile * [OpenCL] Follow gpu_device changes * [OpenCL] Adds cxx_builtin_include_directory for python lib Fixes bazels missing undeclared inclusions that appeared after merge with TensorFlow upstream * [OpenCL] Fixes Constant Op * [OpenCL] gXX-4.8 -> gXX * [OpenCL] Removes -D_GLIBCXX_USE_CXX11_ABI=0 as it breaks default compiler setup for Ubuntu 16.04 * Revert "[OpenCL] kernel_estimator_test.py assertEqual-> assertAlmostEqual due to floating point representation on the device" This reverts commit 06c50c0a485f40c30a436f02c3fa7794e370c49d. * [OpenCL] CPU allocator is a singleton we should not delete it
OpenCL implementation improvements (#22)
CUDA which uses pointers and OpenCL that uses buffers/accessors
with GPU device having the highest priority ( doesn't include blacklisted devices )
avoid compilation reported error
representation on the device