Support EIGEN_USE_MKL_ALL macro for building tensorflow #34924
Comments
@refraction-ray Could you give the full build commands which you used?
@Leslie-Fang, I believe the build workflow is similar to the official doc using the devel docker. The only tweak I have done is changing the flag in tensorflow.bzl directly into -DEIGEN_USE_MKL_ALL.
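Concretely, the kind of change meant here might look like this rough Starlark sketch (tensorflow.bzl is written in Starlark, a Python dialect; this is not the verbatim file, and the surrounding options are placeholders that vary by TF version):

```python
# Hypothetical sketch of the tweak in tensorflow.bzl -- not the verbatim file.
# --config=mkl normally injects -DEIGEN_USE_VML via tf_copts(); the experiment
# swaps it for -DEIGEN_USE_MKL_ALL so Eigen dispatches its dense linear
# algebra (BLAS/LAPACK calls) to MKL.
load("//third_party/mkl:build_defs.bzl", "if_mkl")  # load path may differ by version

def tf_copts():
    common_flags = ["-DEIGEN_AVOID_STL_ARRAY"]  # other options elided
    return common_flags + if_mkl([
        "-DINTEL_MKL=1",
        "-DEIGEN_USE_MKL_ALL",  # was: "-DEIGEN_USE_VML"
    ])
```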
@refraction-ray If that's the case then, as in #12219, although I don't know why, MKL-DNN support and Eigen-with-MKL support seemingly can't be enabled simultaneously.
@Leslie-Fang, as I have mentioned, QR and SVD decompositions are slow with the Eigen implementation compared to the multithreaded MKL version.
@refraction-ray Do you know how to measure the performance of the QR and SVD ops against the multithreaded MKL version?
I see some comments in #7128 about how to measure the performance.
@Leslie-Fang, great! Be sure to use an MKL-linked NumPy for the benchmark.
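A minimal timing sketch along those lines (not from this thread; the matrix size and dtype are arbitrary) would expose the gap:

```python
# Minimal benchmark sketch: time SVD of the same matrix in NumPy (MKL-backed
# when NumPy links MKL -- check with np.__config__.show()) and in TensorFlow
# (Eigen-backed in stock builds).
import time

import numpy as np
import tensorflow as tf

a = np.random.randn(2000, 2000).astype(np.float32)

t0 = time.time()
np.linalg.svd(a)               # multithreaded when NumPy is linked to MKL
t1 = time.time()
tf.linalg.svd(tf.constant(a))  # the op this thread reports as single-threaded
t2 = time.time()

print(f"numpy svd: {t1 - t0:.2f}s   tensorflow svd: {t2 - t1:.2f}s")
```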
@refraction-ray I have checked the implementations of QR and SVD in TensorFlow (tensorflow/tensorflow/core/kernels/qr_op_impl.h, lines 101 to 120 at a74e202). Both of them invoke Eigen's in-place decomposition (https://eigen.tuxfamily.org/dox/classEigen_1_1HouseholderQR.html). I suspect there is no parallel optimization we can do in the TensorFlow op.
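One indirect way to check that from Python (a sketch using the TF 2.x threading API; the setting is fixed at startup, so compare runs with 1 vs. 8 threads in separate processes) is to see whether intra-op parallelism changes the QR timing at all:

```python
# Sketch: if tf.linalg.qr sits on a single-threaded Eigen HouseholderQR, its
# wall time should barely change with the intra-op thread count.
import time

import numpy as np
import tensorflow as tf

# Must run before TensorFlow executes any op; rerun with 1 thread and compare.
tf.config.threading.set_intra_op_parallelism_threads(8)

a = tf.constant(np.random.randn(3000, 3000), dtype=tf.float32)
t0 = time.time()
tf.linalg.qr(a)
print(f"qr with 8 intra-op threads: {time.time() - t0:.2f}s")
```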
@Leslie-Fang, this is why I believe the better solution here is to enable MKL linkage with Eigen when building TensorFlow. If this works, all ops can enjoy the multithreaded MKL implementation, with no need to hack these ops one by one.
Hi everyone, could you please tell us whether there has been any progress on the subject? I am not an expert, but enabling MKL support for Eigen operations sounds like a reasonable solution and would be highly appreciated. Otherwise, it's a bit frustrating to see how TensorFlow humbly uses only one core, while NumPy enjoys running SVD at >1000% CPU load :-)
@refraction-ray |
Please make sure that this is a feature request. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests, and build/installation issues on GitHub.
System information
Describe the feature and the current behavior/state.
If my understanding is correct, the --config=mkl compile flag for Bazel only enables MKL-DNN support, which replaces several basic ops like matrix multiplication with an MKL implementation using JIT. However, it seems to me that this flag does not enable MKL linkage for Eigen. Therefore, all linear-algebra ops beyond the few covered by MKL-DNN are still executed by the plain single-threaded Eigen implementation, which is too slow to use (there can be an O(10) or even O(100) speed difference for large matrices in the eigh, svd, and qr ops, as previously noted in #7128, #13222, etc.).

Currently, in tensorflow.bzl, the DEIGEN_USE_VML flag is set when compiling with --config=mkl. As explained in #30592, this indicates that Eigen is not linked against MKL at all. But simply replacing this flag with DEIGEN_USE_MKL_ALL makes the build fail with an error complaining that <mkl.h> is not found. Also, as noted in #12219, MKL-optimized TensorFlow does not support EIGEN_USE_MKL_ALL. I know little about the Bazel setup, so I don't know whether turning on such support is involved or as simple as a few small tweaks.
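The missing-header error suggests that any target compiling Eigen headers with this macro would also need the MKL headers on its include path. Purely as a speculative sketch (the target, file, and label names below are assumptions, not taken from the TF source tree), the per-target fix in a BUILD file might look like:

```python
# Speculative BUILD-file sketch (Starlark): give an Eigen-using kernel the MKL
# headers it needs once EIGEN_USE_MKL_ALL is defined. Names are hypothetical.
load("//third_party/mkl:build_defs.bzl", "if_mkl")

cc_library(
    name = "qr_op_kernel",   # hypothetical target name
    srcs = ["qr_op.cc"],     # hypothetical source file
    copts = if_mkl(["-DEIGEN_USE_MKL_ALL"]),
    deps = if_mkl(["//third_party/mkl:intel_binary_blob"]),  # assumed MKL headers/libs target
)
```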
In sum, tuning the TF build system to really enable MKL behind TF is of great importance, as it is vital for the speed of a large range of matrix ops, and this should be the expected behavior of the --config=mkl flag after all. Currently, the so-called "Intel-optimized" or "MKL-enabled" TensorFlow is somewhat confusing.

Will this change the current API? How?
Not for the user-level API.
Who will benefit from this feature?
Anyone using TF in a workflow that includes matrix operations like EIG, SVD, QR, etc. on CPU. (One can argue that there is no problem for the GPU implementation, but the cuSOLVER implementations of SVD and QR can still be much slower than the MKL CPU implementations, so fast CPU implementations are critical for these matrix-decomposition ops.)
Any Other info.