Please bring back native Windows CUDA support! #59918

Open
GatGit12 opened this issue Mar 7, 2023 · 53 comments
Assignees
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower subtype:windows Windows Build/Installation Issues TF 2.11 Issues related to TF 2.11 type:build/install Build and install issues

Comments

@GatGit12

GatGit12 commented Mar 7, 2023


Issue Type

Others

Have you reproduced the bug with TF nightly?

Yes

Source

binary

Tensorflow Version

2.11

Custom Code

Yes

OS Platform and Distribution

Windows 10

Mobile device

No response

Python version

3.9

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

I am very disappointed and sad that native Windows CUDA support was simply dropped.

The WSL2 replacement is not sufficient for processes and systems that cannot use WSL2, for example because they are themselves Windows-native applications and a port would be too expensive. So we can only use version 2.10 (of both the Python and C APIs) and are stuck with it, which is a shame because it prevents us from benefiting from and participating in new developments. We also see a performance loss of about 5% in WSL2, which leads to higher power consumption and thus has a direct impact on our climate; in our already very compute-intensive business, that can make a big difference. In addition, the Windows DirectML plugin is not a sufficient alternative, since its performance does not yet reach CUDA's and optimizations like XLA are not supported. You also lose all your highly optimized and expensively developed TF CUDA custom ops.

It is also clear that native CUDA on Windows is much needed; in the following issues, other people are asking for exactly this feature:

All this amounts to the simple exclusion, and effective sidelining, of a large part of the TensorFlow community that uses CUDA natively on Windows.

Why was support dropped? You could at least keep support for native CUDA on Windows in custom builds.
I hope and ask that you bring back native Windows CUDA support and let people decide for themselves whether they want to use native CUDA or WSL2.

Thank you for the development of Tensorflow! My favourite DL framework! :)

Standalone code to reproduce the issue

import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Relevant log output

No response

@google-ml-butler google-ml-butler bot added the type:others issues not falling in bug, perfromance, support, build and install or feature label Mar 7, 2023
@synandi synandi added the TF 2.11 Issues related to TF 2.11 label Mar 7, 2023
@TheHellTower

Hello,

He is totally right, and Windows is still used by many people, even if not all developers using Tensorflow use it as their main platform. So I find it really shortsighted to drop native Windows GPU support when we know that many people still use Windows all the way from development to production deployment (even if it's not perfect, and I'm not necessarily talking about companies).
I may be wrong, since I don't have any actual stats for this, but it seems unfair to just drop it like this: it means a lot of work will be lost and will need a big rebase, triggering many changes that would not be worth it for a lot of projects. People who want to update their project would otherwise be forced to work with an outdated version of Tensorflow.

Note: I'm sorry if I'm wrong about anything; I'm not a professional and don't work in a company, but programming is a hobby that I love, and I would like to continue it in good conditions and stay up to date for as long as possible.

Regards.

@tilakrayal tilakrayal assigned tilakrayal and unassigned synandi Mar 13, 2023
@tilakrayal tilakrayal added subtype:windows Windows Build/Installation Issues type:build/install Build and install issues and removed type:others issues not falling in bug, perfromance, support, build and install or feature labels Mar 13, 2023
@WhiteByeBye

Absolutely precise, multitudes of researchers continue to employ TensorFlow on Windows; please reinstate support for native Windows GPU.

@HeloWong

The low performance and inconvenient I/O of WSL2 make it hard to use TensorFlow with large-scale datasets.

@sachinprasadhs
Contributor

This change was made along with the release changes for different platform builds, with the help of the official build collaborators.
With the move to WSL, it will be comparatively easy to maintain the framework for both Linux and Windows.
Here is the link to the announcement blog, which talks about the official build collaborators. Thanks!

@sachinprasadhs sachinprasadhs added the stat:awaiting response Status - Awaiting response from author label Apr 4, 2023
@GatGit12
Author

GatGit12 commented Apr 7, 2023

It seems that you just repeat the same answer as in all the other issues, without really caring about the question I asked. The problem is not that tensorflow doesn’t support gpus on windows anymore, but that there is no way to use cuda natively on windows. Maybe the framework is easier to build and maintain now, but before version 2.11 it worked fine…

With this change, tensorflow is almost impossible to use on native cuda windows (or be stuck at v2.10.1), for example if you need cuda custom ops or you can’t use wsl2 or you depend on the TF C-API…

Perhaps a native cuda windows custom tensorflow build could be enabled for the users who wish to retain the functionality of the previous versions before 2.11 while also benefiting from the latest updates. That would be a reasonable compromise and a minor step back to native cuda on windows.

I feel like you don’t understand me and you just want to ignore the problem like in the other issues I linked. That makes me very sad and disappointed, because tensorflow used to be my favorite framework. :(

@google-ml-butler google-ml-butler bot removed the stat:awaiting response Status - Awaiting response from author label Apr 7, 2023
@TheHellTower

It seems that you just repeat the same answer as in all the other issues, without really caring about the question I asked. The problem is not that tensorflow doesn’t support gpus on windows anymore, but that there is no way to use cuda natively on windows. Maybe the framework is easier to build and maintain now, but before version 2.11 it worked fine…

With this change, tensorflow is almost impossible to use on native cuda windows (or be stuck at v2.10.1), for example if you need cuda custom ops or you can’t use wsl2 or you depend on the TF C-API…

Perhaps a native cuda windows custom tensorflow build could be enabled for the users who wish to retain the functionality of the previous versions before 2.11 while also benefiting from the latest updates. That would be a reasonable compromise and a minor step back to native cuda on windows.

I feel like you don’t understand me and you just want to ignore the problem like in the other issues I linked. That makes me very sad and disappointed, because tensorflow used to be my favorite framework. :(

I used to love tensorflow too, but I feel like they kinda want to abandon users who work on Windows from development to production.

Well, okay, it made things easier to maintain, but at what cost? You are just cutting the legs out from under many projects. That's really unfair in my opinion. If you want quality, you do what it takes to achieve it, no? So why not just continue CUDA support for Windows when it was so useful to so many people? Not everything in life is easy, and this decision to remove native CUDA support really sucks. Don't forget there are even companies that work with Windows from development to production (even if not many), so you are cutting off some companies too, because not everyone wants to spend the money to pay developers to rewrite a whole project; it's neither cheap nor easy.

@sachinprasadhs sachinprasadhs added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Apr 11, 2023
@learning-to-play
Collaborator

In this TensorFlow blog please see section "Expanded GPU support on Windows". Also please see TensorFlow install page and this page for more info. Please feel free to sign up to the mailing list announce@tensorflow.org to be notified of the most recent updates.

@johnnkp
Contributor

johnnkp commented Apr 15, 2023

I have tried to build TensorFlow 2.12 with CUDA on windows 10. Errors as follows:

C:\Users\tensorflow\Downloads\tensorflow-2.12.0>bazel build --config=opt --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=157
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc:
Inherited 'common' options: --experimental_repo_remote_exec
INFO: Options provided by the client:
'build' options: --python_path=C:/Users/tensorflow/anaconda3/python.exe
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc:
'build' options: --define framework_shared_object=true --define tsl_protobuf_header_only=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --spawn_strategy=standalone -c opt --announce_rc --define=grpc_no_ares=true --noincompatible_remove_legacy_whole_archive --enable_platform_specific_config --define=with_xla_support=true --config=short_logs --config=v2 --define=no_aws_support=true --define=no_hdfs_support=true --experimental_cc_shared_library --experimental_link_static_libraries_once=false --incompatible_enforce_config_setting_visibility
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.tf_configure.bazelrc:
'build' options: --action_env PYTHON_BIN_PATH=C:/Users/tensorflow/anaconda3/python.exe --action_env PYTHON_LIB_PATH=C:/Users/tensorflow/anaconda3/lib/site-packages --python_path=C:/Users/tensorflow/anaconda3/python.exe --config=tensorrt --action_env CUDA_TOOLKIT_PATH=C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8 --action_env TF_CUDA_COMPUTE_CAPABILITIES=3.5,5.0,6.1,7.0 --config=cuda --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --define=override_eigen_strong_inline=true
INFO: Reading rc options for 'build' from c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc:
'build' options: --deleted_packages=tensorflow/compiler/mlir/tfrt,tensorflow/compiler/mlir/tfrt/benchmarks,tensorflow/compiler/mlir/tfrt/jit/python_binding,tensorflow/compiler/mlir/tfrt/jit/transforms,tensorflow/compiler/mlir/tfrt/python_tests,tensorflow/compiler/mlir/tfrt/tests,tensorflow/compiler/mlir/tfrt/tests/ir,tensorflow/compiler/mlir/tfrt/tests/analysis,tensorflow/compiler/mlir/tfrt/tests/jit,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_tfrt,tensorflow/compiler/mlir/tfrt/tests/lhlo_to_jitrt,tensorflow/compiler/mlir/tfrt/tests/tf_to_corert,tensorflow/compiler/mlir/tfrt/tests/tf_to_tfrt_data,tensorflow/compiler/mlir/tfrt/tests/saved_model,tensorflow/compiler/mlir/tfrt/transforms/lhlo_gpu_to_tfrt_gpu,tensorflow/core/runtime_fallback,tensorflow/core/runtime_fallback/conversion,tensorflow/core/runtime_fallback/kernel,tensorflow/core/runtime_fallback/opdefs,tensorflow/core/runtime_fallback/runtime,tensorflow/core/runtime_fallback/util,tensorflow/core/tfrt/eager,tensorflow/core/tfrt/eager/backends/cpu,tensorflow/core/tfrt/eager/backends/gpu,tensorflow/core/tfrt/eager/core_runtime,tensorflow/core/tfrt/eager/cpp_tests/core_runtime,tensorflow/core/tfrt/gpu,tensorflow/core/tfrt/run_handler_thread_pool,tensorflow/core/tfrt/runtime,tensorflow/core/tfrt/saved_model,tensorflow/core/tfrt/graph_executor,tensorflow/core/tfrt/saved_model/tests,tensorflow/core/tfrt/tpu,tensorflow/core/tfrt/utils
INFO: Found applicable config definition build:short_logs in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:v2 in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --define=tf_api_version=2 --action_env=TF2_BEHAVIOR=1
INFO: Found applicable config definition build:tensorrt in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --repo_env TF_NEED_TENSORRT=1
INFO: Found applicable config definition build:cuda in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --repo_env TF_NEED_CUDA=1 --crosstool_top=@local_config_cuda//crosstool:toolchain --@local_config_cuda//:enable_cuda
INFO: Found applicable config definition build:opt in file c:\users\tensorflow\downloads\tensorflow-2.12.0.tf_configure.bazelrc: --copt=/arch:AVX2 --host_copt=/arch:AVX2
INFO: Found applicable config definition build:windows in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --copt=/W0 --host_copt=/W0 --copt=/Zc:__cplusplus --host_copt=/Zc:__cplusplus --copt=/D_USE_MATH_DEFINES --host_copt=/D_USE_MATH_DEFINES --features=compiler_param_file --copt=/d2ReducedOptimizeHugeFunctions --host_copt=/d2ReducedOptimizeHugeFunctions --cxxopt=/std:c++17 --host_cxxopt=/std:c++17 --config=monolithic --copt=-DWIN32_LEAN_AND_MEAN --host_copt=-DWIN32_LEAN_AND_MEAN --copt=-DNOGDI --host_copt=-DNOGDI --copt=/Zc:preprocessor --host_copt=/Zc:preprocessor --linkopt=/DEBUG --host_linkopt=/DEBUG --linkopt=/OPT:REF --host_linkopt=/OPT:REF --linkopt=/OPT:ICF --host_linkopt=/OPT:ICF --verbose_failures --features=compiler_param_file --distinct_host_configuration=false
INFO: Found applicable config definition build:monolithic in file c:\users\tensorflow\downloads\tensorflow-2.12.0.bazelrc: --define framework_shared_object=false --define tsl_protobuf_header_only=false --experimental_link_static_libraries_once=false
ERROR: C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/compiler/xla/pjrt/BUILD:469:11: in cc_library rule //tensorflow/compiler/xla/pjrt:pjrt_future: target '@tf_runtime//:support' is not visible from target '//tensorflow/compiler/xla/pjrt:pjrt_future'. Check the visibility declaration of the former target if you think the dependency is legitimate
ERROR: C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/compiler/xla/pjrt/BUILD:469:11: Analysis of target '//tensorflow/compiler/xla/pjrt:pjrt_future' failed
INFO: Repository cudnn_frontend_archive instantiated at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:171:20: in _tf_repositories
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
INFO: Repository org_sqlite instantiated at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:310:20: in _tf_repositories
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
INFO: Repository mkl_dnn_v1 instantiated at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/WORKSPACE:15:14: in
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:967:21: in workspace
C:/users/tensorflow/downloads/tensorflow-2.12.0/tensorflow/workspace2.bzl:188:20: in _tf_repositories
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:136:21: in tf_http_archive
Repository rule _tf_http_archive defined at:
C:/users/tensorflow/downloads/tensorflow-2.12.0/third_party/repo.bzl:89:35: in
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted:
INFO: Elapsed time: 221.760s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (558 packages loaded, 28119 targets configured)
currently loading: @jsoncpp_git// ... (3 packages)
Fetching repository @local_config_git; starting 9s
Fetching https://storage.googleapis.com/mirror.tensorflow.org/github.com/oneapi-src/oneDNN/archive/refs/tags/v2.7.3.tar.gz; 1.2 MiB (1,220,608B)
Fetching https://storage.googleapis.com/mirror.tensorflow.org/www.sqlite.org/2022/sqlite-amalgamation-3400100.zip; 84.4 KiB (86,386B)

Is it solved in the latest 2.12 source code?

@AsakusaRinne
Contributor

I also think this is important. There are still many people using Windows as their development environment.

@johnnkp
Contributor

johnnkp commented Apr 16, 2023

After switching bazel back to 5.3.0, I could run the toolchain build until my disk had 7 GB left, at which point I stopped it. Don't expect the build to finish within a single bazel run. The server build script should re-run the bazel command when it fails (say, up to 10 times) to better handle errors like "404 not found" and "XXX is not defined".
Also, there should be an option to let people choose a prebuilt LLVM path, so we don't need to waste days of time and GBs of space building LLVM from source.
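A minimal sketch of the retry wrapper suggested above, in Python; the bazel target and flags are the ones from the build log in this thread, and the 10-attempt budget is the number suggested above. Bazel caches completed actions, so each retry resumes where the previous run stopped:

```python
import subprocess
import sys

def run_with_retries(cmd, max_attempts=10):
    """Re-run cmd until it exits 0 or the attempt budget is spent.

    Only transient failures (404s, flaky mirrors, 'XXX is not defined'
    after a partial fetch) need the retry; cached work is not repeated.
    Returns the last exit code (0 on success).
    """
    rc = 1
    for attempt in range(1, max_attempts + 1):
        rc = subprocess.run(cmd).returncode
        if rc == 0:
            return 0
        print(f"attempt {attempt} failed (exit {rc}); retrying...", file=sys.stderr)
    return rc

# Example invocation (the command from the log earlier in this thread):
# run_with_retries(["bazel", "build", "--config=opt",
#                   "--define=no_tensorflow_py_deps=true",
#                   "//tensorflow/tools/pip_package:build_pip_package"])
```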

Build environment:
Windows 10 1909
Visual Studio 2022 (build tools 14.35)
msys2-x86_64-20230318
CUDA 11.8
CUDNN 8.6.0
TensorRT 8.5.3

@mihaimaruseac
Collaborator

While native GPU support on Windows would bring back the 5% perf increase and would support a few more users, please note that the total number of TF users on Windows is very small compared to the other use cases, and there is almost no Windows expertise at Google to maintain this build. So, more than a year ago, it was decided to drop Windows GPU support, as the maintenance burden did not justify the costs, especially given that alternatives (i.e., WSL and using a Linux environment) exist.

I acknowledge that this might not be the desired answer (and that previous answers from the team did not treat the issue accordingly), but this is the most we can reliably do.

@AsakusaRinne
Contributor

@mihaimaruseac Is it still possible to build from source on Windows with cuda? I'm sad that I can't benefit from the most recent work on tensorflow, and I'd like to build from source myself. However, if that means I need to modify lots of cpp code, the price is too high.

@mihaimaruseac
Collaborator

It should still be possible. There's also the tensorflow/build repo and SIG Build, where the community can provide help with that; it's just that Google cannot effectively maintain the build itself.

@johnnkp
Contributor

johnnkp commented Apr 18, 2023

It should still be possible.

If you re-run the bazel command several times (2 times in my case) without cleaning away the build progress, the errors related to "XXX is not defined" should be resolved.

@mihaimaruseac My suggestion of a prebuilt LLVM path isn't related to CUDA. Is this improvement beyond what the tensorflow team can do?

@johnnkp
Contributor

johnnkp commented Aug 14, 2023

Building the newest tensorflow is much harder than I thought. If someone knows how to fix the following error, please create a pull request directly:

lld-link: error: undefined symbol: _mlir_ciface_XXX_GPU_DT_XXX_DT_XXX
referenced by gpu_XXX_op.lo.lib(gpu_op_XXX.obj):(public: virtual struct tensorflow::UnrankedMemRef __cdecl tensorflow::`anonymous namespace'::MlirXXXGPUDT_XXXDT_XXXOp::I
nvoke(class tensorflow::OpKernelContext *, class llvm::SmallVectorImpl struct tensorflow::UnrankedMemRef &))

lld-link: error: too many errors emitted, stopping now (use /errorlimit:0 to see all errors)

@melMass

melMass commented Aug 14, 2023

@johnnkp I will need to look into it. Any pointers I should know about? I don't see any mention of build instructions for Windows.

@johnnkp
Contributor

johnnkp commented Aug 14, 2023

I found out that bazel-out\x64_windows-opt\bin\tensorflow\python\pybind_symbol_target_libs_file.txt is missing the line bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/mlir_generated/base_op.lib. As a result, bazel-out\x64_windows-opt\bin\tensorflow\python\pywrap_tensorflow_filtered_def_file.def is missing the base_op symbols, which causes the above error.

I have since moved to an ARM Mac and have no ML project at hand, so you will need to try it yourself.
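A hedged sketch of that patch step, assuming the bazel output paths quoted above are where your build places these files (the helper name is mine, for illustration). After the edit, re-running the build should regenerate the filtered .def file with the base_op symbols included:

```python
from pathlib import Path

def append_missing_lib(list_file, lib_path):
    """Append lib_path to the symbol-target list file if it is not already listed.

    Returns True if the file was modified, False if the entry was present.
    """
    path = Path(list_file)
    lines = path.read_text().splitlines()
    if lib_path in lines:
        return False
    lines.append(lib_path)
    path.write_text("\n".join(lines) + "\n")
    return True

# The paths quoted in the comment above (adjust to your bazel output root):
# append_missing_lib(
#     "bazel-out/x64_windows-opt/bin/tensorflow/python/pybind_symbol_target_libs_file.txt",
#     "bazel-out/x64_windows-opt/bin/tensorflow/core/kernels/mlir_generated/base_op.lib",
# )
```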

@cugels

cugels commented Aug 21, 2023

I find it a nightmare scenario attempting the absurd balancing act of getting CUDA, WSL2, and Tensorflow working together. And once you do get it going, you're trapped in such a narrow corner of unstable systems that it's utter torture.

This balancing act is a nonstop barrage of dysfunctional, outdated support and red-herring help articles. The only ones that work back you into such a narrow corner that it's not worth the effort. I'm shocked that the support articles from the big companies all seem to be outdated. I suspect they are struggling because Google's solution for supporting Windows has created a mess they didn't foresee, leaving a surreal Windows ecosystem of dysfunction. I've been at this for two days solid, and I'm giving up, as nothing is worth this pain. I'll use painfully slow CPUs when I need to use tensorflow here and there, but this has pushed me to PyTorch.

It's normal for new technology to emerge in chaos like this. But it's absurd when an established product is intentionally sent to the land of surreal dysfunction.

I'm now giving up.

@User-Hrsat

so let us uninstall tensorflow and use pytorch 🙂

@TheHellTower

so let us uninstall tensorflow and use pytorch 🙂

Do you think they care? To them it's just one less person complaining about Windows support; they will just be happy 🤣
Why do you react to your own messages, too? 👀

@TheHellTower

so let us uninstall tensorflow and use pytorch 🙂

Do you think they care? To them it's just one less person complaining about Windows support; they will just be happy 🤣
Why do you react to your own messages, too? 👀

so we should do this

I'm not sure I understand, tbh.

@Heryk13

Heryk13 commented Aug 31, 2023

So sad that it doesn't support Windows; I only just found out.
I was trying to use it, and while searching I landed here.

@petrovfedor

There are also industrial systems that use Windows Embedded for inference on production lines. Right now, one of the solutions for them is to migrate their plugins to ONNX.

@GatGit12
Author

Hi,

Yes, it is very upsetting that native CUDA Windows support has been discontinued (I think this issue shows that clearly), which also makes cool projects like the one from @melMass and others less usable, or even unusable, for Windows users.
In addition, some of the alternatives (DirectML) are no longer developed, which makes using TF on Windows even more difficult; see: microsoft/tensorflow-directml-plugin#369
Discussions on reddit and elsewhere also seem to mostly advocate moving to pytorch (e.g. https://www.reddit.com/r/MachineLearning/comments/16hqzxy/d_tensorflow_dropped_support_for_windows/)

For our use case, running on Windows and relying on native CUDA support, we have decided to stick with TF 2.10.1 for now. But I'm also looking forward to testing the new Keras 3.0 with different backends; maybe it could make it easier for us to switch to Pytorch or Jax if native CUDA Windows support doesn't come back and we need to upgrade our approach.

So in summary, even though there is demand for native CUDA on Windows (and people fixing bugs for TF on Windows, like @johnnkp and others), I unfortunately have little hope that it will be re-enabled (see https://discuss.tensorflow.org/t/sig-build-october-meeting-october-3-2pm/19918), even though that would make me very happy.

Whatever the future holds for TF on Windows, I thank the TF team for their work.

@mihaimaruseac
Collaborator

I am actually +1 on trying Keras 3.0 and picking the backend that has the most support for your operating system, while you are experimenting. I might come back with future recommendations here to also cover security / supply chain needs.

@toastershock

I started with deep learning by reading the book "Deep Learning with R" (second edition), and all it says about GPU support is: download and install the necessary NVIDIA driver.
As this didn't work, I looked into all the tutorials available from Tensorflow and Posit, but nothing worked. As a scientist rather than a developer, it took me 2 days to get RStudio running on WSL2. Even after that, nothing worked: the NVIDIA GPU driver had unmet dependencies, and neither python nor R could detect my GPU. After a week of intense research and learning about Linux (which I had never used before), I am still not able to use tensorflow with GPU support, even on WSL2.
I also asked for help here: https://community.rstudio.com/t/using-tensorflow-with-gpu-on-windows/176506

But the standard answer, in this issue as well, is "use WSL". Well, it's not working, neither for me nor, apparently, for many other people.
The truth is that machine learning models require a lot of computing power; excluding a large community of non-developer Windows users from GPU support is not a nice move.
I really hope tensorflow will bring back native GPU support on Windows.
Many thanks

@ourplan9000

I really hope, tensorflow will bring back native GPU support on Windows.

@sh-shahrokhi

Please bring this back, at least in SIG Build form, or add instructions to build it (community support alone would be totally fine).

@mihaimaruseac
Collaborator

For building yourself, you can try the instructions at https://www.tensorflow.org/install/source_windows, but you will be on your own, as the build is not supported and no one can help fix build errors.

You can try submitting PRs to fix build errors, but again, those would be hard to validate and might become stale.

Also, it's worth noting that the instructions linked above assume compiling with MSVC+NVCC, but now (most if not all) builds just use clang everywhere.

@TheHellTower

For building yourself, you can try the instructions at https://www.tensorflow.org/install/source_windows, but you will be on your own, as the build is not supported and no one can help fix build errors.

You can try submitting PRs to fix build errors, but again, those would be hard to validate and might become stale.

Also, it's worth noting that the instructions linked above assume compiling with MSVC+NVCC, but now (most if not all) builds just use clang everywhere.

The Last version of the GPU source doesn't contain the thing from the problem so.

@JackTrapper

JackTrapper commented Jan 14, 2024

You're missing the larger point: bringing machine learning to the public.

Prosumers and consumers will (and do) want machine learning on their desktop.

  • so they can perform inference offline (without the privacy, cost, and refusals, of a 3rd party)
  • so they can perform stable diffusion offline (without the privacy, cost, and refusals, of a 3rd party)
  • so they can have image recognition
  • so their desktop application can have AI assistants

Like how the microprocessor opened up the ability for people to have computers (previously only available to large corporations) in the home.

The next step is to have machine learning in the home.

And Windows is the home machine.


Put it another way:

  1. Tensorflow doesn't support Windows
  2. There's so few Windows users
  3. Goto 1

@mihaimaruseac
Collaborator

mihaimaruseac commented Jan 14, 2024

For this case, I'd use JAX (via Keras, you still write the same code, just change backend)
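For reference, a minimal sketch of what "just change backend" means in Keras 3, assuming keras>=3 and the chosen backend package (e.g. jax) are installed. The environment variable must be set before the first keras import:

```python
import os

# Keras 3 reads KERAS_BACKEND before the first `import keras`;
# valid values include "jax", "tensorflow", and "torch".
# The model-building code itself stays the same regardless of backend.
os.environ["KERAS_BACKEND"] = "jax"

try:
    import keras
    print("active backend:", keras.backend.backend())
except ImportError:
    print("keras 3 (plus a backend such as jax) is not installed")
```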

@JackTrapper

For this case, I'd use JAX (via Keras, you still write the same code, just change backend)

In that case I'd just use ONNX Runtime (https://onnxruntime.ai/), since that's what Microsoft said on the github page announcing that they too are abandoning Windows developers:

All latest DirectML features and performance improvements are going into onnxruntime for inference scenarios.

Which is to say: I'm not.

Because now there's a 4th thing you have to master before you can even do the first thing:

# print the version of python
import sys
print(f"Python version: {sys.version}") # e.g. 3.10.11 (tags/v3.10.11:2cd268a, Oct  4 2021, 09:09:32) [MSC v.1929 64 bit (AMD64)]

@WhiteByeBye

For this case, I'd use JAX (via Keras, you still write the same code, just change backend)

However, the funny thing is that JAX also does not support native Windows GPU (even the Windows builds are maintained by the community), so the logic holds:

  1. Tensorflow (JAX) doesn't support Windows
  2. There's so few Windows users
  3. Goto 1

😂😂😂

@mihaimaruseac
Collaborator

I failed, sorry :)

@Frost-Lord

Native Windows CUDA was cool but it's still not much of a hassle to just use WSL2 anyway

@sh-shahrokhi

sh-shahrokhi commented Feb 13, 2024 via email

@JackTrapper

Native Windows CUDA was cool but it's still not much of a hassle to just use WSL2 anyway

The problem with that is that it requires the user's Windows PC to have WSL2 installed.

It is a very poor developer indeed who creates a Windows application that cannot run under Windows.

@Yatagarasu50469

Yatagarasu50469 commented Mar 3, 2024

I preface this post with my deepest thanks to the innumerable developers who have made this library possible and useful.

Echoing similar sentiment to the many others who have voiced their disappointment with the decision to drop native Windows CUDA support. The lack of clear communication to the community regarding the underlying reasoning behind the move and the lack of advanced warning was particularly disappointing, disheartening, frustrating, and hugely problematic to the "few" still, not only using, but reliant on TensorFlow, with acceleration, on Windows. The September 2022 announcement post for v2.10 promised that the official build collaborations would not "disrupt end user experiences" and "For the majority of use cases, there will be no changes to the behavior of pip install..." Surprising as it may be to those involved in the responsible management, not being able to use GPU acceleration for a deep learning framework has proven to be an awfully significant change and majorly disruptive to end-users and developers. The project may as well have said that they are dropping Windows support, rather than "leading people on" to think they can effectively develop ML on the platform using just CPU. On a number of levels, particularly after having made the investment(s) necessary in transitioning to v2 from v1 (instead of electing to move to PyTorch then), this has felt like a betrayal of trust. Frankly, it's just such a shame to see this from a company that was built on the premise of "Don't be Evil."

Neither WSL2 nor the tensorflow-directml-plugin (which incidentally only ever seems to have been partially functional, on versions <=2.10, with development apparently permanently on pause) is a sufficient workaround or solution. Besides the decreased performance and utter disregard for ease of use, there are applications, libraries, and code that simply cannot be run in, or interface easily with, WSL2. For example (I realize this is somewhat niche), there are vendor-proprietary Mass Spectrometry Imaging .dll libraries, which interface with COM objects, that remain compatible and functional only on native Windows. Effectively, if you happen to be doing work with Agilent Mass Spectrometry platforms, there isn't a simple and effective way to combine these with an accelerated TensorFlow framework for active scanning/inferencing/processing.

Expending several utterly fruitless weeks seeking ways to avoid the choice between dropping support for Agilent .d files from my application or learning/rebuilding in yet another ML framework, has only made this more painful. It's been over a year. The lack of viable alternatives to interfacing through WSL2, issues with augmentation performance through Keras (tied to versions <=2.10 releases), and the longstanding (and utterly baffling) inability to clear GPU memory have at last forced a migration to PyTorch (delightfully/incidentally discovered to train faster with an identical architecture).

For the sake of those who still are holding out hope, please reinstate the requested support or at least provide clear/updated documentation on alternative methods to leverage Windows-only native functions/capabilities in combination with TensorFlow through WSL2. Thank you.

@GatGit12
Author

Hi all,

So this issue is now a bit over one year old, with lots of comments and discussion. In the meantime, users continue to need and ask for native GPU support under Windows (see #62938, #62501, #62613).

Additionally (since TF>=2.11), it's no longer feasible to develop native Windows applications that use TensorFlow, since without GPU acceleration it's mostly pointless; see the comments of @JackTrapper and @Yatagarasu50469 above.

Moreover, it seems that even the most recent TF for native Windows CPU has its problems; see #63860 and #64396. The build process also seems to rely on clang now:

* Clang is now the default compiler to build TensorFlow CPU wheels on the

And even though building with MSVC is currently still supported, the history of this issue and of native GPU support has shown that this may not be the case for much longer. Also, the Windows CI for clang is not public; see:

- Linux, MacOS, and Windows machines (these pool definitions are internal)

With this information, it is therefore reasonable to assume that native Windows support might be completely discontinued in the near future.

Given this, I can start to understand people commenting and meming about TF with "switch to pytorch", since for native Windows users this seems to be one of the only future options: either switch gradually via the new Keras 3.0 with a pytorch backend and then move over completely, or switch directly and painfully rewrite using pytorch.

Overall, I am very disappointed with how TF has evolved and how native Windows users have been ignored. Our team is currently still holding on to TF 2.10.1 even though we would like to use newer features and benefit from performance improvements. We still have the (small) hope that native Windows support might be re-enabled because we have developed a native Windows application and it is (currently) too expensive to rewrite it using pytorch (only time will tell if we have to).

Whatever happens next with TF and native Windows, I thank the TF team for their work, as we still enjoy using TF v2.10.1.
