What is preventing TF to use GPU when used in native windows? #69750

Open
eabase opened this issue Jun 14, 2024 · 16 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower subtype:windows Windows Build/Installation Issues TF 2.16 type:build/install Build and install issues

Comments

@eabase

eabase commented Jun 14, 2024

We keep being told that TF no longer "supports" GPU use on native Windows in versions >2.10. However, the reason for this is nowhere to be found.

Can someone from the TF community please explain what the problem is with TF supporting GPUs in a native Windows environment?

Finally, since the wording is "supported" (and it was supported before), how can we go about implementing this support on our own?

What are the Python package requirements, and the native Windows C/C++/C# compiler requirements?
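For readers who want to check quickly whether their setup falls on the wrong side of the ">2.10" cutoff mentioned above, here is a minimal sketch. It uses only the standard library; the function names and the `LAST_NATIVE_WINDOWS_GPU` constant are hypothetical helpers written for this illustration, not part of TensorFlow, and the cutoff itself is taken from the question above.

```python
import platform
from typing import Optional, Tuple

# Assumption taken from this thread: releases after 2.10 ship without
# GPU support on native Windows.
LAST_NATIVE_WINDOWS_GPU = (2, 10)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '2.16.1'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def blocked_by_windows_cutoff(tf_version: str, system: Optional[str] = None) -> bool:
    """Return True if this TF version on this OS is past the native-Windows GPU cutoff."""
    system = platform.system() if system is None else system
    if system != "Windows":
        return False  # the cutoff discussed here only concerns native Windows
    return parse_major_minor(tf_version) > LAST_NATIVE_WINDOWS_GPU


print(blocked_by_windows_cutoff("2.16.1", "Windows"))
print(blocked_by_windows_cutoff("2.10.0", "Windows"))
```

With TensorFlow installed, `tf.config.list_physical_devices("GPU")` reports which GPUs the runtime can actually see; an empty list means no GPU is in use.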

@sgkouzias

sgkouzias commented Jun 14, 2024

@eabase according to Simon_Au-Yong in a related discussion on the TensorFlow Forum:

Native Windows is challenging for the TF team to support given their limited resources. Hence them moving to WSL2 is understandable because it’s the best of both worlds, being both a Linux environment and having access to the GPU. The other option would be to boot directly into Linux from, say, an external SSD.

For professionals coming from other fields who have to work with neural nets, Colab would be a quick option. Nothing to fuss about. And if you have the budget and need performance, TPUs are a great choice.

Overall, the best environment to run TF with GPU support is on Linux x86-64 with the appropriate drivers installed.

@thiagojramos

As you can see from the above comment, the only obstacle is the "Desire". Besides, their target audience doesn't seem to be Windows users (let alone home users who need to install this package because it's a requirement for something else).

@mihaimaruseac
Collaborator

Up to 2022, I was on the team responsible for making TF work on all platforms. Windows, and especially GPU on Windows, was the platform with the most breakages and also the one with the least expertise. Everyone was focused on Linux and only a team of ~3 was working on the other ecosystems.

Since then, none of those people are still on TF, and support is much reduced, even for Linux.

@sgkouzias

sgkouzias commented Jun 14, 2024

@mihaimaruseac I am relatively a newbie in the world of TensorFlow but I wonder why valuable TensorFlow team members left the team and why in general the number of members was so radically reduced. I really don't get it..

@eabase
Author

eabase commented Jun 14, 2024

Hi Guys!
Thank you so much for the quick feedback. Not what I was hoping to hear, but nevertheless better than silence.

@sgkouzias
Hi Sotiris,
I wasn't really looking for an answer that involves buying cloud server time. I have my own GPU and want to put it to full use in all my environments, which is why WSL is just not enough. I'm a huge fan of *nix, but at the end of the day, people want to run in Windows with whatever hardware they have. So it sounds very strange that "Windows" is not supported. Everything else runs fine in Windows, even tools built with MSYS and Cygwin, so I just don't see the issue here. Which is good, because it's also an incentive to get it working.

@thiagojramos
Hi Thiago,
I would be careful about making assumptions about the target audience. You have no idea what other fringe developers are trying to do and what they use. In my own case I am extremely hybrid, in the sense that I primarily use native Windows tools, then extend with near-native tools such as Cygwin and MSYS, and only use WSL as a last resort. Even then I am not happy, because Windows hides its containers in an incompatible and non-portable way. If I have to use WSL, I would much rather use a VirtualBox container instead, where I have full control of everything and can back it up and deploy it on a different machine if necessary. I refuse to use Docker, as it completely shields developers from what is going on in the Linux environment at the OS level, resulting in absurd communication and issues with app developers who don't understand basic *nix-based OS principles. (Yeah, sure, it is very useful for distributing quick tests and solutions, but there it ends, especially if you need to interact with hardware and bare metal.)

@mihaimaruseac
Hi Mihai, Awesome!
Can you point me in the right direction for the compilation process on Windows using a nearly latest TF?
My first thought was that this should be possible using MSYS or MinGW64.


Rant Warning:
I then got distracted by Nvidia now saying to use their own Python repositories, installing absurd wrapper packages that then redirect the package repos to their own PyPI servers. This happens without even telling the user, while installing pypi.rc/ini files all over your Windows system. It took me an hour to root out all that nasty crap afterwards, because their uninstaller of course doesn't do anything. 👎 JFC, who the heck was thinking this through? Totally senseless! 😡

Then we have the Conda addicts. And to be perfectly honest, I think they need to wake up from their legacy attitude. Over the last ~3 years Python has reached astronomical levels of user friendliness and cross-platform compatibility. So why the heck are people still using Conda? It's just a bunch of complicated CMD, Posh and shell wrappers around various Python tools and binaries, again shielding and complexifying what is actually done behind the scenes (in a Python environment!). Please leave Conda and come back to the warm Python reality. 🐍

@sgkouzias

It looks like the issue is assigned to @Venkat6871 . @Venkat6871 could you provide some useful guidance?

@sgkouzias

sgkouzias commented Jun 15, 2024

Professor Alvaro Rodriguez writes in a relevant discussion on the TensorFlow Forum:

I have to say, that the drop of GPU support for windows, the lack of documentation and support for cpp, the lack of support and documentation for TensorFlow lite, the lack of TFrecord multi-platform standalone libraries… and so on is simply a strategy that will kill the library long term. Except for very niche projects in large companies.

Other platforms like PyTorch are investing in easy-to-use multi-platform solutions. If (or when) someone actually puts out a solution that is powerful, stable, easy to access and easy to import and export to other platforms and languages, private enthusiasts, researchers and academics will drop TensorFlow. And don’t forget that industries rely on specialists who learned in academia and come from research.

In a time of AI revolution, where the technology is more popular than ever and is being added to literally everything, TensorFlow is, in my opinion, neglecting everything outside Python-Linux, dropping an already lacking support for interoperability, and not investing in accessibility.

I’m saying this as a researcher and professor working in a computer science lab at a university. I write this just after investing almost 100 hours trying to simply build TensorFlow-cc to add some basic capabilities to a research project for the European Union. I failed. Also, the absolute lack of recent information anywhere about TensorFlow-cc, and the responses I have seen to old threads, let me know that most people gave up the same way. I’m ready to tell my whole team to abandon TensorFlow and try other solutions.

I have seen others in my lab commenting similar concerns and frustrations. Many of our researchers are already moving away from TensorFlow and soon the whole department will follow.

For context. The Computer Science department is the largest department in my university, and serves the most important IT faculty in the Northwest of Spain.

For us, ML is thriving. In addition to the Bachelor's degree in Computer Science, where ML is more than present and which is the most requested degree in the entire university, we opened a new one in Data Science and are opening another in Artificial Intelligence. Next year we will be adding new classes and teachers to be able to serve the increasing number of students in two of the three Machine Learning subjects I teach… As far as I know, none of them will learn TensorFlow, none of them are using it in their personal projects and none of them will use it in their degrees. They will instead be using Julia, Matlab, OpenCV, PyTorch, Scikit-Learn and other solutions.

Which leads me to the industry sector. I also worked as a researcher in a public hospital on a project about diabetes, and in a private research center dedicated to lasers and manufacturing. They all used TensorFlow, the same as my laboratory did. I have been told they are all moving away from it, currently opting for a Scikit-Learn + OpenCV and PyTorch based approach. The reason is, in one case, the drop of GPU support for Windows, and in the other, a perceived drop of support combined with a lack of interoperability.

The thing is, nobody moves away from a technology they spent years using and learning unless the technology fails them. And once you move away from something because of a problem, if you find a solution somewhere else, you will probably never return.

That is what is happening in TensorFlow. Google has intentionally dropped the ball with support, documentation, accessibility, ease of use, interoperability across languages and interoperability across platforms… so others will rise to the occasion.

Simply put, TensorFlow is becoming the Bing search engine with regards to AI.

@mihaimaruseac
Collaborator

mihaimaruseac commented Jun 15, 2024

> @mihaimaruseac I am relatively a newbie in the world of TensorFlow but I wonder why valuable TensorFlow team members left the team and why in general the number of members was so radically reduced. I really don't get it..

Reorg during pandemic, shift of priorities, old manager and tech lead left, almost no-one who worked on TF pre-2.0 was still left in the team, JAX, Goodhart's law (in multiple instances).

> and so on is simply a strategy that will kill the library long term

JAX is the future now at Google. Or PyTorch. If you can use Keras 3 (which is not the default within TF -- or it should not be), you should be backend agnostic, but there's still a lot of work left to do to cover everything.
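As a rough illustration of what "backend agnostic" means in practice: Keras 3 reads the `KERAS_BACKEND` environment variable when it is first imported, so the backend has to be chosen before `import keras`. The sketch below only sets that variable; `select_keras_backend` and `KNOWN_BACKENDS` are hypothetical names for this example, and the actual import is left commented out because it assumes Keras 3 and the chosen backend are installed.

```python
import os

# The three main backends Keras 3 accepts via KERAS_BACKEND.
KNOWN_BACKENDS = ("tensorflow", "jax", "torch")


def select_keras_backend(name: str) -> str:
    """Set KERAS_BACKEND; this must happen before the first `import keras`."""
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {name!r}, expected one of {KNOWN_BACKENDS}")
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("jax")
# import keras  # uncomment once Keras 3 and the chosen backend are installed
```

Model code written against the `keras` namespace alone should then run unchanged when the variable is switched to `"torch"` or `"tensorflow"`, with the coverage gaps noted above.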

@mihaimaruseac
Collaborator

> Can you point me in the right direction for the compilation process on windows using a nearly latest TF?

Unfortunately, most files that were there to support compiling on GPU on Windows are no longer in the repo, so there's not really a simple path forward.

@sgkouzias

sgkouzias commented Jun 15, 2024

@mihaimaruseac thank you so much for the insightful response. I am really grateful and do appreciate your efforts and generally the TensorFlow & Keras team (those who left the team and those who crafted and maintain the new, more inclusive and powerful Keras). I am a fan of the "progressive disclosure of complexity" core principle of Keras. If JAX is the future I will embrace it.

@eabase
Author

eabase commented Jun 17, 2024

@mihaimaruseac

> > Can you point me in the right direction for the compilation process on windows using a nearly latest TF?
>
> Unfortunately, most files that were there to support compiling on GPU on Windows are no longer in the repo, so there's not really a simple path forward.

I guess what I am saying is that the C/C++ used in the Linux (WSL) compilation should be straightforward to adjust to compile on native Windows, unless they try to use native .NET/C#/Windows API calls. Forget all the stupid setup/build scripts, they only confuse everyone. This, of course, as long as there are no hard-coded libraries or other *.so files it relies on. (One should still be able to compile those separately into DLLs.) It's very weird that there is no git history showing when this was "removed" and what was "removed".

Anyway, thank you so much for providing the summary and clarity of the situation. Is anyone aware of some summary or methodology of translating legacy TF code into Torch or Keras?

@Venkat6871 Venkat6871 added TF 2.16 subtype:windows Windows Build/Installation Issues type:build/install Build and install issues labels Jun 17, 2024
@Venkat6871

@learning-to-play

@Venkat6871 Venkat6871 added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Jun 17, 2024
@mihaimaruseac
Collaborator

> This, of course, as long as there are no hard-coded libraries or other *.so files it relies on.

This is the main issue. The Bazel files and macros that allowed compiling on GPU on Windows were excised. The macros assume in many places that they only build *.so files (no GPU support on Mac either).

Then, there's an issue that the old build scripts used MSVC + nvcc for CUDA compilation, but now everything is on clang, so the path on Windows + CUDA has not been tested at all.
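To make the two obstacles above concrete, here is a small sketch under stated assumptions. It is not taken from TensorFlow's Bazel macros: `SHARED_LIB_PATTERN`, `shared_library_name` and `find_toolchain` are hypothetical helpers. The first shows why build logic that hard-codes `*.so` cannot produce a Windows artifact, and the second checks which of the compiler drivers named above (clang, nvcc, and MSVC's `cl`) are on the PATH.

```python
import shutil
from typing import Dict, Optional

# Conventional shared-library naming per OS (general platform convention,
# not copied from TensorFlow's build files).
SHARED_LIB_PATTERN = {
    "Linux": "lib{stem}.so",
    "Darwin": "lib{stem}.dylib",
    "Windows": "{stem}.dll",
}


def shared_library_name(stem: str, system: str) -> str:
    """Return the conventional shared-library file name for `system`."""
    return SHARED_LIB_PATTERN[system].format(stem=stem)


def find_toolchain() -> Dict[str, Optional[str]]:
    """Map each compiler driver to its location on PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in ("clang", "nvcc", "cl")}


for system in SHARED_LIB_PATTERN:
    print(system, shared_library_name("example", system))
print(find_toolchain())
```

A build rule that always emits the Linux form would need every such site parameterized before a Windows GPU build could even link, which matches the "assume in many places" remark above.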

@GatGit12

Linking, relevant: #59918

@eabase
Author

eabase commented Jun 27, 2024

@mihaimaruseac
Do you have a good reference to the latest/best (or most complete) build instructions for using clang + CUDA on WSL?

I can try to see if I can reproduce that build under MSYS. (I don't see why it couldn't be done.)

@mihaimaruseac
Collaborator

I think following the instructions at https://www.tensorflow.org/install/source under WSL should work.
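Before starting a long source build, it can help to confirm that the shell really is running under WSL rather than MSYS or native Windows. A minimal sketch, assuming the common heuristic that WSL kernels report a release string containing "microsoft"; `looks_like_wsl` is a hypothetical helper, and the heuristic is a convention rather than a documented guarantee.

```python
import platform
from typing import Optional


def looks_like_wsl(release: Optional[str] = None) -> bool:
    """Heuristic: WSL kernel release strings contain 'microsoft'."""
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


print(looks_like_wsl())
print(looks_like_wsl("5.15.90.1-microsoft-standard-WSL2"))
```

Passing the release string explicitly, as in the second call, makes the check easy to test on any machine.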

6 participants