On the way to latest CMake, VS2017, CUDA 9, cuDNN 7, Win10 #14801
Very nice, thanks. The TensorFlow team should just release an official TF 1.4 CUDA 9 Win10 build whl file for its users. I don't get why they don't do so immediately.
Since then I've discovered that, despite everything running fine, the VS2017 (up to 15.4) distribution introduced a bug in the /WHOLEARCHIVE trick, resulting in unfilled factories (session, device...) as mentioned in. Therefore I am stuck when linking my application with TF using VS2017.
@sylvain-bougnoux I don't believe the /WHOLEARCHIVE flag is used when building a Python whl; am I mistaken?
I don't think so either; it is a matter of linking your app with the TF libraries. Making a whl should work, while making an app is prone to fail.
I have submitted this bug to MS but no answer yet.
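For reference, the /WHOLEARCHIVE trick discussed here forces the MSVC linker to keep every object file from a static library, so that self-registering factories (sessions, devices, ops) are not silently dropped. A minimal sketch of how it is typically wired up from CMake; `my_app` and `tensorflow_static.lib` are illustrative names, not the actual TF target names:

```cmake
# Force the MSVC linker to pull in all objects from the static TF library,
# so static-initializer-based registration (sessions, devices, ops) survives.
# Target and library names here are illustrative.
if(MSVC)
  set_target_properties(my_app PROPERTIES
    LINK_FLAGS "/WHOLEARCHIVE:tensorflow_static.lib")
endif()
```

Without this, the linker discards any object file whose symbols are never referenced directly, which is exactly how factory registration works.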
Thanks for the notes @sylvain-bougnoux! Adding @mrry @gunan @tfboyd since they might be interested in your notes on getting things working. As the referenced bugs mention, support for CUDA 9 / cuDNN 7 is anticipated in TensorFlow 1.5. Marking this as "community support" since the purpose of this issue seems to be to collect useful tips for making this all work.
@gunan
http://ci.tensorflow.org/job/tf-pr-win-cmake-gpu/19/console
The location of the assertion failure seems to point to a line in NVIDIA proprietary code.
As many of us (#14126, #14691, #12052), I am trying to get TF 1.4 to build successfully on Windows using the latest version of everything. As far as I can judge I managed it, but with some hacks. As it would take me too long to complete, I would like to share what I did to help finalize it. It is too early for a PR.
I am using CMake 3.9.6 (though 3.10 came out). My CMake skill level is low.
I am not trying the Python bindings.
VS2017 is the Community edition.
Without GPU it is easy. The only issue is the heap overflow (C1002 or C1006, #11096). The trick is to reduce build parallelism with

```
msbuild /m:4 /p:CL_MPCount=2 ...
```

such that 4*2 is approximately the number of cores you really have (at least it worked for me). Using `/Zm2000` did not work for me, despite a lot of available memory (32 GB).

With GPU it is trickier: `tf_core_gpu_kernels.vcxproj` does not compile at all. AFAIU, the CMake strategy changed from v3.6 to allow parallel builds: CUDA is now treated as just another language. Without modifications, nvcc simply returns error code 1 (or nothing happens, I am not sure). Here are my modifications (from v1.4), all in `tensorflow/tensorflow/contrib/cmake/`.
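For context, this is roughly what CMake's first-class CUDA language support looks like in a standalone project, i.e. the mechanism the steps below hook into instead of the old FindCUDA macros. Target and file names here are illustrative, not TensorFlow's actual ones:

```cmake
# Minimal sketch of first-class CUDA support (CMake >= 3.8).
# Names are illustrative; this is not the TF build itself.
cmake_minimum_required(VERSION 3.8)
project(gpu_kernels_demo LANGUAGES CXX CUDA)

# Files with a non-standard extension such as .cu.cc must be tagged
# explicitly, or CMake will hand them to the C++ compiler.
set(demo_gpu_srcs kernels/relu_op.cu.cc)
set_source_files_properties(${demo_gpu_srcs} PROPERTIES LANGUAGE CUDA)

# Plain add_library() now works; cuda_add_library() from FindCUDA is obsolete.
add_library(demo_gpu_kernels STATIC ${demo_gpu_srcs})

# nvcc flags go per target, not via the global CUDA_NVCC_FLAGS variable.
target_compile_options(demo_gpu_kernels PRIVATE
  $<$<COMPILE_LANGUAGE:CUDA>:--expt-relaxed-constexpr>)

# Needed when device code must be linked into a downstream binary.
set_target_properties(demo_gpu_kernels PROPERTIES
  CUDA_SEPARABLE_COMPILATION ON)
```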
1/ Adapt `CMakeLists.txt` a little:
- change `CUDA 8.0` to `CUDA 9.0`, l.223;
- add `enable_language("CUDA")`, l.224; the `set(CUDA_NVCC_FLAGS ...)` directives do not work anymore (see below);
- change `64_80` to `64_90` and `64_6` to `64_7`, l.247 and 248, and similarly in l.272-276.

2/ In
`tf_core_kernels.cmake`:
- add `set_source_files_properties(${tf_core_gpu_kernels_srcs} PROPERTIES LANGUAGE CUDA)` to recognize '.cu.cc' extensions as CUDA files, l.209;
- rewrite `cuda_add_library(...)` as `add_library(...)`, l.210.

3/ Edit `tf_core_gpu_kernels.vcxproj` (this is the trick), in the release section:
- replace `/bigobj /nologo ... -Ob2` with the `-Xcompiler="/bigobj ... -Ob2"` directive, l.147 (these former flags are for the C++ compiler, not for nvcc, and result in the crash);
- add `--expt-relaxed-constexpr`, still in the `AdditionalOptions`;
- change `PerformDeviceLink` from `false` to `true`, l.164.

Then everything compiles (msbuild on tf_tutorials_example_trainer.vcxproj), and this tutorial works. The remaining point before a PR is to avoid the third step, i.e. to pass the right directives to nvcc by understanding how `CUDA_NVCC_FLAGS` works, and to add the linking. I hope this solution will work without missing symbols (#6396).
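To make step 3/ concrete, here is a sketch of what the hand-edited release section of `tf_core_gpu_kernels.vcxproj` ends up containing. The exact flag list is abbreviated as in the description above, and the surrounding element names in your generated project may differ slightly:

```xml
<!-- Inside the CUDA compile item definition of the Release configuration. -->
<!-- Host-compiler flags must be wrapped in -Xcompiler="..." so nvcc       -->
<!-- forwards them to cl.exe instead of rejecting them itself.             -->
<AdditionalOptions>-Xcompiler="/bigobj ... -Ob2" --expt-relaxed-constexpr %(AdditionalOptions)</AdditionalOptions>
<PerformDeviceLink>true</PerformDeviceLink>
```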
Otherwise it is a nightmare: neither CUDA 8 nor CMake 3.6 is aware of VS2017, CMake compilation is not incremental (#14194), and a build takes about 4-5 hours (it could use precompiled headers, especially in tf_core_kernels)...
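On the precompiled-header remark: CMake later grew native support for this via `target_precompile_headers` (CMake 3.16+, so not available in the 3.9.6 used here). A sketch of what that could look like; the header choices are illustrative guesses, only the target name comes from the TF build:

```cmake
# Native PCH support requires CMake >= 3.16 (newer than the 3.9.6 used above).
# Header choices are illustrative; heavy template headers benefit most.
target_precompile_headers(tf_core_kernels PRIVATE
  <vector>
  <unordered_map>
  "third_party/eigen3/unsupported/Eigen/CXX11/Tensor")
```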