C1002 error when building on Windows 10 64 bit, with vs 2017 #11096

Closed
davidshen84 opened this issue Jun 28, 2017 · 29 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:build/install Build and install issues

Comments

@davidshen84

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

If you open a GitHub issue, here is our policy:

  1. It must be a bug or a feature request.
  2. The form below must be filled out.
  3. It shouldn't be a TensorBoard issue. Those go here.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): N/A

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 64 bit, version 1511

  • TensorFlow installed from (source or binary): source

  • TensorFlow version (use command below): master branch, commit 90b2a38

  • Bazel version (if compiling from source): N/A

  • CUDA/cuDNN version: N/A

  • GPU model and memory: CPU only

  • Exact command to reproduce:

    C:\cmake-3.9.0-rc4-win64-x64\bin\cmake.exe .. -G "Visual Studio 15 2017 Win64" ^
    -DCMAKE_BUILD_TYPE=Release -DSWIG_EXECUTABLE=C:\swigwin-3.0.12\swig.exe ^
    -DPYTHON_EXECUTABLE=C:\Users\x\.conda\envs\tensorflow\python.exe ^
    -DPYTHON_LIBRARIES=C:\Users\x\.conda\envs\tensorflow\libs\python35.lib ^
    -Dtensorflow_BUILD_CC_TESTS=ON ^
    -Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX
    
    C:\cmake-3.9.0-rc4-win64-x64\bin\cmake.exe --build .
    

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

Describe the problem clearly here. Be sure to convey here why it's a bug in TensorFlow or a feature request.

I ran the build command above and, at the end, got the following error:

"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\_beam_search_ops.vcxproj" (default target) (3) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\pywrap_tensorflow_internal.vcxproj" (default target) (4) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\pywrap_tensorflow_internal_static.vcxproj" (default target) (5) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj" (default target) (108) ->
(ClCompile target) ->
  c:\users\x\github\tensorflow\tensorflow\contrib\cmake\build\external\eigen_archive\eigen\src\core\products\generalblockpanelkernel.h(2011): fatal error C1002: compiler is out of heap space in pass 2 [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]
  cl : Command line error D8040: error creating or communicating with child process [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]

86 Warning(s)
2 Error(s)

For the warnings, there are two kinds:

  C:\Users\X\github\tensorflow\tensorflow\c\c_api.cc(1938): warning C4190: 'TF_NewWhile' has C-linkage specified, but returns UDT 'TF_WhileParams' which is incompatible with C [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_test_lib.vcxproj]

and

  c:\users\X\github\tensorflow\tensorflow\core\kernels\eigen_spatial_convolutions.h(724): warning C4789: buffer '' of size 8 bytes will be overrun; 32 bytes will be written starting at offset 0 [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]

I saw that there are many complaints about the "compiler is out of heap space in pass 2" error, and some say that adding "/Zm2000" to the compiler flags solves the problem. I applied this patch:

@@ -78,6 +78,8 @@ if(WIN32)
   set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /D_ITERATOR_DEBUG_LEVEL=0")
   set(CMAKE_CXX_FLAGS_MINSIZEREL "${CMAKE_CXX_FLAGS_MINSIZEREL} /D_ITERATOR_DEBUG_LEVEL=0")
   set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} /D_ITERATOR_DEBUG_LEVEL=0")
+  # Increase heap size
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Zm2000")

But this did not solve the problem.

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

@davidshen84
Author

My computer has 16 GB of memory, and during the course of the compilation, the memory usage is below 55%.

@aluo-x

aluo-x commented Jun 29, 2017

This issue seems to be related to 88a6cde and a solution can be found at #9470.

The error only occurs if you try to compile tf with CUDA support and AVX optimizations enabled.
Try removing

lines [109 to 114] from reduction_ops_gpu.cu.cc

lines [41 to 42] from reduction_ops_mean.cc

lines [42 to 43] from reduction_ops_prod.cc

lines [41 to 42] from reduction_ops_sum.cc

Edit: Misread and thought this was CUDA related. However the builds below should still be valid.

Additionally I also provide builds from my repo aluo-x/tensorflow_windows

@ali01 ali01 added type:build/install Build and install issues stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Jun 30, 2017
@ali01

ali01 commented Jun 30, 2017

@mrry, could you take a quick look?

@GPSnoopy

GPSnoopy commented Jul 2, 2017

Hi @davidshen84,

This may or may not help you.

I've encountered several heap errors with VS2015 when trying to build TensorFlow. The problem was that I was using the 32-bit compiler instead of the 64-bit one. This is mentioned as one of the first steps in
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/cmake (although their path does not exist for me).

You can easily check by looking at the Task Manager while compiling. In Windows 10, go to the Details tab and add the Platform column, then see whether the CL.EXE instances are 32-bit or 64-bit.

To use the 64-bit CL.EXE, I had to start MSBUILD on the command line via "VS2015 x64 Native Tools Command Prompt". I haven't tried VS2017 yet, but have a look at "x64 Native Tools Command Prompt for VS 2017" in your start menu.
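
As a scriptable alternative to the Task Manager check, the bitness of a given cl.exe can be read from its PE header. The following is a minimal sketch, not part of the original comment: `pe_machine` and `cl_bitness` are hypothetical helper names, and it assumes the headers fit in the first 4 KiB of the file, which is usually the case.

```python
import struct

# Machine field values from the PE/COFF file header.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data):
    """Return the machine type recorded in the header of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The 32-bit value at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # The 16-bit Machine field immediately follows the signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_NAMES.get(machine, "unknown (0x%04X)" % machine)


def cl_bitness(path):
    # Hypothetical helper: reads only the start of the file (assumed to
    # contain the headers) and reports the executable's own bitness.
    with open(path, "rb") as f:
        return pe_machine(f.read(4096))
```

Calling `cl_bitness` with the path printed by `where cl` in the build prompt should report whether that executable is itself a 32-bit or 64-bit program.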

Tell us if that works.

PS. I am now stuck with the CUDA compiler error mentioned above by @aluo-x.

@davidshen84
Author

I am sure I was using the x64 Native Tools Command Prompt. I also double-checked with:

cl /?
Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25019 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

		     C/C++ COMPILER OPTIONS

Moreover, I used the Visual Studio 15 2017 Win64 cmake generator. I think the compilation process would not be able to start if the compiler were 32-bit.
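
For anyone scripting this check, the banner can be parsed as sketched below. This is an illustration only: `parse_cl_banner` is a hypothetical helper. One caveat, stated as an assumption rather than something confirmed in this thread: the "for x64" suffix is understood to name the architecture the compiler targets, so a 32-bit-hosted cross compiler may print the same text, and the Task Manager check on the running CL.EXE remains the more direct test.

```python
import re

# Captures the version and the architecture named at the end of the banner.
BANNER_RE = re.compile(r"Compiler Version (?P<version>[\d.]+) for (?P<target>\S+)")


def parse_cl_banner(text):
    """Return (version, target) from cl.exe banner text, or None if absent."""
    match = BANNER_RE.search(text)
    if match is None:
        return None
    return match.group("version"), match.group("target")


banner = "Microsoft (R) C/C++ Optimizing Compiler Version 19.10.25019 for x64"
print(parse_cl_banner(banner))
```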

Thanks.

@davidshen84
Author

I tried building without the SIMD options, and I still get similar errors:

"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\ALL_BUILD.vcxproj" (default target) (1) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\_beam_search_ops.vcxproj" (default target) (3) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\pywrap_tensorflow_internal.vcxproj" (default target) (4) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\pywrap_tensorflow_internal_static.vcxproj" (default target) (5) ->
"C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj" (default target) (108) ->
(ClCompile target) ->
  c:\users\x\github\tensorflow\tensorflow\contrib\cmake\build\external\eigen_archive\eigen\src\core\products\generalblockpanelkernel.h(2011): fatal error C1002: compiler is out of heap space in pass 2 [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]
  c:\users\x\github\tensorflow\tensorflow\contrib\cmake\build\external\eigen_archive\eigen\src\core\products\generalblockpanelkernel.h(2011): fatal error C1002: compiler is out of heap space in pass 2 [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]
  cl : Command line error D8040: error creating or communicating with child process [C:\Users\X\github\tensorflow\tensorflow\contrib\cmake\build\tf_core_kernels.vcxproj]

78 Warning(s)
3 Error(s)

Maybe it is a problem with vs 2017 compiler, not with tensorflow code?

@aluo-x

aluo-x commented Jul 4, 2017

After testing my successful CPU-only, AVX-enabled build made with VS 2015, I have found that it is seriously broken. Specifically, I believe something is wrong with batchnorm. There is a discussion here. TensorFlow throws errors in a number of different ways, non-deterministically. The exact same code runs fine with batchnorm removed, or with the batchnorm layer on a stock build.

Extra note, my CPU only builds are built using unmodified code, unlike my GPU builds which contain modifications to work around the imaginary number error.

@davidshen84
Author

I just tried building with VS 2015, got the same error...maybe it is the master branch that is broken? I will try tag v1.2.

@davidshen84
Author

Finally, I got the build to pass. I think the problem is not related to the AVX option, but to the -Dtensorflow_BUILD_CC_TESTS=ON option.

I guess some of the unit tests are really complicated and rely on features only available in gcc.

It would be nice to verify the build with unit test on Windows, but I do not think it is of high priority. :)

@aluo-x

aluo-x commented Jul 6, 2017

@davidshen84 Could you try running the code I linked to here, but with the batch norm layer enabled? My builds were successful, but did not behave correctly when running batch norm.

@davidshen84
Author

@aluo-x , I uncommented your code and used

    h_flat = tf.reshape(h_norm4, [-1, 28 * 28 * 16])

The training is slow, but no error so far.

But in terms of normalization, I think you should have a separate step that normalizes all your data at once and saves it somewhere, and then use the normalized data, rather than normalizing the data as it is loaded.
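
The separate normalization step suggested above could look roughly like this. It is a minimal pure-Python sketch on a toy list, not the code under discussion: `fit_normalizer` and `normalize` are hypothetical names, and the statistics are serialized with JSON only to illustrate saving them once and reusing them later.

```python
import json
import statistics


def fit_normalizer(values):
    """Compute the mean and standard deviation once over the full data set."""
    return {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}


def normalize(values, stats):
    # Fall back to 1.0 so constant data does not cause a division by zero.
    std = stats["std"] or 1.0
    return [(v - stats["mean"]) / std for v in values]


# One-time step: compute the statistics and serialize them for later runs.
stats = fit_normalizer([2.0, 4.0, 6.0, 8.0])
saved = json.dumps(stats)

# Later: reload the saved statistics instead of recomputing them per batch.
print(normalize([2.0, 8.0], json.loads(saved)))
```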

@aluo-x

aluo-x commented Jul 11, 2017

The code was meant to be a functional test. Could you give the full configuration for your build? VS version, python version, commands used etc.

@davidshen84
Author

VS version

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>msbuild /version
Microsoft (R) Build Engine version 14.0.25420.1
Copyright (C) Microsoft Corporation. All rights reserved.

14.0.25420.1

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC>cl
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

cmake command

C:\cmake-3.9.0-rc4-win64-x64\bin\cmake.exe .. -G "Visual Studio 15 2017 Win64" ^
-DCMAKE_BUILD_TYPE=Release -DSWIG_EXECUTABLE=C:\swigwin-3.0.12\swig.exe ^
-DPYTHON_EXECUTABLE=C:\Users\User\.conda\envs\tensorflow\python.exe ^
-DPYTHON_LIBRARIES=C:\Users\User\.conda\envs\tensorflow\libs\python35.lib ^
-Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX

python version

Python 3.5.3 :: Continuum Analytics, Inc.

Command used to build

msbuild /p:Configuration=Release all_project.vcxproj
msbuild /p:Configuration=Release tf_python_build_pip_package.vcxproj

After that, I created a python virtual environment and installed the output whl file.

@aluo-x

aluo-x commented Jul 13, 2017

Thank you so much. The CPU AVX build using VS2017 and Python 3.5.3 from Anaconda seems to be working fine. MSVC failed a few times with C1060 (out of heap space), consistently on cwise_op_*, but eventually it worked.

GPU + AVX build is still failing with VS 2017 with internal compiler errors (compared to VS 2015 which created partially functional builds).

Edit:
CPU AVX Build here: https://github.com/aluo-x/tensorflow_windows

@davidshen84
Author

@aluo-x In the link you gave, you use VS 2015 and VS 2017 interchangeably. Could you please check? I could not get the build to work with VS 2017.

@aluo-x

aluo-x commented Jul 14, 2017

To clarify, I am using VS 2017. The CPU build failed the first few times but eventually worked. The GPU version is still not working (either an internal compiler error or out of heap space in the second pass).

This is in contrast to VS 2015, where both the CPU and GPU versions built without error on the first go.
However, as I noted before, my builds using the Intel distribution for Python and VS 2015 were corrupt. I think the Intel distribution is probably at fault here, but I haven't had time to check because of the long build times.

@kalengi

kalengi commented Oct 14, 2017

I followed the steps by @davidshen84 with a few differences:

  1. cmake command:

cmake .. -G "Visual Studio 15 2017 Win64" ^
-DCMAKE_BUILD_TYPE=Release ^
-DSWIG_EXECUTABLE=C:/tools/swigwin-3.0.12/swig.exe ^
-DPYTHON_EXECUTABLE=C:/Python36/python.exe ^
-DPYTHON_LIBRARIES=C:/Python36/libs/python36.lib ^
-Dtensorflow_WIN_CPU_SIMD_OPTIONS=/arch:AVX

  2. Command used to build:

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\amd64\MSBuild.exe" /m:2 /p:CL_MPCount=1 /p:Configuration=Release /p:Platform=x64 /p:PreferredToolArchitecture=x64 ALL_BUILD.vcxproj

"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\amd64\MSBuild.exe" /m:2 /p:CL_MPCount=1 /p:Configuration=Release /p:Platform=x64 /p:PreferredToolArchitecture=x64 tf_python_build_pip_package.vcxproj

  • Despite being in the 64-bit VS 2017 command line, I still had to explicitly run the 64-bit version of MSBuild. The default is 32-bit.
  • I used /m:2 and /p:CL_MPCount=1 to limit the number of parallel compilations running in order to avoid running out of heap space.
  • I'm not sure /p:Platform=x64 and /p:PreferredToolArchitecture=x64 added any value, but had them in there anyway to explicitly emphasize 64-bit tools.
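
For repeated builds, the MSBuild invocation above can be assembled from a script so the parallelism settings are easy to vary. This is a sketch under assumptions: `msbuild_command` and `max_compilers` are hypothetical helpers, and the product of `/m` and `CL_MPCount` is only a rough upper bound on concurrent cl.exe processes, not a documented guarantee.

```python
def msbuild_command(msbuild_exe, project, projects_in_parallel=2, cl_processes=1):
    """Assemble the MSBuild argument list with the switches used above."""
    return [
        msbuild_exe,
        "/m:%d" % projects_in_parallel,      # projects built concurrently
        "/p:CL_MPCount=%d" % cl_processes,   # compiler processes per project
        "/p:Configuration=Release",
        "/p:Platform=x64",
        "/p:PreferredToolArchitecture=x64",
        project,
    ]


def max_compilers(projects_in_parallel, cl_processes):
    # Assumption: each concurrently built project may start up to
    # CL_MPCount compiler processes, so the product is a rough ceiling.
    return projects_in_parallel * cl_processes
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, with `msbuild_exe` set to the 64-bit MSBuild.exe path from step 2. Lowering either number trades build time for lower peak memory use.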

@chrisplyn

@kalengi Did you successfully build the CPU-only version? Did you try building the GPU-version?

@kalengi

kalengi commented Oct 15, 2017

I built the CPU-only version successfully on Windows 7. Haven't tried building the GPU version yet.

@chrisplyn

@aluo-x Building the GPU version is not working for me either. Did you find a solution?

@chrisplyn

 7>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_core_framework.vcxproj" (default targets) -- FAILED.
 6>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_cc_framework.vcxproj" (default targets) -- FAILED.
48>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_cc_ops.vcxproj" (default targets) -- FAILED.
47>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_cc_while_loop.vcxproj" (default targets) -- FAILED.
 5>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_c.vcxproj" (default targets) -- FAILED.

Has anyone seen similar error messages?

@aluo-x

aluo-x commented Oct 15, 2017

@chrisplyn I did not attempt a build of 1.3.0 because #11865 never got a reply. I will try again once 1.4.0 gets released with cudnn 7 compatibility.
Existing successful builds can be found in my repo.

@chrisplyn

@aluo-x I can't even build the CPU version with AVX enabled. Do you know what might cause that?

@chrisplyn

@kalengi I can't build the CPU version; I ran into a "can't open pywrap_lib" error ...

@kalengi

kalengi commented Oct 16, 2017

@chrisplyn By the time you see the message 7>Done Building Project "D:\tensorflow\tensorflow\contrib\cmake\build\tf_core_framework.vcxproj" (default targets) -- FAILED., the corresponding error will have been displayed long before. Try logging the build output to a file so that you can search it later for the actual error that caused the build to fail. Using PowerShell, you can view the messages on screen and also write them to a log file like this:

powershell "& 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\amd64\MSBuild.exe' /p:Configuration=Release ALL_BUILD.vcxproj | tee 'C:\logs\tensorflow_build.log' "
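
Once the output is in a log file, the first real error can be pulled out with a short script. The sketch below is illustrative only: `find_errors` is a hypothetical helper, and the pattern is an assumption based on the diagnostics quoted earlier in this thread ("fatal error C1002", "error D8040"), so it may miss other message formats.

```python
import re

# Matches diagnostics such as "fatal error C1002" or "error D8040".
ERROR_RE = re.compile(r"\b(?:fatal )?error ([A-Z]+\d+)\b")


def find_errors(lines):
    """Yield (line_number, code, text) for each line that reports an error."""
    for number, line in enumerate(lines, start=1):
        match = ERROR_RE.search(line)
        if match:
            yield number, match.group(1), line.rstrip()


sample = [
    "  generalblockpanelkernel.h(2011): fatal error C1002: compiler is out of heap space in pass 2",
    "  c_api.cc(1938): warning C4190: 'TF_NewWhile' has C-linkage specified",
    "  cl : Command line error D8040: error creating or communicating with child process",
    "2 Error(s)",
]
for number, code, text in find_errors(sample):
    print(number, code, text)
```

To scan a real log, pass an open file object instead of `sample`. Windows PowerShell's `tee` may write the file as UTF-16, in which case opening it with `encoding="utf-16"` is likely needed; treat that as something to verify on your machine.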

@aluo-x

aluo-x commented Nov 13, 2017

@chrisplyn @kalengi I have 1.4.0 AVX2 (VS2017), Python 3.6.3 builds on my repo here.

@tensorflowbutler
Member

It has been 14 days with no activity and the awaiting tensorflower label was assigned. Please update the label and/or status accordingly.


@davidshen84
Author

Looks like @aluo-x has figured out a solution. 👍

7 participants