Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSVC Compiler 19.22.xxxxx generates invalid code which leads to inability to build libuserland #3024

Closed
dimenus opened this issue Aug 7, 2019 · 8 comments
Labels
os-windows stage1 The process of building from source via WebAssembly and the C backend. upstream An issue with a third party project that Zig uses.
Milestone

Comments

@dimenus
Copy link
Contributor

dimenus commented Aug 7, 2019

Related to #2740

With the upgrade from 19.21 (VS 2017) to 19.22, Microsoft seems to have broken something related to our FNV hash implementation. Building libuserland leads to this error:

invalid builtin function: 'hasDecl'

The workaround is to use VS 2017 or build tools older than 19.22.

@dimenus dimenus changed the title MSVC 19.22.xxxxx generates invalid code which leads to inability to build libuserland MSVC Compiler 19.22.xxxxx generates invalid code which leads to inability to build libuserland Aug 7, 2019
@andrewrk andrewrk added bug Observed behavior contradicts documented or intended behavior os-windows stage1 The process of building from source via WebAssembly and the C backend. labels Aug 7, 2019
@andrewrk andrewrk added this to the 0.5.0 milestone Aug 7, 2019
@andrewrk andrewrk added upstream An issue with a third party project that Zig uses. and removed bug Observed behavior contradicts documented or intended behavior labels Aug 7, 2019
@mikdusan
Copy link
Member

mikdusan commented Aug 7, 2019

going to put this here for posterity:

STATUS CL.EXE VS CMAKE INLINE
19.21.27702 2019 16.0 Release /Ob2
19.22.27905 2019 16.2 Release /Ob2
MinSizeRel /Ob1
Release /Ob1 (hacked)
19.23.28105.4 2019 16.3.0-pre.4.0 MinSizeRel /Ob1
Release /Ob2
2019 16.3.2 MinSizeRel /Ob1
Release /Ob2
19.24.28117.0 2019 16.4.0-pre.1.0 MinSizeRel /Ob1
Release /Ob2
19.24.28314.0 2019 16.4.2 MinSizeRel /Ob1
Release /Ob2
2019 16.5.0-pre.1.0 MinSizeRel /Ob1
Release /Ob2
19.24.28316.0 2019 16.4.4 Release /Ob2
19.25.28508.3 2019 16.5.0-pre.2.0 Release /Ob2

@XVilka
Copy link

XVilka commented Sep 13, 2019

Can it be related to the optimizations? In radare2 we met with memmove optimization bug: https://developercommunity.visualstudio.com/content/problem/583227/vs-2019-cl-1921277022-memmove-instrinsic-optimizat.html

@andrewrk
Copy link
Member

Based on @XVilka's link, that could very well be our issue. Microsoft has acknowledged the bug and so eventually Azure will be updated with a patched MSVC. Until then I am testing the MinSizeRel workaround.

@andrewrk
Copy link
Member

andrewrk commented Sep 26, 2019

The CI is using MinSizeRel to work around this issue. Thanks @mikdusan.

I'm not actually able to reproduce it locally. Let's keep this issue open until the fix is widely available.

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Sep 26, 2019
@Sahnvour
Copy link
Contributor

Sahnvour commented Oct 2, 2019

I investigated this with

$ cl.exe
Microsoft (R) C/C++ Optimizing Compiler Version 19.23.28105.4 for x64

at 17f2af10b56dd508ebd4a22834fd594cb5a49864.

The root cause is that building the global builtin function table corrupts memory while adding entries. They will then later not be available when trying to access them.

The bug only occurs with CMake's Release target, so in order to get debug symbols I had to hack a bit using https://gitlab.kitware.com/cmake/community/wikis/FAQ#make-override-files. In my case I used a RelWithDebInfo and overriding flags to be sure to have /O2 /Ob2.

set(CMAKE_USER_MAKE_RULES_OVERRIDE_CXX
   ${CMAKE_CURRENT_SOURCE_DIR}/cxx_flag_overrides.cmake)
if(MSVC)
    set(CMAKE_CXX_FLAGS_RELWITHDEBINFO_INIT "/MD /Zi /O2 /Ob2 /D NDEBUG")
endif()

Building target libuserland (with the command zig0.exe build --override-lib-dir <path>/<to>/zig/lib libuserland install -Doutput-dir=<path>/<to>/zig/build -Drelease=true -Dlib-files-only --prefix <path>/<to>/Install/zig) will then fail.

So I launched it via the debugger. Now, while running define_builtin_fns from codegen.cpp, we can observe g->builtin_fn_table._entries being corrupted (it's a sparse array in memory) when executing create_builtin_fn(..., "errorReturnTrace");.

The disasm is

                  create_builtin_fn(g, BuiltinFnIdErrorReturnTrace, "errorReturnTrace", 0);
00007FF625EE5219  lea         edx,[r14+28h]  
00007FF625EE521D  lea         ecx,[rdx-27h]  
00007FF625EE5220  call        qword ptr [__imp_calloc (07FF62C995030h)]  
00007FF625EE5226  mov         rsi,rax  
00007FF625EE5229  test        rax,rax  
00007FF625EE522C  je          define_builtin_fns+2999h (07FF625EE5B59h)  
00007FF625EE5232  lea         rbx,[rax+8]  
00007FF625EE5236  mov         qword ptr [rbp+28h],rax  
00007FF625EE523A  mov         rcx,rbx  
00007FF625EE523D  lea         r8d,[r14+10h]  
00007FF625EE5241  lea         rdx,[string "errorReturnTrace" (07FF62A907A28h)]  
00007FF625EE5248  call        buf_init_from_mem (07FF625EDCE10h)  
00007FF625EE524D  lea         r8,[rbp+28h]  
00007FF625EE5251  mov         qword ptr [rbp+30h],rbx  
00007FF625EE5255  lea         rdx,[rbp+30h]  
00007FF625EE5259  mov         rcx,rdi  
00007FF625EE525C  mov         dword ptr [rax],65h  
00007FF625EE5262  mov         qword ptr [rax+20h],r14  
00007FF625EE5266  call        HashMap<Buf * __ptr64,BuiltinFnEntry * __ptr64,&buf_hash,&buf_eql_buf>::put (07FF625DA17F2h)  

This is basically create_builtin_fn inlined, see we're creating the builtin entry, its name and inserting it into the hashmap.
Near the end, at mov dword ptr [rax],65h the memory pointed to by builtin_fn->name.list.items (rax) is stomped and so the key for the hashmap is garbage.

If you look at the asm generated for the previous builtins, the part after the call to buf_init_from_mem is nearly identical, and then changes slightly starting from "errorReturnTrace".

I'm pretty sure this is a codegen bug, but maybe not the one linked above. An easy workaround is to surround one of these two functions with #pragma optimize("", off/on) and the bug will disappear. I guess splitting defined_builtin_fns in two might also help. Anyways we should probably file a bug report to Microsoft.

@Sahnvour
Copy link
Contributor

Sahnvour commented Feb 17, 2020

Tested this again with

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob2 /D NDEBUG")

and

set(CMAKE_CXX_FLAGS_RELEASE_INIT
        "/MT /O2 /Ob3 /D NDEBUG")

(Ob3 is a new flag in VS 2019) on compilers 19.24.28314 and 19.24.28316 without walking into the issue anymore. Maybe we can close this (I'll make a PR to reset the optimization flags to default) ?

@mikdusan
Copy link
Member

mikdusan commented Feb 18, 2020

@Sahnvour,

compilers 19.24.28314 and 19.24.28316

Something odd here. At my last update I tested 19.24.28314 and the issue remained. This was building against the same LLVM binary bundle used by CI.

I speculated that maybe this issue can be resolved by rebuilding LLVM with new compiler. Unfortunately I ran into difficulties rebuilding LLVM on my system and left that unresolved.

Are you using your own LLVM dependency?

update: 19.24.28316.0 and 19.25.28508.3 work for me with Release and /Ob2 so assuming CI is running minimal version we should be fine going back to /Ob2.

@andrewrk andrewrk modified the milestones: 0.7.0, 0.8.0 Oct 17, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.8.1 Jun 4, 2021
@andrewrk andrewrk modified the milestones: 0.8.1, 0.9.1 Sep 1, 2021
@andrewrk andrewrk modified the milestones: 0.9.1, 0.9.0, 0.11.0 Nov 20, 2021
@andrewrk
Copy link
Member

This issue no longer plagues us since we use the wasm kernel for building from source rather than C++ code.

@andrewrk andrewrk modified the milestones: 0.12.0, 0.11.0 Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
os-windows stage1 The process of building from source via WebAssembly and the C backend. upstream An issue with a third party project that Zig uses.
Projects
None yet
Development

No branches or pull requests

5 participants