Upgrade to LLVM 10 #655

Closed
yuanming-hu opened this issue Mar 25, 2020 · 15 comments
Labels
c++ C++ engineering related welcome contribution

@yuanming-hu
Member

yuanming-hu commented Mar 25, 2020

Description
LLVM 10 has been released today. Currently we are using LLVM 8, and I suggest we upgrade yearly to the latest version of LLVM.

Note that LLVM now bumps its major version every 6 months, so with yearly upgrades we will track the latest LLVM releases with even major version numbers (10, 12, 14, ...).

How
Perhaps this task is mostly about making sure that Taichi compiles and passes all tests with LLVM 10.

During the upgrade we should try to support LLVM versions 8 and 10 simultaneously, using the macro LLVM_VERSION_MAJOR (thanks to @sighingnow) and #if (LLVM_VERSION_MAJOR >= 10) etc. for conditional compilation.
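A minimal sketch of the conditional-compilation pattern described above. In a real build, LLVM_VERSION_MAJOR is defined by LLVM's generated header llvm/Config/llvm-config.h; the fallback #define, the static_assert, and the helper jit_backend_name() below are hypothetical stand-ins so the snippet compiles without LLVM installed.

```cpp
#include <string>

// In Taichi this macro would come from LLVM itself:
//   #include "llvm/Config/llvm-config.h"  // defines LLVM_VERSION_MAJOR
// Hypothetical fallback so this sketch compiles without LLVM installed.
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 10
#endif

// Refuse to build against versions the migration does not cover.
static_assert(LLVM_VERSION_MAJOR == 8 || LLVM_VERSION_MAJOR == 10,
              "only LLVM 8 and LLVM 10 are supported during the migration");

// Hypothetical helper: each version-specific code path sits behind #if,
// so LLVM 8 and LLVM 10 builds compile from the same source tree.
inline std::string jit_backend_name() {
#if LLVM_VERSION_MAJOR >= 10
  return "LLVM 10 code path";
#else
  return "LLVM 8 code path";
#endif
}
```

Once LLVM 8 support is dropped, the #else branches can simply be deleted and the static_assert tightened to LLVM_VERSION_MAJOR >= 10.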

I suggest that we drop LLVM 8 support one month after the migration to LLVM 10 is complete, to save maintenance effort.

@k-ye
Member

k-ye commented Mar 28, 2020

Anecdote: After upgrading to XCode 11.4, with Apple clang version 11.0.3 (clang-1103.0.32.29), the current LLVM 8.0.1 stopped working. Concretely, llvm-as threw an error on runtime.ll, something like

runtime.ll:358:3: error: instruction expected to be numbered '%6'

I noticed this because my runtime_x64.bc hadn't been updated since March 24, which aligns with XCode 11.4's release date. I tested Taichi HEAD on my older MBP with XCode 11.3.1 + clang 11.0.0, and it still works.


Downgraded to XCode 11.3.1 and it's working again now.

@yuanming-hu
Member Author

yuanming-hu commented Apr 28, 2020

I tried to migrate JITSessionCPU from the legacy JIT compilation layers (ORCv1) but had limited success (#885). Note that the ORCv1 layers are deprecated in LLVM 9, so we need to upgrade to ORCv2.

The main issue I ran into with the ORCv2 layers is that after JITSessionCPU is destroyed, throwing any C++ exception leads to a segmentation fault. Note that Taichi IR transforms need C++ exceptions to jump out of recursive layers of IRVisitor.
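To illustrate why exceptions matter here, the following is a self-contained toy of the find-one/modify/throw/restart pattern: a recursive pass rewrites a single node, throws to unwind every recursion level at once, and a driver catches the exception and restarts from the root. The toy IR (a tree of integer additions) and the names Node, IRModified, fold_one, and run_until_fixed_point are hypothetical illustrations, not Taichi's actual IRVisitor API.

```cpp
#include <memory>
#include <utility>

// Hypothetical toy IR: a binary tree of additions over integer constants.
struct Node {
  bool is_const = false;
  int value = 0;                   // meaningful only when is_const
  std::unique_ptr<Node> lhs, rhs;  // non-null only when !is_const
};

std::unique_ptr<Node> make_const(int v) {
  auto n = std::make_unique<Node>();
  n->is_const = true;
  n->value = v;
  return n;
}

std::unique_ptr<Node> make_add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// Thrown right after a single rewrite to abandon the current traversal.
struct IRModified {};

// Find one / modify / throw: folds the first add whose operands are both
// constants, then throws out of however many recursion levels are active.
void fold_one(Node &n) {
  if (n.is_const) return;
  if (n.lhs->is_const && n.rhs->is_const) {
    n.value = n.lhs->value + n.rhs->value;
    n.is_const = true;
    n.lhs.reset();
    n.rhs.reset();
    throw IRModified{};
  }
  fold_one(*n.lhs);
  fold_one(*n.rhs);
}

// Restart: rerun the pass from the root until a traversal completes without
// throwing. Returns the number of restarts (one per rewrite).
int run_until_fixed_point(Node &root) {
  int restarts = 0;
  while (true) {
    try {
      fold_one(root);
    } catch (const IRModified &) {
      ++restarts;
      continue;
    }
    return restarts;
  }
}
```

On the tree (1 + 2) + (3 + 4), the driver restarts three times (folding 1 + 2, then 3 + 4, then 3 + 7) before a traversal finishes without throwing. Every one of those throws unwinds through the C++ runtime, which is why unwinding problems like the one below surface inside IR passes.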

It seems to me that the exception handling frames (EHFrames) are somehow corrupted and libunwind crashes:

#0  0x00007f78296026c2 in __GI___waitpid (pid=31431, stat_loc=stat_loc@entry=0x562e685fb978, options=options@entry=0)
    at ../sysdeps/unix/sysv/linux/waitpid.c:30
#1  0x00007f782956d067 in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:149
#2  0x00007f780eb89c66 in taichi::Logger::error (this=<optimized out>, s=..., raise_exception=<optimized out>)
    at /home/yuanming/repos/taichi/taichi/core/logging.cpp:117
#3  0x00007f780eb8a28e in taichi::signal_handler (signo=11) at /home/yuanming/repos/taichi/taichi/core/logging.cpp:161
#4  <signal handler called>
#5  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:51
#6  <signal handler called>
#7  0x00007f78103d24fe in classify_object_over_fdes () from /tmp/taichi-6adl999p/taichi_core.so
#8  0x00007f78103d2ae8 in search_object () from /tmp/taichi-6adl999p/taichi_core.so
#9  0x00007f78103d3996 in _Unwind_Find_FDE () from /tmp/taichi-6adl999p/taichi_core.so
#10 0x00007f78103cfa73 in uw_frame_state_for () from /tmp/taichi-6adl999p/taichi_core.so
#11 0x00007f78103d0ce0 in uw_init_context_1 () from /tmp/taichi-6adl999p/taichi_core.so
#12 0x00007f78103d1937 in _Unwind_Resume () from /tmp/taichi-6adl999p/taichi_core.so
#13 0x00007f780ed45fad in taichi::lang::LowerAST::visit (this=<optimized out>, stmt=0x562e6a8d5670)
    at /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../../include/c++/7.5.0/ext/new_allocator.h:125
#14 0x00007f780ed44bcb in taichi::lang::LowerAST::visit (this=0x7ffe7296d688, stmt_list=<optimized out>)
    at /home/yuanming/repos/taichi/taichi/transforms/lower_ast.cpp:44
#15 0x00007f780ed44a4e in taichi::lang::LowerAST::run (node=0x562e6aa62d60)
    at /home/yuanming/repos/taichi/taichi/transforms/lower_ast.cpp:396
#16 0x00007f780ed24a3f in taichi::lang::irpass::compile_to_offloads (ir=0x562e6a68f080, config=..., 
    vectorize=<optimized out>, grad=<optimized out>, ad_use_stack=<optimized out>, verbose=<optimized out>, 
    lower_global_access=<optimized out>) at /home/yuanming/repos/taichi/taichi/transforms/compile_to_offloads.cpp:31
#17 0x00007f780ec437dc in taichi::lang::Kernel::lower (this=0x562e6a944a30, lower_access=<optimized out>)
    at /home/yuanming/repos/taichi/taichi/program/kernel.cpp:56
#18 0x00007f780ec4a69f in taichi::lang::Program::compile (this=0x562e6a917c90, kernel=...)
    at /home/yuanming/repos/taichi/taichi/program/program.cpp:141
#19 0x00007f780ec435ea in taichi::lang::Kernel::compile (this=0x562e6a944a30)
    at /home/yuanming/repos/taichi/taichi/program/kernel.cpp:42
#20 0x00007f780ec43c2c in taichi::lang::Kernel::operator() (this=0x562e6a944a30)
    at /home/yuanming/repos/taichi/taichi/program/kernel.cpp:68

To fix this I manually registered the EHFrames:

https://github.com/taichi-dev/taichi/pull/885/files#diff-53699800719af4686810809872c13cc9R98-R101

Now it works more stably, but under multithreaded tests it still fails occasionally. I didn't dig further since I ran out of time. A related discussion (which looks similar but has already been fixed, so I believe we ran into a slightly different issue): http://lists.llvm.org/pipermail/llvm-dev/2017-May/112547.html

I'm documenting these issues here so that someone clever and brave can pick this up and continue the upgrade. All of the above was done with LLVM 8.0.1 on Ubuntu 18.04. I'm not sure whether LLVM 10 has fixed this issue or whether it is caused by something in Taichi itself. To be honest, the ORCv2 documentation is quite limited.

@yuanming-hu yuanming-hu mentioned this issue Apr 28, 2020
@k-ye
Member

k-ye commented Apr 28, 2020

Note that Taichi IR transforms need C++ exceptions to jump out of recursive layers of IRVisitor.

If exceptions are the only issue, a somewhat cumbersome workaround would be to track on our own whether the IR pass visitor has to terminate early.

E.g., instead of writing

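while (true) {
  try {
    FooPass foo;
    root->accept(&foo);  // throws IRModified after a single rewrite
  } catch (const IRModified &) {
    continue;  // restart the pass from the root
  }
  break;  // a full traversal finished without modifying the IR
}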

maybe we can do something like

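while (true) {
  FooPass foo;
  root->accept(&foo);  // the pass records modifications instead of throwing
  if (foo.modified()) {
    continue;  // restart the pass from the root
  }
  break;
}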

It's cumbersome since jumping directly to the top scope is a natural fit for IR passes. But given that Google's codebase does fine with C++ exceptions completely forbidden, I guess it's not a real blocker.
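A self-contained sketch of the exception-free variant suggested above, using a hypothetical toy IR (a tree of integer additions): the pass records that it modified the IR, every recursion level checks that flag and returns early, and the driver loop reruns the pass until nothing changes. The names Node, FoldOnePass, and run_until_fixed_point are illustrative only, not Taichi's API.

```cpp
#include <memory>
#include <utility>

// Hypothetical toy IR: a binary tree of additions over integer constants.
struct Node {
  bool is_const = false;
  int value = 0;                   // meaningful only when is_const
  std::unique_ptr<Node> lhs, rhs;  // non-null only when !is_const
};

std::unique_ptr<Node> make_const(int v) {
  auto n = std::make_unique<Node>();
  n->is_const = true;
  n->value = v;
  return n;
}

std::unique_ptr<Node> make_add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// Find one / modify / return early: a flag replaces the thrown exception.
class FoldOnePass {
 public:
  void visit(Node &n) {
    // Stop descending as soon as any rewrite has happened.
    if (modified_ || n.is_const) return;
    if (n.lhs->is_const && n.rhs->is_const) {
      n.value = n.lhs->value + n.rhs->value;
      n.is_const = true;
      n.lhs.reset();
      n.rhs.reset();
      modified_ = true;
      return;
    }
    visit(*n.lhs);
    visit(*n.rhs);  // returns immediately if the lhs subtree was modified
  }
  bool modified() const { return modified_; }

 private:
  bool modified_ = false;
};

// Rerun the pass from the root until it reports no modification.
// Returns the number of restarts (one per rewrite).
int run_until_fixed_point(Node &root) {
  int restarts = 0;
  while (true) {
    FoldOnePass pass;
    pass.visit(root);
    if (pass.modified()) {
      ++restarts;
      continue;
    }
    return restarts;
  }
}
```

The `if (modified_ || ...) return;` check at the top of visit() is the bookkeeping that replaces the throw; in a larger visitor every recursive entry point needs it, which is part of what makes this approach cumbersome.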

@archibate archibate added this to the v0.7.0 milestone May 2, 2020
@yuanming-hu
Member Author

Right, getting rid of exceptions in IR passes sounds like a viable solution. We can switch IR passes from the current find-one/modify/throw/restart scheme to a find-all/modify-all/return scheme. This scheme is also used by LLVM and runs more efficiently.
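A sketch of the find-all/modify-all/return scheme on a hypothetical toy IR (a tree of integer additions): the pass walks the tree in post-order, rewrites every match it encounters in a single traversal, and returns whether it changed anything. This mirrors the convention of LLVM's legacy FunctionPass::runOnFunction, which returns true if the function was modified; the names Node, fold_all, and passes_until_fixed_point are illustrative, not Taichi's API.

```cpp
#include <memory>
#include <utility>

// Hypothetical toy IR: a binary tree of additions over integer constants.
struct Node {
  bool is_const = false;
  int value = 0;                   // meaningful only when is_const
  std::unique_ptr<Node> lhs, rhs;  // non-null only when !is_const
};

std::unique_ptr<Node> make_const(int v) {
  auto n = std::make_unique<Node>();
  n->is_const = true;
  n->value = v;
  return n;
}

std::unique_ptr<Node> make_add(std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
  auto n = std::make_unique<Node>();
  n->lhs = std::move(a);
  n->rhs = std::move(b);
  return n;
}

// Find all / modify all / return: post-order, so children are folded before
// their parent is inspected; no exception and no restart are needed.
bool fold_all(Node &n) {
  if (n.is_const) return false;
  // Visit both subtrees unconditionally (a short-circuit || would skip rhs).
  bool changed = fold_all(*n.lhs);
  if (fold_all(*n.rhs)) changed = true;
  if (n.lhs->is_const && n.rhs->is_const) {
    n.value = n.lhs->value + n.rhs->value;
    n.is_const = true;
    n.lhs.reset();
    n.rhs.reset();
    changed = true;
  }
  return changed;
}

// Count traversals until one reports that nothing changed.
int passes_until_fixed_point(Node &root) {
  int passes = 1;
  while (fold_all(root)) ++passes;
  return passes;
}
```

On the tree (1 + 2) + (3 + 4), one call to fold_all folds the whole tree to 10 and a second call returns false, so this toy needs two traversals in total, whereas the restart-based scheme needs one traversal per rewrite plus a final clean one.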

@znah
Contributor

znah commented May 9, 2020

This should fix #235 as well.

@yuanming-hu
Member Author

This should fix #235 as well.

Right, and clearly it will take a lot of work to get this done. It would also improve optimizer performance: #926

@TH3CHARLie
Collaborator

Anecdote: After upgrading to XCode 11.4, with Apple clang version 11.0.3 (clang-1103.0.32.29), the current LLVM 8.0.1 stopped working. Concretely, llvm-as threw an error on runtime.ll, something like

runtime.ll:358:3: error: instruction expected to be numbered '%6'

I noticed this because my runtime_x64.bc hadn't been updated since March 24, which aligns with XCode 11.4's release date. I tested Taichi HEAD on my older MBP with XCode 11.3.1 + clang 11.0.0, and it still works.

Downgraded to XCode 11.3.1 and it's working again now.

I encountered this today with Xcode 11.5 + clang-1103.0.32.62, with error messages similar to yours. It looks like the llvm-as error causes no bitcode to be generated at all.

However, this is somewhat sad news for people who upgrade their toolchains frequently (like myself).

@k-ye
Member

k-ye commented May 25, 2020

+1. I keep a pinned copy of the XCode command line tools at 11.3, while the default one just gets upgraded all the time.

@yuanming-hu
Member Author

yuanming-hu commented Jun 10, 2020

Finished local tests of LLVM 10 on Windows 10 and Ubuntu 18.04. TODO (contributions welcome):

  • Test OS X with LLVM 10 (@TH3CHARLie)
  • Upload OS X pre-built LLVM 10 (@TH3CHARLie)
  • Update OS X build bots with prebuilt LLVM 10 (@TH3CHARLie)
  • Upload Windows pre-built LLVM 10 (@yuanming-hu)
  • Update Windows build bots with prebuilt LLVM 10
  • Upload Linux pre-built LLVM 10 (@yuanming-hu)
  • Update Linux build bots with prebuilt LLVM 10
  • Remove LLVM 8 support

Note: the prebuilt LLVM binaries are uploaded to https://github.com/taichi-dev/taichi_assets/releases/tag/llvm10. Each zip file should directly contain four folders: bin, include, lib, and share. To produce them, pass -DCMAKE_INSTALL_PREFIX=installed when running cmake, then find the folders under build/installed after the install step.

@JYP2011

JYP2011 commented Jul 5, 2020

I can't find LLVM release 10 in the LLVM mirror on GitHub :-(

Check this out: https://github.com/taichi-dev/taichi_assets/releases

Is it an LLVM version modified by the Taichi community?

@JYP2011

JYP2011 commented Jul 5, 2020

I can't find LLVM release 10 in the LLVM mirror on GitHub :-(

Check this out: https://github.com/taichi-dev/taichi_assets/releases

Is it an LLVM version modified by the Taichi community?

Yes, otherwise they wouldn't be hosted under @taichi-dev.

Thanks for your reply :-).
Also, I used the official LLVM 8.0 following the Taichi documentation.
CMake reported an error; the full error report is the following:

CMake Error at /usr/local/share/cmake-3.12/Modules/CMakeTestCXXCompiler.cmake:45 (message):
  The C++ compiler

    "/usr/local/bin/clang++"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /home/nullspace/workspace/taichi/build/CMakeFiles/CMakeTmp
    
    Run Build Command:"/usr/bin/make" "cmTC_14c59/fast"
    /usr/bin/make -f CMakeFiles/cmTC_14c59.dir/build.make CMakeFiles/cmTC_14c59.dir/build
    make[1]: Entering directory '/home/nullspace/workspace/taichi/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_14c59.dir/testCXXCompiler.cxx.o
    /usr/local/bin/clang++    -stdlib=libc++    -o CMakeFiles/cmTC_14c59.dir/testCXXCompiler.cxx.o -c /home/nullspace/workspace/taichi/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    Linking CXX executable cmTC_14c59
    /usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_14c59.dir/link.txt --verbose=1
    /usr/local/bin/clang++  -stdlib=libc++     -rdynamic CMakeFiles/cmTC_14c59.dir/testCXXCompiler.cxx.o  -o cmTC_14c59 
    /usr/bin/ld: cannot find -lc++
    clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
    CMakeFiles/cmTC_14c59.dir/build.make:86: recipe for target 'cmTC_14c59' failed
    make[1]: *** [cmTC_14c59] Error 1
    make[1]: Leaving directory '/home/nullspace/workspace/taichi/build/CMakeFiles/CMakeTmp'
    Makefile:121: recipe for target 'cmTC_14c59/fast' failed
    make: *** [cmTC_14c59/fast] Error 2

Can you tell me what's wrong with it? (Maybe I should open a new issue?)

@archibate
Collaborator

maybe I should open a new issue?

Yes, please. We're willing to help!
