AlphaFold 2.3.1 #118

Closed
boegel opened this issue Mar 21, 2023 · 5 comments
Labels
difficulty: easy software that should be easy to support GPU priority: high Python site:ugent Software installation request for UGent Tier-2 update

Comments

@boegel
Contributor

boegel commented Mar 21, 2023

@boegel boegel added difficulty: easy software that should be easy to support priority: high Python update site:ugent Software installation request for UGent Tier-2 GPU labels Mar 21, 2023
@boegel
Contributor Author

boegel commented Mar 21, 2023

WIP easyconfigs added in 118_AlphaFold

I ran into a problem with the OpenMM dependency: building it triggers a compiler crash (ICE, Internal Compiler Error), which seems to boil down to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99746 or https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106841. So we'll need to backport a patch for GCCcore 11.2.0 (if we go with foss/2021b, like with AlphaFold 2.3.0) or GCCcore 11.3.0 (if we go with foss/2022a, which should be feasible now that there's a TensorFlow with foss/2022a, see easybuilders/easybuild-easyconfigs#17241).

@boegel
Contributor Author

boegel commented Mar 22, 2023

Maybe we should update to OpenMM 7.7.0, see also discussion in google-deepmind/alphafold#404

@lexming
Collaborator

lexming commented Apr 11, 2023

I just finished the installation of AlphaFold 2.3.1 on Hydra with foss/2022a. The relevant easyconfigs are in 118_AlphaFold.

I also hit that ICE with OpenMM v8.0.0 and GCC 11.3.0, but:

  1. only in the build with CUDA, because the failing code path links to CUDA
  2. only on Intel AVX2 architectures (i.e. Broadwell); AMD Zen2 (also AVX2) is fine

The ICE is the following:

during GIMPLE pass: vect
/theia/scratch/brussel/vo/000/bvo00005/vsc10122/easybuild/install/broadwell/build/OpenMM/8.0.0/foss-2022a-CUDA-11.7.0/openmm-8.0.0/platforms/common/src/CommonKernels.cpp: In member function void OpenMM::CommonCalcGayBerneForceKernel::sortAtoms():
/theia/scratch/brussel/vo/000/bvo00005/vsc10122/easybuild/install/broadwell/build/OpenMM/8.0.0/foss-2022a-CUDA-11.7.0/openmm-8.0.0/platforms/common/src/CommonKernels.cpp:5055:6: internal compiler error: in vect_get_vec_defs_for_operand, at tree-vect-stmts.c:1450
 5055 | void CommonCalcGayBerneForceKernel::sortAtoms() {
      |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
0x69b399 vect_get_vec_defs_for_operand(vec_info*, _stmt_vec_info*, unsigned int, tree_node*, vec<tree_node*, va_heap, vl_ptr>*, tree_node*)
        ../../gcc/tree-vect-stmts.c:1450
0xf42e34 vect_build_gather_load_calls
        ../../gcc/tree-vect-stmts.c:2728
0xf42e34 vectorizable_load
        ../../gcc/tree-vect-stmts.c:8718
0xf4bce0 vect_transform_stmt(vec_info*, _stmt_vec_info*, gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
        ../../gcc/tree-vect-stmts.c:10922
0xf4fa81 vect_transform_loop_stmt
        ../../gcc/tree-vect-loop.c:9254
0xf6744d vect_transform_loop(_loop_vec_info*, gimple*)
        ../../gcc/tree-vect-loop.c:9690
0xf905dc try_vectorize_loop_1
        ../../gcc/tree-vectorizer.c:1104
0xf911c1 vectorize_loops()
        ../../gcc/tree-vectorizer.c:1243
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
make[2]: *** [platforms/cuda/sharedTarget/CMakeFiles/OpenMMCUDA.dir/__/__/common/src/CommonKernels.cpp.o] Error 1

Which raises many questions:

  1. It looks very much like GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99746. However, that bug is already fixed in GCC v11.3.0, so why is it still happening?
  2. The backtrace shows calls from files that belong to the GCC source code but with relative paths such as ../../gcc/tree-vect-stmts.c:1450. Those paths should be absolute. What are they relative to? OpenMM does not bundle any of this.
  3. The line numbers do not match the calling functions shown in the backtrace for GCC 11.3.0. Not sure if those line numbers refer to the stripped source code instead.

I suspect that the backtrace shown in the ICE does not refer to the GCC compiler used by EasyBuild, but to the compiler used by Nvidia for the pre-built CUDA binaries.

Nevertheless, since this ICE only affects our older systems, it is not worth the effort on our side to dig any deeper. We just disabled vectorization on the affected installation.

update: I just saw in the EasyBuild PR that a patch was recently added to the OpenMM easyconfig to disable vectorization on the single function that fails. That's a better solution.
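
For reference, here is a minimal sketch of what disabling vectorization for a single function can look like with GCC. It is not the actual patch from the EasyBuild PR: `gather_values` is a hypothetical stand-in for the failing function, and the `NO_TREE_VECTORIZE` macro name is made up for illustration. The only real piece is GCC's `optimize` function attribute, used here with `no-tree-vectorize`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC-specific attribute: only apply it when compiling with GCC itself.
// (Clang also defines __GNUC__, so it is excluded explicitly.)
#if defined(__GNUC__) && !defined(__clang__)
#define NO_TREE_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_TREE_VECTORIZE
#endif

// Hypothetical stand-in for the function that triggers the ICE.
// The loop does an indexed load, the kind of access the vectorizer
// may turn into a gather; the attribute asks GCC not to vectorize it.
NO_TREE_VECTORIZE
void gather_values(const std::vector<float>& values,
                   const std::vector<int>& index,
                   std::vector<float>& out) {
    assert(out.size() == index.size());
    for (std::size_t i = 0; i < index.size(); ++i) {
        out[i] = values[index[i]];
    }
}
```

The rest of the translation unit is still compiled with the normal optimization flags, which is why this is less intrusive than building everything with `-fno-tree-vectorize`.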

@boegel
Contributor Author

boegel commented Apr 14, 2023

@lexming Can this issue be closed, and the 118_AlphaFold directory removed, now that easybuilders/easybuild-easyconfigs#17604 is merged?

@lexming
Collaborator

lexming commented Apr 17, 2023

@lexming lexming closed this as completed Apr 17, 2023