The --CPU flag is not helping with speed #1623

Closed
bbitarello opened this issue Jun 28, 2023 · 8 comments


bbitarello commented Jun 28, 2023

Hello,
I am trying to run fel with 1,000 bootstraps and want to make it go faster on a mini-server (Ubuntu) with two nodes and 40 CPUs.

Running lscpu to get some system specs I get:

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Stepping: 7
CPU MHz: 1000.084
CPU max MHz: 3200.0000
CPU min MHz: 1000.0000
BogoMIPS: 4800.00
Virtualization: VT-x
L1d cache: 640 KiB
L1i cache: 640 KiB
L2 cache: 20 MiB
L3 cache: 27.5 MiB
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39

All the tests below were run on a dataset consisting of one gene (~ 300 codons) and 54 primate species.

  1. I ran fel without CIs and bootstraps, which took 1 min 24 seconds.

Command:
hyphy FEL --code Universal --alignment "${alg_file}" --tree "${tree_file}" --branches All --srv Yes --pvalue 0.1 --ci No --output "${json_file}" --precision standard --full-model Yes

  2. I ran fel with CIs and 20 bootstraps, using 1 CPU. It took 21 min 19 seconds.

Command:
hyphy FEL --code Universal --alignment "${alg_file}" --tree "${tree_file}" --branches All --srv Yes --pvalue 0.1 --ci Yes --resample 20 --output "${json_file}" --precision standard --full-model Yes --CPU 1

  3. I ran fel with CIs and 20 bootstraps, using 2 CPUs. It took 24 min 40 seconds.

Command:
hyphy FEL --code Universal --alignment "${alg_file}" --tree "${tree_file}" --branches All --srv Yes --pvalue 0.1 --ci Yes --resample 20 --output "${json_file}" --precision standard --full-model Yes --CPU 2

  4. I ran fel with CIs and 20 bootstraps, using 4 CPUs. It took 23 min 22 seconds.

Command:
hyphy FEL --code Universal --alignment "${alg_file}" --tree "${tree_file}" --branches All --srv Yes --pvalue 0.1 --ci Yes --resample 20 --output "${json_file}" --precision standard --full-model Yes --CPU 4

  5. I ran the same as 4), but using the -m flag, and then ran grep benchmark messages.log. Result:
    Auto-benchmarked an optimal number (1) of threads.
    Auto-benchmarked an optimal number (4) of threads.
    Auto-benchmarked an optimal number (5) of threads.
    Auto-benchmarked an optimal number (3) of threads.
    Auto-benchmarked an optimal number (7) of threads.
    Auto-benchmarked an optimal number (4) of threads.
    Auto-benchmarked an optimal number (7) of threads.
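
To put the three bootstrap runs above on one scale, the wall-clock times can be converted to seconds and expressed as a speedup relative to the --CPU 1 run. This is only a small sketch: the helper names to_seconds and speedup are made up for illustration and are not part of HyPhy.

```shell
# Convert "minutes seconds" to total seconds (hypothetical helper, not part of HyPhy).
to_seconds() {
    echo $(( $1 * 60 + $2 ))
}

# Speedup of a run relative to a baseline, both in seconds, to two decimals.
speedup() {
    awk -v base="$1" -v run="$2" 'BEGIN { printf "%.2f\n", base / run }'
}

base=$(to_seconds 21 19)   # --CPU 1 run
two=$(to_seconds 24 40)    # --CPU 2 run
four=$(to_seconds 23 22)   # --CPU 4 run

echo "--CPU 2 speedup: $(speedup "$base" "$two")"
echo "--CPU 4 speedup: $(speedup "$base" "$four")"
```

A ratio below 1.00 means the run was slower than the single-CPU baseline, which is the case for both of the runs reported here.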

Questions:

  1. Changing the --CPU value does not improve speed. Am I doing something wrong?
  2. Given the server described above, is there another way to make this go faster?

Thank you!

@stevenweaver
Member

Dear @bbitarello,

Can you first clean your build directory (rm CMakeCache.txt), then run cmake . and post the output here?
We need to first make sure that you built with OpenMP and/or OpenMPI libraries.

Best,
Steven

@bbitarello
Author

bbitarello commented Jun 28, 2023

Hello,
thank you.
I installed this version using conda install hyphy. Do I need to install it from source then?

@stevenweaver
Member

Dear @bbitarello,

When I install with conda, I don't see OpenMP libraries being loaded.

See the difference here:

(base) sweaver:hyphy/ strace ./hyphy 2>&1 | grep libomp
open("/usr/local/lib/libomp.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/beast/2.5.0/lib/libomp.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/scyld/maui/lib/libomp.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/scyld/slurm/lib64/libomp.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/scyld/openmpi/3.1.6/gnu/lib/libomp.so", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/opt/AMD/aocc-compiler-4.0.0/lib/libomp.so", O_RDONLY|O_CLOEXEC) = 3
^C
(base) sweaver:hyphy/ strace ~/miniconda3/bin/hyphy 2>&1 | grep libomp

The HyPhy team neither wrote nor actively maintains the conda recipe, as its commit history shows. A bug report can be filed here.

I can help more if you install from source.

Best,
Steven
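
A lighter-weight alternative to the strace comparison above is to look at the shared libraries the binary is linked against. The sketch below assumes a Linux system with ldd available; the function names are hypothetical, and a binary that links its OpenMP runtime statically would not show up this way, so a negative result is a hint rather than proof.

```shell
# Read `ldd` output on stdin; succeed if an OpenMP runtime library is listed.
# libgomp (GCC), libomp (LLVM/AOCC) and libiomp5 (Intel) are the common names.
links_openmp() {
    grep -E -q 'lib(gomp|omp|iomp5)\.so'
}

# Report on a binary given by path (hypothetical helper).
check_binary() {
    if ldd "$1" 2>/dev/null | links_openmp; then
        echo "$1: OpenMP runtime linked"
    else
        echo "$1: no OpenMP runtime found in ldd output"
    fi
}

# Only run the check when hyphy is actually on the PATH.
if command -v hyphy >/dev/null 2>&1; then
    check_binary "$(command -v hyphy)"
fi
```

Running check_binary on both the source build and the conda binary gives the same kind of side-by-side comparison as the two strace calls.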

@bbitarello
Author

Thank you. I think my previously installed version (which was very old) had been installed from source, and it was faster.

Trying this now:
git clone https://github.com/veg/hyphy.git

cd hyphy

cmake -DCMAKE_INSTALL_PREFIX:PATH=/opt/bbita/hyphy

All good. Ran the tests before running make:

make test

All the tests are successful except fel:

9/20 Test #9: FEL ..............................***Failed 62.42 sec

So I went to check the output of those tests (~/hyphy/Testing/Temporary/LastTest.log), looked for the fel test, and found:

### ** Found _13_ sites under pervasive positive diversifying and _28_ sites under negative selection at p <= 0.1**
Error: Failed to correctly classify site 144 in call to assert(0, "Failed to correctly classify site "+(site+1));

Any ideas? This seems very strange. It works up to a certain point and fails at this specific site.

I appreciate your help. Best regards

@bbitarello
Author

bbitarello commented Jun 29, 2023

I just realized that the error I am getting above was already described by someone else in #1585.

@bbitarello
Author

I went back and decided to start over by making triple sure I had all the pre-reqs installed (yes).

It goes well until the "make test" phase, as before. (I also noticed a few "Failed" messages before that, regarding the NEON and AVX extensions. What are these?)

I appreciate your help

`#starting over with
git clone https://github.com/veg/hyphy.git
cd hyphy
cmake .
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Failed
-- Performing Test HAVE_SSE4_EXTENSIONS
-- Performing Test HAVE_SSE4_EXTENSIONS - Success
-- Performing Test HAVE_NEON_EXTENSIONS
-- Performing Test HAVE_NEON_EXTENSIONS - Failed
Set default compiler flags to -fsigned-char -O3 -D_FORTIFY_SOURCE=2 -Wall -std=c++14 -g -msse4.1
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "7.68.0")
/usr/lib/x86_64-linux-gnu/libcurl.so
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found suitable version "1.2.11", minimum required is "1.2.9")
-- Found MPI_C: /opt/openmpi/4.0.4_gnu/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/openmpi/4.0.4_gnu/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-fsigned-char -O3 -D_FORTIFY_SOURCE=2 -Wall -std=c++14 -g -msse4.1 -Wl,-rpath -Wl,/opt/openmpi/4.0.4_gnu/lib -Wl,--enable-new-dtags -pthread -fopenmp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/bbitarello/hyphy
sudo cmake --install . --prefix /opt/bbita/hyphy
06:55:36 bbitarello@rosalind hyphy ±|master ✗|→ make hyphy
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Failed
-- Performing Test HAVE_SSE4_EXTENSIONS
-- Performing Test HAVE_SSE4_EXTENSIONS - Success
-- Performing Test HAVE_NEON_EXTENSIONS
-- Performing Test HAVE_NEON_EXTENSIONS - Failed
Set default compiler flags to -fsigned-char -O3 -D_FORTIFY_SOURCE=2 -Wall -std=c++14 -g -msse4.1
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found CURL: /usr/lib/x86_64-linux-gnu/libcurl.so (found version "7.68.0")
/usr/lib/x86_64-linux-gnu/libcurl.so
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found suitable version "1.2.11", minimum required is "1.2.9")
-- Found MPI_C: /opt/openmpi/4.0.4_gnu/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /opt/openmpi/4.0.4_gnu/lib/libmpi.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-fsigned-char -O3 -D_FORTIFY_SOURCE=2 -Wall -std=c++14 -g -msse4.1 -Wl,-rpath -Wl,/opt/openmpi/4.0.4_gnu/lib -Wl,--enable-new-dtags -pthread -fopenmp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/bbitarello/hyphy
Scanning dependencies of target hyphy
[ 0%] Building CXX object CMakeFiles/hyphy.dir/src/core/_hyExecutionContext.cpp.o
[ 0%] Building CXX object CMakeFiles/hyphy.dir/src/core/alignment.cpp.o
[ 0%] Building CXX object CMakeFiles/hyphy.dir/src/core/associative_list.cpp.o
[ 5%] Building CXX object CMakeFiles/hyphy.dir/src/core/avllist.cpp.o
[ 5%] Building CXX object CMakeFiles/hyphy.dir/src/core/avllistx.cpp.o
[ 5%] Building CXX object CMakeFiles/hyphy.dir/src/core/avllistx_iterator.cpp.o
[ 10%] Building CXX object CMakeFiles/hyphy.dir/src/core/avllistxl.cpp.o
[ 10%] Building CXX object CMakeFiles/hyphy.dir/src/core/avllistxl_iterator.cpp.o
[ 10%] Building CXX object CMakeFiles/hyphy.dir/src/core/baseobj.cpp.o
[ 10%] Building CXX object CMakeFiles/hyphy.dir/src/core/batchlan.cpp.o
[ 15%] Building CXX object CMakeFiles/hyphy.dir/src/core/batchlan2.cpp.o
[ 15%] Building CXX object CMakeFiles/hyphy.dir/src/core/batchlanhelpers.cpp.o
[ 15%] Building CXX object CMakeFiles/hyphy.dir/src/core/batchlanruntime.cpp.o
[ 20%] Building CXX object CMakeFiles/hyphy.dir/src/core/calcnode.cpp.o
[ 20%] Building CXX object CMakeFiles/hyphy.dir/src/core/category.cpp.o
[ 20%] Building CXX object CMakeFiles/hyphy.dir/src/core/constant.cpp.o
[ 25%] Building CXX object CMakeFiles/hyphy.dir/src/core/dataset.cpp.o
[ 25%] Building CXX object CMakeFiles/hyphy.dir/src/core/dataset_filter.cpp.o
[ 25%] Building CXX object CMakeFiles/hyphy.dir/src/core/dataset_filter_numeric.cpp.o
[ 25%] Building CXX object CMakeFiles/hyphy.dir/src/core/fisher_exact.cpp.o
[ 30%] Building CXX object CMakeFiles/hyphy.dir/src/core/formula.cpp.o
[ 30%] Building CXX object CMakeFiles/hyphy.dir/src/core/formula_parsing_context.cpp.o
[ 30%] Building CXX object CMakeFiles/hyphy.dir/src/core/fstring.cpp.o
[ 35%] Building CXX object CMakeFiles/hyphy.dir/src/core/global_object_lists.cpp.o
[ 35%] Building CXX object CMakeFiles/hyphy.dir/src/core/global_things.cpp.o
[ 35%] Building CXX object CMakeFiles/hyphy.dir/src/core/hbl_env.cpp.o
[ 40%] Building CXX object CMakeFiles/hyphy.dir/src/core/likefunc.cpp.o
[ 40%] Building CXX object CMakeFiles/hyphy.dir/src/core/likefunc2.cpp.o
[ 40%] Building CXX object CMakeFiles/hyphy.dir/src/core/likefuncocl.cpp.o
[ 40%] Building CXX object CMakeFiles/hyphy.dir/src/core/list.cpp.o
[ 45%] Building CXX object CMakeFiles/hyphy.dir/src/core/mathobj.cpp.o
[ 45%] Building CXX object CMakeFiles/hyphy.dir/src/core/matrix.cpp.o
[ 45%] Building CXX object CMakeFiles/hyphy.dir/src/core/matrix_mult.cpp.o
[ 50%] Building CXX object CMakeFiles/hyphy.dir/src/core/nexus.cpp.o
[ 50%] Building CXX object CMakeFiles/hyphy.dir/src/core/ntuplestorage.cpp.o
[ 50%] Building CXX object CMakeFiles/hyphy.dir/src/core/operation.cpp.o
[ 55%] Building CXX object CMakeFiles/hyphy.dir/src/core/parser.cpp.o
[ 55%] Building CXX object CMakeFiles/hyphy.dir/src/core/parser2.cpp.o
[ 55%] Building CXX object CMakeFiles/hyphy.dir/src/core/polynoml.cpp.o
[ 55%] Building CXX object CMakeFiles/hyphy.dir/src/core/simplelist.cpp.o
[ 60%] Building CXX object CMakeFiles/hyphy.dir/src/core/site.cpp.o
[ 60%] Building CXX object CMakeFiles/hyphy.dir/src/core/stack.cpp.o
[ 60%] Building CXX object CMakeFiles/hyphy.dir/src/core/string_buffer.cpp.o
[ 65%] Building CXX object CMakeFiles/hyphy.dir/src/core/string_file_wrapper.cpp.o
[ 65%] Building CXX object CMakeFiles/hyphy.dir/src/core/strings.cpp.o
[ 65%] Building CXX object CMakeFiles/hyphy.dir/src/core/time_difference.cpp.o
[ 70%] Building CXX object CMakeFiles/hyphy.dir/src/core/topology.cpp.o
[ 70%] Building CXX object CMakeFiles/hyphy.dir/src/core/translation_table.cpp.o
[ 70%] Building CXX object CMakeFiles/hyphy.dir/src/core/tree.cpp.o
[ 70%] Building CXX object CMakeFiles/hyphy.dir/src/core/tree_evaluator.cpp.o
[ 75%] Building CXX object CMakeFiles/hyphy.dir/src/core/tree_iterator.cpp.o
[ 75%] Building CXX object CMakeFiles/hyphy.dir/src/core/trie.cpp.o
[ 75%] Building CXX object CMakeFiles/hyphy.dir/src/core/trie_iterator.cpp.o
[ 80%] Building CXX object CMakeFiles/hyphy.dir/src/core/variable.cpp.o
[ 80%] Building CXX object CMakeFiles/hyphy.dir/src/core/variablecontainer.cpp.o
[ 80%] Building CXX object CMakeFiles/hyphy.dir/src/core/vector.cpp.o
[ 85%] Building CXX object CMakeFiles/hyphy.dir/src/new/bayesgraph.cpp.o
[ 85%] Building CXX object CMakeFiles/hyphy.dir/src/new/bayesgraph2.cpp.o
[ 85%] Building CXX object CMakeFiles/hyphy.dir/src/new/bgm.cpp.o
[ 85%] Building CXX object CMakeFiles/hyphy.dir/src/new/bgm2.cpp.o
[ 90%] Building CXX object CMakeFiles/hyphy.dir/src/new/scfg.cpp.o
[ 90%] Building C object CMakeFiles/hyphy.dir/contrib/SQLite-3.8.2/sqlite3.c.o
cc1: warning: command line option ‘-Weffc++’ is valid for C++/ObjC++ but not for C
[ 90%] Building CXX object CMakeFiles/hyphy.dir/src/utils/hyphyunixutils.cpp.o
[ 95%] Building CXX object CMakeFiles/hyphy.dir/src/contrib/mersenne_twister.cpp.o
[ 95%] Building CXX object CMakeFiles/hyphy.dir/src/contrib/regex.cpp.o
[ 95%] Building CXX object CMakeFiles/hyphy.dir/src/mains/unix.cpp.o
[100%] Linking CXX executable hyphy
mklink HYPHYMP -> hyphy
[100%] Built target hyphy

#testing again
make test
Running tests...
Test project /home/bbitarello/hyphy
Start 1: UNIT-TESTS
1/20 Test #1: UNIT-TESTS ....................... Passed 2.36 sec
Start 2: CODON
2/20 Test #2: CODON ............................ Passed 2.63 sec
Start 3: PROTEIN
3/20 Test #3: PROTEIN .......................... Passed 14.29 sec
Start 4: MTCODON
4/20 Test #4: MTCODON .......................... Passed 65.88 sec
Start 5: ALGAE
5/20 Test #5: ALGAE ............................ Passed 17.45 sec
Start 6: CILIATES
6/20 Test #6: CILIATES ......................... Passed 44.33 sec
Start 7: SLAC
7/20 Test #7: SLAC ............................. Passed 5.19 sec
Start 8: SLAC-PARTITIONED
8/20 Test #8: SLAC-PARTITIONED ................. Passed 17.40 sec
Start 9: FEL
9/20 Test #9: FEL ..............................***Failed 64.80 sec
Start 10: MEME
10/20 Test #10: MEME ............................. Passed 67.39 sec
Start 11: MEME-PARTITIONED
11/20 Test #11: MEME-PARTITIONED ................. Passed 54.17 sec
Start 12: BUSTED
12/20 Test #12: BUSTED ........................... Passed 22.66 sec
Start 13: BUSTED-SRV
13/20 Test #13: BUSTED-SRV ....................... Passed 22.80 sec
Start 14: RELAX
14/20 Test #14: RELAX ............................ Passed 63.38 sec
Start 15: FUBAR
15/20 Test #15: FUBAR ............................ Passed 5.96 sec
Start 16: BGM
16/20 Test #16: BGM .............................. Passed 3.62 sec
Start 17: CONTRAST-FEL
17/20 Test #17: CONTRAST-FEL ..................... Passed 55.03 sec
Start 18: GARD
18/20 Test #18: GARD ............................. Passed 6.67 sec
Start 19: FADE
19/20 Test #19: FADE ............................. Passed 28.94 sec
Start 20: ABSREL
20/20 Test #20: ABSREL ........................... Passed 40.92 sec

95% tests passed, 1 tests failed out of 20

Total Test time (real) = 605.91 sec

The following tests FAILED:
9 - FEL (Failed)
Errors while running CTest
make: *** [Makefile:130: test] Error 8

#once again I went to the file ~/hyphy/Testing/Temporary/LastTest.log and found:

** Found 13 sites under pervasive positive diversifying and 28 sites under negative selection at p <= 0.1**

Error:
Failed to correctly classify site 144 in call to assert(0, "Failed to correctly classify site "+(site+1));

Function call stack
1 : assert(0, "Failed to correctly classify site "+(site+1));`
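
The configure output above is long, so it can help to pull out only the lines that matter for performance: the SIMD extension tests and the OpenMP/MPI detection results. AVX and SSE4 are x86 SIMD instruction sets and NEON is the ARM equivalent, so a failed NEON test is expected on an x86_64 machine. The function name below is hypothetical, and the sample input is copied from the log in this comment.

```shell
# Read cmake configure output on stdin and keep only the lines that report
# SIMD extension test results and OpenMP/MPI detection.
summarize_cmake() {
    grep -E 'Performing Test HAVE_[A-Z0-9]+_EXTENSIONS - |Found (OpenMP|MPI): '
}

# Sample input copied from the configure output in this comment.
summarize_cmake <<'EOF'
-- Performing Test HAVE_AVX_EXTENSIONS
-- Performing Test HAVE_AVX_EXTENSIONS - Failed
-- Performing Test HAVE_SSE4_EXTENSIONS - Success
-- Performing Test HAVE_NEON_EXTENSIONS - Failed
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Found MPI: TRUE (found version "3.1")
EOF
```

In practice the same filter can be applied directly with cmake . | summarize_cmake. For this build it shows that both OpenMP and MPI were found, which is what the earlier request for the cmake output was meant to confirm.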

@spond
Member

spond commented Jun 30, 2023

Dear @bbitarello,

You can ignore this error; it's a stochastic multi-threading issue which will be fixed in the next release.

That said, for your application, it is better to use the MPI version of HyPhy (where you can spawn different processes and handle individual sites in parallel). You can build it with make MPI, assuming your system has OpenMPI. Then you can use mpirun -np X HYPHYMPI fel --alignment ...

This is because for FEL, multi-threading does not help nearly as much as farming out individual sites to separate processes.

Finally, --CPU will have no effect on the runtime. The syntax you should use is hyphy CPU=N [other arguments].

For example, using --resample 100...

Multi-threaded run

time hyphy fel --alignment tests/data/bglobin.nex  
...
... 541% cpu 15:19.31 total

Single-threaded run

time hyphy CPU=1 fel --alignment tests/data/bglobin.nex  
...
... 99% cpu 58:30.44 total

MPI run

time mpirun -np 8 HYPHYMPI fel --alignment tests/data/bglobin.nex --resample 100 

...
...  683% cpu 9:50.77 total

Best,
Sergei
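
For the server described at the top of the thread, one way to pick the X in mpirun -np X is to count physical cores from the lscpu output (sockets times cores per socket). This is a sketch under assumptions: one MPI process per physical core is only a starting point and not a tuned value, the function name is hypothetical, and the alignment and tree file names are placeholders.

```shell
# Read `lscpu` output on stdin and print sockets x cores-per-socket,
# i.e. the number of physical cores (hyper-threads are not counted).
physical_cores() {
    awk -F: '
        /Core\(s\) per socket/ { cores = $2 + 0 }
        /Socket\(s\)/          { sockets = $2 + 0 }
        END { print cores * sockets }
    '
}

# Values taken from the lscpu output posted at the top of the thread.
np=$(printf '%s\n' 'Core(s) per socket: 10' 'Socket(s): 2' | physical_cores)

# Print the command instead of running it; the file names are placeholders.
echo "mpirun -np $np HYPHYMPI fel --alignment alignment.fas --tree tree.nwk --resample 100"
```

On a live system the count can be taken from lscpu | physical_cores instead of the hard-coded lines. For the threaded build, the equivalent setting is hyphy CPU=N, as noted above.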

@bbitarello
Author

Ok, thank you. This works now.
