Got completion with error #11
Comments
Hi,
Thanks for your quick reply; we can now run Sherman successfully!
Can you provide a screenshot of the entire test?
I cannot see the complete output of server 0 (right part of the screenshot).
Line 74 in 76e208b
The above line triggers the error.
Is it OK when the number of threads is 2?
How about a single thread on each machine? Please check the RDMA network state by running
Single-thread benchmarks sometimes run fine and occasionally produce the same error. ibv_write_bw works fine, and our own programs also work.
This issue is weird because we once ran the multithread benchmark successfully on two machines, but it currently doesn't work. Maybe it is due to some machine state issue?
Can you insert Line 258 in 76e208b
? Let's check whether these two servers can initialize the tree successfully.
Sorry for the late reply; I'm currently busy with another project.
Hi, can you send your WeChat ID via
Thank you so much for your help, and I've sent my ID to you.
Hi, thanks for open-sourcing Sherman. We are happy that we can run it on our cluster to learn more about the system.
The issue
We encountered a protection error and a deadlock when running multithread, multi-machine benchmarks.
Instructions executed
We use the following instructions on each machine to run the multithread, multi-machine benchmarks, which produce runtime errors. The Memcached server is on a third machine.
We use the following instructions to run the single-thread, single-machine benchmark, which runs well.
The total number of huge pages in hugepage.sh is changed to 4096 to reduce preparation time; the huge page size is 2 MiB.
Error messages
We were able to run a single-thread benchmark on a single machine, but we encountered the following errors when running multithread and multi-machine tests.
Machine configuration
As shown above, the RDMA poll failed due to a protection error, and a deadlock was detected. We are not sure whether this is caused by a wrong hardware configuration or by software bugs. The machine configuration is as follows:
The hardware configuration seems to meet the requirements of Sherman (OFED version and firmware version).
Analysis
The protection error is caused by access to an invalid memory region, but we are not sure whether this comes from software bugs or a wrong hardware setup. The deadlock error is also confusing because the benchmarks are read-only. Can you give us some tips to debug these errors?
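To make the "invalid memory region" case concrete, here is a minimal sketch of the kind of check that could be logged before posting an RDMA read or write. It is not Sherman's code: `Region` and `access_in_region` are hypothetical names, and it assumes the base address and length of the remote region are known on the client side. It only covers the address-range case, not key or permission mismatches.

```cpp
#include <cstdint>

// Hypothetical description of one registered remote memory region.
// (Illustrative only; not a type from Sherman or libibverbs.)
struct Region {
    std::uint64_t base;    // first valid remote address
    std::uint64_t length;  // size of the region in bytes
};

// Returns true when the byte range [addr, addr + len) lies entirely
// inside the region. The comparison is arranged so that addr + len is
// never computed, which avoids 64-bit overflow on bogus addresses.
constexpr bool access_in_region(const Region& r,
                                std::uint64_t addr,
                                std::uint64_t len) {
    return addr >= r.base &&
           addr - r.base <= r.length &&
           len <= r.length - (addr - r.base);
}
```

Printing the target address and length whenever this check fails, before the request is posted, would show whether the failing access comes from a bad pointer in the tree or from something outside the software's address calculation.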