Zoltan Test Failures on Knights Landing with OpenMPI 1.10.4 and Intel 17.0.098 #600
Comments
Indeed, this test graph has only four vertices, so we shouldn't exhaust memory while reading the graph. Unfortunately, I am unable to reproduce this problem in my environment. I don't have access to Intel 17.0. The test builds and runs fine on my workstations with Intel 16.0 as well as with clang and gcc. @nmhamster Do you see this problem with other versions of the Intel compiler? May I access your build to do some debugging, and if so, how? |
@kddevin do you have access to the Bowman test machine? |
@nmhamster I just requested it...stay tuned. |
I do not. |
@vjleung No worries, Vitus; I will handle this issue with Si. No need to request a new account. |
@kddevin Okay, thanks. |
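@kddevin @nmhamster
#include <iostream>

int main() {
    // First loop: the inner loop increments the outer counter directly.
    int el = 0;
    for (int i = 0; i < 5; ++i) {
        int k = i;
        for (; k; ++el) {
            k &= k - 1;  // clear the lowest set bit of k
        }
    }
    std::cout << el << std::endl;

    // Second loop: same count, but through a per-iteration counter.
    el = 0;
    for (int i = 0; i < 5; ++i) {
        int el2 = 0;
        int k = i;
        for (; k; ++el2) {
            k &= k - 1;
        }
        el += el2;
    }
    // Both loops sum the set bits of 0..4, so both should print 5.
    std::cout << el << std::endl;
} |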
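A self-checking variant of the reproducer above might make a discrepancy easier to spot, since each loop can be compared against a known value instead of eyeballing the printed output. This is only a sketch: the function names `popcount_sum_shared`, `popcount_sum_local`, and `popcount_sums_agree` are hypothetical, and it makes no claim about which compilers or flags produce a difference.

```cpp
#include <cassert>  // for assert() in callers

// Sum of set bits over 0..n-1; the inner loop increments the
// outer accumulator directly (first loop of the reproducer).
int popcount_sum_shared(int n) {
    int el = 0;
    for (int i = 0; i < n; ++i) {
        int k = i;
        for (; k; ++el) {
            k &= k - 1;  // clear the lowest set bit of k
        }
    }
    return el;
}

// Same sum, accumulated through a per-iteration counter
// (second loop of the reproducer).
int popcount_sum_local(int n) {
    int el = 0;
    for (int i = 0; i < n; ++i) {
        int el2 = 0;
        int k = i;
        for (; k; ++el2) {
            k &= k - 1;
        }
        el += el2;
    }
    return el;
}

// True when both variants match the expected value.
bool popcount_sums_agree(int n, int expected) {
    return popcount_sum_shared(n) == expected &&
           popcount_sum_local(n) == expected;
}
```

Calling `assert(popcount_sums_agree(5, 5))` from `main` at each optimization level would flag a mismatch immediately; the bits set in 0, 1, 2, 3, 4 number 0, 1, 1, 2, 1, which sum to 5.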
As with building at -O0, it seems that changing the loop below (around line ~400) from
for (i = 0; i < nsend; i++) {
    v = vtx_list[i];
    nvtx_edges += old_xadj[v+1] - old_xadj[v];
}
to
for (i = 0; i < nsend; i++) {
    v = vtx_list[i];
    volatile int tmp = old_xadj[v+1] - old_xadj[v];
    nvtx_edges += tmp;
}
fixes the problem. But I don't see why volatile would be necessary. |
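The two loop forms above can be compared in isolation with a small standalone sketch. The names `old_xadj`, `vtx_list`, and `nvtx_edges` come from the loop in question; everything else is an assumption made for illustration: the function names, the use of `std::vector`, and the reading of `old_xadj` as CSR-style offsets where `old_xadj[v+1] - old_xadj[v]` is the edge count of vertex `v`. This is not the Zoltan code itself.

```cpp
#include <cassert>  // for assert() in callers
#include <cstddef>
#include <vector>

// Plain accumulation, mirroring the original loop.
int count_edges_plain(const std::vector<int>& old_xadj,
                      const std::vector<int>& vtx_list) {
    int nvtx_edges = 0;
    for (std::size_t i = 0; i < vtx_list.size(); i++) {
        int v = vtx_list[i];
        nvtx_edges += old_xadj[v + 1] - old_xadj[v];
    }
    return nvtx_edges;
}

// Same accumulation through a volatile temporary, mirroring the
// workaround; volatile forces the difference to be stored and
// reloaded rather than folded into the surrounding expression.
int count_edges_volatile(const std::vector<int>& old_xadj,
                         const std::vector<int>& vtx_list) {
    int nvtx_edges = 0;
    for (std::size_t i = 0; i < vtx_list.size(); i++) {
        int v = vtx_list[i];
        volatile int tmp = old_xadj[v + 1] - old_xadj[v];
        nvtx_edges += tmp;
    }
    return nvtx_edges;
}
```

For a hypothetical four-vertex path graph with offsets `{0, 1, 3, 5, 6}`, selecting vertices 1 and 3 should give 2 + 1 = 3 edges from both functions. With a correct compiler the two must always agree, so a disagreement at a given optimization level would point at code generation and not at the input data.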
As a sanity test, I ran these tests through purify (with gcc 4.7.2); no memory misbehavior was reported. |
@nmhamster |
Hi, @nmhamster . Can we close this issue? Or does it persist? Thanks. |
@kddevin - I think we can close it. If I find the problem again, I will reopen. Thank you! |
Thanks, @nmhamster |
Origin repo remote tracking branch: 'github/master' Origin repo remote repo URL: 'github = git@github.com:TriBITSPub/TriBITS.git' Git describe: tribits_start-3432-g7b14c49e At commit: commit b9b5bfc7065520cc1e74382c1c8ca828136866fd Author: Anderson Chauphan <achauph@sandia.gov> Date: Mon Feb 12 20:28:22 2024 -0700 Summary: Removed a debug print statement accidentally committed previously (#600)
Zoltan team, I am seeing some test failures with the latest builds on Knights Landing with the Intel 17.0.098 compilers. The failing cases report an insufficient memory error. The node has 16GB + 96GB of memory, so I think this should be sufficient?