Issues on reproducing your artifact #1
I will list here a brief review of this artifact so that you can fix it in the future.
A cluster of 8 nodes x (2 CPUs Intel Xeon L5420, 4 cores/CPU, 16GB RAM, 298GB HDD) is deployed
1- It is not sufficiently explained how to run this modified benchmark (the command does not exist). This also concerns the parent benchmark, but I found out how to run HPL on the internet.
2- There is a dependency mismatch inside the added scripts, and nothing is said about it in the tutorial or the paper.
So, the error is about:
I hope you fix your artifact and its description ASAP so that it can be reproduced.
Thank you for your efforts and sorry for the unclear description.
In Line-41 of hpl-daemon.sh there is the submission command
If you use the mpirun of Intel MPI, change it to this command. (Leave the leading environment variables unchanged.)
To run SKT-HPL, run
For HPL, I actually ran a command similar to this one:
and it runs, as you can see in the attached figure. The problem appeared when I tried to run SKT-HPL. I submitted hpl-daemon.sh:
but an error was raised. The output is (./hpl-daemon.sh: line 41: srun: command not found)
So, could you please explain in your tutorial what srun is and how we can get it if we are not familiar with it? It is very helpful to explain these basic things, especially when you know nothing about the expected users or what is already installed on their clusters.
Thank you very much in advance.
Your method to run HPL is right. However, we have provided a wrapper script for SKT-HPL so the command is different.
DO NOT use
The error is raised because Line-41 of hpl-daemon.sh invokes
Change the line from
Also, Line-50 of hpl-daemon.sh uses srun again and should be changed to use mpirun.
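For readers on clusters without Slurm (srun is Slurm's job launcher), a check like the one below could tell you which launcher is available before you edit hpl-daemon.sh. This is only a minimal sketch: the function name `pick_launcher` is hypothetical and is not part of the SKT-HPL repository, and it only checks that a command exists on the PATH, not that it is configured correctly.

```shell
#!/bin/sh
# Hypothetical helper (not part of the SKT-HPL repository): print the first
# command from the argument list that exists on the PATH.
pick_launcher() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # None of the candidates were found; report it and signal failure.
    echo "no launcher found among: $*" >&2
    return 1
}
```

For example, `LAUNCHER=$(pick_launcher srun mpirun) || exit 1` would prefer srun and fall back to mpirun. The arguments that follow the launcher on Line-41 and Line-50 still have to be adapted by hand, because srun and mpirun do not accept the same options.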
GOOD NEWS: We have prepared a 'ready to run' experimental platform for reviewers to evaluate SKT-HPL. So if you find it too difficult to run SKT-HPL on your own machine, try our experimental platform. Please see 'Run_SKT-HPL_on_gorgon_cluster.pdf' in our GitHub repo.
Thank you very much for updating the tutorial and for offering us access to your own cluster. Actually, I did not log in to your cluster to evaluate the artifact, since I want to run it on different hardware and in a different environment. I accessed your cluster only to copy the file HPL.dat to the cluster that I'm working on.
I also used mpirun instead of srun, and everything seems to be OK. The evaluation is running now; I can successfully inject a failure, and SKT can resume the work. For other reviewers, the following screenshot shows that several snapshots are taken and that SKT tries to restart after a one-node failure is injected.
I'm still waiting for the final result (the evaluation is running now) to know whether the test will pass.