Jobs stuck in Eqw or hqw states #1

Closed
Amy-T opened this issue Apr 26, 2017 · 2 comments

Comments

@Amy-T

Amy-T commented Apr 26, 2017

Dear Won Chol,

Thank you for developing DCBLAST.

We are trying to use DCBLAST to run blastp on our Sun Grid Engine (SGE) system.

The command we were using is:
perl dcblast.pl --ini configL.ini --input L6_Fv.faa --size 200 --output L6_Fv_vs_nr2 --blast blastp

The .faa file was split successfully, but the jobs ended up in the Eqw or hqw state. When we checked the reason for the Eqw state, we found an error about the user's password entry ('can't get password entry for user "xxx". Either the user does not exist or NIS error!'). The user account exists on the head node but not on the compute nodes.

We hope you can advise on how to resolve this problem.

Thanks.

@wyim-pgl
Owner

wyim-pgl commented Apr 26, 2017

It looks like your compute nodes cannot access the password system.
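One way to confirm this is to check whether the account resolves on each host. The following is only a minimal sketch: `check_user` is a hypothetical helper name (not part of SGE or DCBLAST), and it assumes that `id` goes through the same system user lookup (local files, NIS, LDAP, and so on) that the host is configured to use.

```shell
# Sketch: report whether a user name resolves on the host where this runs.
# check_user is a hypothetical helper, not an SGE or DCBLAST command.
check_user() {
    user="$1"
    # id -u exits non-zero when the system cannot find a password entry
    if id -u "$user" >/dev/null 2>&1; then
        echo "$user resolves on $(uname -n)"
        return 0
    else
        echo "$user does NOT resolve on $(uname -n)"
        return 1
    fi
}
```

Run it once on the head node and once on a compute node with the affected user name. If it succeeds on the head node but fails on the compute node, the account information is not reaching the compute nodes.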

Usually, you don't need to do anything special to authenticate for SGE job submission: once you are logged in to the login node, SGE collects your user information from there.

However, if the compute nodes also use Kerberos authentication, you will need to enable security when you compile SGE, that is, add the -kerberos flag (and make sure the Kerberos development libraries are installed on the machine where you compile the source).

Also, these errors can appear either consistently or intermittently, and they are no longer correlated with the use of OpenDirectory (problems appear with local user accounts as well), NFS, case-insensitive filesystems, or any other system settings.

More information and background on this issue can be found at this link:

https://linuxfollies.blogspot.com/2014/05/sssd-setup-in-bright-cluster.html

Can you run any job at all with qrsh or qsub?

Simple tests are below:

qrsh hostname
qsub -b y -t 1-40 hostname
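The two checks above can be wrapped in a small function that stops at the first failure. This is a sketch only: `sge_smoke_test` is a hypothetical helper name, and it assumes `qrsh` and `qsub` are in your PATH once the SGE environment is set up.

```shell
# Sketch: run the two submission checks in order and stop at the first failure.
# sge_smoke_test is a hypothetical helper name.
sge_smoke_test() {
    # Bail out early if the SGE client tools are not available
    for tool in qrsh qsub; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "SKIP: $tool not found in PATH"
            return 2
        fi
    done
    # Interactive check: run hostname on a compute node
    if ! qrsh hostname; then
        echo "FAIL: qrsh could not run a command on a compute node"
        return 1
    fi
    # Batch check: submit a 40-task array job that runs hostname
    if ! qsub -b y -t 1-40 hostname; then
        echo "FAIL: qsub rejected the array job"
        return 1
    fi
    echo "OK: both submissions were accepted; check qstat for Eqw states"
}
```

If the array job is accepted but its tasks still go into Eqw, `qstat -j <job_id>` shows the error reason, and `qmod -cj <job_id>` clears the error state once the underlying problem is fixed.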

Alternatively, check the case of the user name: log in as 'henry' instead of 'Henry'.

If you have a heterogeneous HPC system that mixes OS X and Linux nodes, this is likely to be a problem.

If you cannot find anything, please contact your system administrator, because this is not a DCBLAST problem.

@Amy-T
Author

Amy-T commented May 4, 2017

Thanks for the suggestions. We will give it a try.

Amy-T closed this as completed May 4, 2017