synclist: EXECUTE{ALWAYS} ignore noderange #5755

Closed
kcgthb opened this issue Oct 31, 2018 · 14 comments

Comments

@kcgthb
Member

kcgthb commented Oct 31, 2018

In synclists, the EXECUTE and EXECUTEALWAYS directives are executed on all nodes, even if a noderange is specified.

Given the following synclist:

/tmp/script -> (nodeA) /tmp/script
EXECUTEALWAYS:
/tmp/script

updatenode -F nodeB will execute /tmp/script on nodeB, even though nodeA is specified in the synclist noderange.
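
To make the expected behavior concrete, here is a minimal bash sketch of the noderange check one would expect to gate the script. It is an illustration only, not xCAT code: node_in_range and range_of_line are hypothetical helper names, and the noderange is assumed to be a plain comma-separated list (real xCAT noderanges also allow groups and bracket expansion, which are not modelled here).

```bash
#!/bin/bash
# Illustration only: hypothetical helpers, not part of xCAT.

# Succeed when NODE appears in a comma-separated noderange.
# An empty range never matches; a caller that wants "no range means
# all nodes" has to handle that case itself.
node_in_range() {
    local node=$1 range=$2 member members
    IFS=, read -ra members <<< "$range"
    for member in "${members[@]}"; do
        [ "$member" = "$node" ] && return 0
    done
    return 1
}

# Print the noderange of a synclist line of the form
# "src -> (range) dest"; print nothing when the line has no range.
range_of_line() {
    local line=$1
    local re='->[[:space:]]*\(([^)]*)\)'
    if [[ $line =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    fi
}

# Usage with the synclist line from this report.
range=$(range_of_line '/tmp/script -> (nodeA) /tmp/script')
for node in nodeA nodeB; do
    if node_in_range "$node" "$range"; then
        echo "$node: in range, script may run"
    else
        echo "$node: not in range, script should be skipped"
    fi
done
```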

@robin2008 robin2008 added this to the 2.14.5 milestone Nov 1, 2018
@immarvin
Contributor

immarvin commented Nov 1, 2018

hi @cxhong , do you have any thoughts on this?

@cxhong
Contributor

cxhong commented Nov 1, 2018

Can you check whether /tmp/script already exists on nodeB? I couldn't recreate this issue.
Maybe you can copy/paste some output from the updatenode command.

@kcgthb
Member Author

kcgthb commented Nov 1, 2018

Sure, here you go:

Example /tmp/script.sh:

#!/bin/bash
echo "/tmp/script running on $(hostname -s)"

Make sure the script doesn't exist on the nodes:

MN # xdsh sh-06-[33-34] rm /tmp/script.sh
[sh-hn03]: sh-06-34: rm: cannot remove '/tmp/script.sh': No such file or directory
[sh-hn03]: sh-06-33: rm: cannot remove '/tmp/script.sh': No such file or directory

Synclist file:

/tmp/script.sh -> (sh-06-34) /tmp/script.sh
EXECUTEALWAYS:
/tmp/script.sh

updatenode -F on sh-06-34 executes the script:

MN # updatenode sh-06-34 -FV
Running command on sh-hn01.SUNet: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1

Running command on sh-hn01.SUNet: chmod -R a+r /install/postscripts 2>&1

Running command on sh-hn01.SUNet: cat /install/postscripts/mypostscript.tmpl | grep ZONENAME 2>&1

  sh-hn01.SUNet: Internal call command: xdcp sh-06-34 --nodestatus -F /install/custom/lists/_common/synclist -T
Running internal xCAT command: xdcp ...

Running command on sh-hn01.SUNet: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1

Running command on sh-hn01.SUNet: rm /tmp/xdcpsynclist.95673 2>&1
TRACE:Default context is XCAT.
TRACE:Fanout Value is 64.
TRACE:Timeout Value is .
 TRACE: Executing Command:/bin/sh -c /tmp/rsync_sh-06-34
TRACE:Default context is XCAT
TRACE:Node RSH is
TRACE: Fanout value is 64.
TRACE: Timeout value is
TRACE: Verify value is
TRACE: Execute option specified.
TRACE:Execute: Exporting File:/usr/bin/scp -B /var/xcat/syncfiles/tmp/script.sh root@sh-06-34:/tmp/0ML6ICNqYU.dsh
Command name: /usr/bin/ssh -o BatchMode=yes -x root@sh-06-34 export NODE=sh-06-34; export LANG=en_US.UTF-8 LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_PAPER="C" LC_NAME="C" LC_ADDRESS="C" LC_TELEPHONE="C" LC_MEASUREMENT="C" LC_IDENTIFICATION="C" LC_ALL=C PERL_BADLANG=0 ;  /tmp/0ML6ICNqYU.dsh ; export DSH_TARGET_RC=$?; echo ":DSH_TARGET_RC=${DSH_TARGET_RC}:";rm /tmp/0ML6ICNqYU.dsh
sh-06-34: /tmp/script running on sh-06-34
File synchronization has completed for nodes: "sh-hn03,sh-06-34"

On sh-06-33, the script is also executed, although it should not be:

MN # updatenode sh-06-33 -FV
Running command on sh-hn01.SUNet: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1

Running command on sh-hn01.SUNet: chmod -R a+r /install/postscripts 2>&1

Running command on sh-hn01.SUNet: cat /install/postscripts/mypostscript.tmpl | grep ZONENAME 2>&1

  sh-hn01.SUNet: Internal call command: xdcp sh-06-33 --nodestatus -F /install/custom/lists/_common/synclist -T
Running internal xCAT command: xdcp ...

Running command on sh-hn01.SUNet: ip -4 --oneline addr show |awk -F ' ' '{print $4}'|awk -F '/' '{print $1}' 2>&1

Running command on sh-hn01.SUNet: rm /tmp/xdcpsynclist.96127 2>&1
TRACE:Default context is XCAT.
TRACE:Fanout Value is 64.
TRACE:Timeout Value is .
 TRACE: Executing Command:/bin/sh -c /tmp/rsync_sh-06-33
TRACE:Default context is XCAT
TRACE:Node RSH is
TRACE: Fanout value is 64.
TRACE: Timeout value is
TRACE: Verify value is
TRACE: Execute option specified.
TRACE:Execute: Exporting File:/usr/bin/scp -B /var/xcat/syncfiles/tmp/script.sh root@sh-06-33:/tmp/0ML6ICNqYU.dsh
Command name: /usr/bin/ssh -o BatchMode=yes -x root@sh-06-33 export NODE=sh-06-33; export LANG=en_US.UTF-8 LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_PAPER="C" LC_NAME="C" LC_ADDRESS="C" LC_TELEPHONE="C" LC_MEASUREMENT="C" LC_IDENTIFICATION="C" LC_ALL=C PERL_BADLANG=0 ;  /tmp/0ML6ICNqYU.dsh ; export DSH_TARGET_RC=$?; echo ":DSH_TARGET_RC=${DSH_TARGET_RC}:";rm /tmp/0ML6ICNqYU.dsh
sh-06-33: /tmp/script running on sh-06-33
File synchronization has completed for nodes: "sh-hn03,sh-06-33"

If it makes any difference, this is in hierarchical mode.

@cxhong
Contributor

cxhong commented Nov 2, 2018

@kcgthb , Thanks.

This is a bug in the syncfile hierarchy support.
updatenode -F copies the sync files to /var/xcat/syncfiles on the service node, then runs the following:

/usr/bin/scp -B /var/xcat/syncfiles/tmp/script.sh root@sh-06-33:/tmp/0ML6ICNqYU.dsh
Command name: /usr/bin/ssh -o BatchMode=yes -x root@sh-06-33 export NODE=sh-06-33; export LANG=en_US.UTF-8 LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_PAPER="C" LC_NAME="C" LC_ADDRESS="C" LC_TELEPHONE="C" LC_MEASUREMENT="C" LC_IDENTIFICATION="C" LC_ALL=C PERL_BADLANG=0 ;  /tmp/0ML6ICNqYU.dsh ; export DSH_TARGET_RC=$?; echo ":DSH_TARGET_RC=${DSH_TARGET_RC}:";rm /tmp/0ML6ICNqYU.dsh

So it doesn't matter whether the script is on the compute node or not.

@cxhong
Contributor

cxhong commented Nov 2, 2018

@kcgthb, while trying to fix this, I started to think that EXECUTEALWAYS may be working as designed. @immarvin, please confirm. From the xCAT documentation:

The EXECUTEALWAYS: will list all the postscripts that you would like to run after the files are sync’d to the nodes. These scripts will run whether or not any files are sync’d to the nodes

That's why a script located in the syncfiles directory /var/xcat/syncfiles on the MN (or, in a hierarchy, on the SN) will always execute on the node. The script doesn't need to be present on the node.

@robin2008
Member

@cxhong
Here the issue is that /tmp/script.sh should not be synced to sh-06-33, since we only defined it as /tmp/script.sh -> (sh-06-34) /tmp/script.sh

So there is no problem with the MN syncing all files to the SN, but you need to look at why the SN syncs the file to sh-06-33 when only sh-06-34 is defined for that file.

@cxhong
Contributor

cxhong commented Nov 5, 2018

@robin2008, I understand the issue now, and I can recreate it on our hierarchical cluster. According to our documentation, the EXECUTE and EXECUTEALWAYS clauses differ from the -> clause: the script doesn't need to be present on the compute node. All the scripts are copied to the SN. The /tmp/script.sh -> (sh-06-34) /tmp/script.sh clause then uses scp to copy the script to node sh-06-34, but EXECUTE and EXECUTEALWAYS just look up the file in /var/xcat/syncfiles on the SN, use scp to copy it to the compute node as a temporary file, then ssh to the compute node to execute it. The temporary file is removed afterwards.
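
The copy, execute and remove sequence described here can be sketched as a dry run that only prints the commands. This is a simplified illustration based on the trace output earlier in the thread, not the actual xdcp code: plan_execute is a hypothetical name, the temporary file name is passed in by the caller, and the locale exports seen in the trace are left out.

```bash
#!/bin/bash
# Dry-run sketch: prints the two commands instead of running them.
# plan_execute is a hypothetical helper, not an xCAT function.
plan_execute() {
    local node=$1 script=$2 tmpname=$3
    local syncdir=/var/xcat/syncfiles   # staging directory named in the thread

    # Step 1: copy the staged script to a temporary file on the node.
    printf 'scp -B %s%s root@%s:/tmp/%s.dsh\n' \
        "$syncdir" "$script" "$node" "$tmpname"

    # Step 2: run the temporary file on the node, then remove it.
    printf 'ssh -o BatchMode=yes -x root@%s "export NODE=%s; /tmp/%s.dsh; rm /tmp/%s.dsh"\n' \
        "$node" "$node" "$tmpname" "$tmpname"
}

# Example call using the node and temporary name from the trace.
plan_execute sh-06-33 /tmp/script.sh 0ML6ICNqYU
```

Nothing in this sequence consults the noderange of the -> clause, which is consistent with the script running on sh-06-33.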

Can someone confirm and are there FVT test cases for this?

@robin2008
Member

We have updatenode_syncfile_EXECUTE, but it does not cover the case /tmp/script.sh -> (sh-06-34) /tmp/script.sh, as we only have one CN in the automated test environment.

Need to update the test case to cover it.

@robin2008
Member

robin2008 commented Nov 5, 2018

As I understand it:
EXECUTE means the script is run when the file synced to the node has been updated. You need to put two lines in the synclist:

/tmp/share/file2  ->  (Possible noderange) /tmp/file2
/tmp/share/file2.post -> (Possible noderange) /tmp/file2.post
EXECUTE:
/tmp/share/file2.post

Here, if file2 is updated during the sync, file2.post will be executed on the nodes.

@kcgthb Do you confirm EXECUTE does not support "noderange" in synclist too?

For EXECUTEALWAYS, the script is always executed, whether or not the file has changed on the node. The format should look like this:

/tmp/myscript -> (Possible noderange) /tmp/myscript
EXECUTEALWAYS:
/tmp/myscript

Our documentation does not clearly say that EXECUTEALWAYS supports the noderange, but I think it should be implied.
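
The semantics described above can be written down as a small decision function. This is a sketch of the intended rules as stated in this comment, not of how xdcp implements them: should_execute is a hypothetical name, and the yes/no arguments stand for "node is in the synclist noderange" and "the synced file changed on the node".

```bash
#!/bin/bash
# Hypothetical decision helper modelling the intended rules:
#   - a node outside the noderange never runs the script
#   - EXECUTE runs only when the synced file was updated
#   - EXECUTEALWAYS runs regardless of whether the file changed
should_execute() {
    local directive=$1 in_range=$2 changed=$3
    [ "$in_range" = yes ] || return 1
    case $directive in
        EXECUTE)       [ "$changed" = yes ] ;;
        EXECUTEALWAYS) return 0 ;;
        *)             return 1 ;;
    esac
}

# Example: the case from this issue, a node outside the noderange.
if should_execute EXECUTEALWAYS no no; then
    echo "run script"
else
    echo "skip script"
fi
```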

So @cxhong let's take a look to see if we could support this with small effort, as the support is in xdcp.

@kcgthb
Member Author

kcgthb commented Nov 6, 2018

@kcgthb Do you confirm EXECUTE does not support "noderange" in synclist too?

@robin2008 Actually, it looks like EXECUTE does the right thing:

Synclist file:

/tmp/script.sh -> (sh-06-34) /tmp/script.sh
EXECUTE:
/tmp/script.sh

Check that script.sh doesn't exist on the nodes:

MN # xdsh sh-06-[33-34] rm /tmp/script.sh
[sh-hn03]: sh-06-34: rm: cannot remove '/tmp/script.sh': No such file or directory
[sh-hn03]: sh-06-33: rm: cannot remove '/tmp/script.sh': No such file or directory
MN #

updatenode -F on sh-06-34 executes the script:

MN # updatenode sh-06-34 -F
sh-06-34: /tmp/script running on sh-06-34
File synchronization has completed for nodes: "sh-hn03,sh-06-34"
MN #

When run a 2nd time, it doesn't execute it, since the file is already there:

MN # updatenode sh-06-34 -F
File synchronization has completed for nodes: "sh-hn03,sh-06-34"
MN #

And updatenode -F on sh-06-33 neither copies nor executes the script:

MN # updatenode sh-06-33 -F
File synchronization has completed for nodes: "sh-hn03,sh-06-33"
MN # ssh sh-06-33 ls /tmp/script.sh
ls: cannot access /tmp/script.sh: No such file or directory
MN #

So I guess I will move to EXECUTE for now, because it better fits my needs, but EXECUTEALWAYS should still be fixed to only copy files to the nodes listed in the range.

@robin2008
Member

Yes, I think so.
EXECUTEALWAYS should only control the behavior of executing the script, but not for the syncing.

@immarvin
Contributor

Hi @kcgthb, the issue has been fixed in #5834

@kcgthb
Member Author

kcgthb commented Nov 27, 2018

Awesome, thank you!

@cxhong
Contributor

cxhong commented Dec 3, 2018

please reopen if there are issues.
