-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PAR-154: Multipath path count misbehaves in port failure/restore in scalability test #336
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not done, yet.
I am wondering, in general, if the udev script, for efficiency, cannot just be a C program.
@@ -42,6 +42,8 @@ SM_LIBS += mpath_cli | |||
SM_LIBS += mpathutil | |||
SM_LIBS += LUNperVDI | |||
SM_LIBS += mpathcount | |||
SM_LIBS += mpathcountdaemon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mpathcountd is sufficient (no one calls daemons "daemon")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -153,6 +155,8 @@ install: precheck | |||
$(SM_STAGING)/$(MODPROBE_DIR) | |||
install -m 644 etc/logrotate.d/$(SMLOG_CONF) \ | |||
$(SM_STAGING)/$(LOGROTATE_DIR) | |||
install -m 644 drivers/mpathcountdaemon.service \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -42,6 +42,7 @@ | |||
DEVBYMPPPATH = "/dev/disk/by-mpp" | |||
SYSFS_PATH='/sys/class/scsi_host' | |||
UMPD_PATH='/var/run/updatempppathd.py.pid' | |||
MPATH_COUNT='/var/run/mpathcountdaemon.py.pid' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the daemon is written in a simple way, usually there is no need to track the pid file, systemd takes care of it.
It is surely not needed for the next block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
@@ -245,6 +246,10 @@ def activate(): | |||
if not _is_mpp_daemon_running(): | |||
cmd = ["service", "updatempppathd", "start"] | |||
util.pread2(cmd) | |||
if not os.path.exists(MPATH_COUNT): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it indented this way? This block is executed only if the previous is is True.
They are independent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified.
@@ -245,6 +246,10 @@ def activate(): | |||
if not _is_mpp_daemon_running(): | |||
cmd = ["service", "updatempppathd", "start"] | |||
util.pread2(cmd) | |||
if not os.path.exists(MPATH_COUNT): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, we should always assume that a daemon is running. The previous block of code is a left over of a practice I don't like match nad it is not even immune from race conditions and uncleaned pid files.
I'd rather do one of the following:
- just assume the service is running and raise an exception if it is not
- assume it is running and if not, start it in the exception handler (using systemd tools)
- check if the daemon is running using the right tools (in this case, systemd tools)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I modified the code as per your comments.
@@ -36,10 +36,10 @@ | |||
|
|||
util.daemon() | |||
if len(sys.argv) == 3: | |||
match_bySCSIid = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file should be left untouched. The only net result is to have a wrong indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -373,7 +373,7 @@ def attach(self, sr_uuid, vdi_uuid): | |||
# The SCSIid is already stored inside SR sm_config. | |||
# We need only to trigger mpathcount | |||
try: | |||
cmd = ['/opt/xensource/sm/mpathcount.py', scsi_id] | |||
cmd = ['python','/opt/xensource/sm/udev_mpath.py',scsi_id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's executable, no need to call it through "python"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -486,7 +486,7 @@ def _setMultipathableFlag(self, SCSIid=''): | |||
self.session.xenapi.SR.set_sm_config(self.sr_ref, sm_config) | |||
|
|||
if self.mpath == "true" and len(SCSIid): | |||
cmd = ['/opt/xensource/sm/mpathcount.py',SCSIid] | |||
cmd = ['python','/opt/xensource/sm/udev_mpath.py',SCSIid] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
@@ -7,5 +7,5 @@ ACTION=="change", PROGRAM!="/sbin/dmsetup info -c --noheadings -j %M -m %m", GOT | |||
PROGRAM!="/sbin/dmsetup info -c -o uuid,name --separator ' ' --noheadings -j %M -m %m", GOTO="end_mpath" | |||
RESULT!="mpath-*", GOTO="end_mpath" | |||
# ENV is necessary otherwise the child process of mpathcount cannot find multipathd executable | |||
ACTION=="change", ENV{PATH}="/sbin:/bin:/usr/sbin:/usr/bin", RUN+="/opt/xensource/sm/mpathcount.py %c{2}" | |||
ACTION=="change", ENV{SCSIID}="%c{2}", ENV{PATH}="/sbin:/bin:/usr/sbin:/usr/bin", RUN+="/opt/xensource/sm/udev_mpath.py $env{SCSIID}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is conflicting with Zheng's patch that is already accepted, even though I did not push it yet here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now in my workspace, on the master branch, Zheng changes are not available. I actually took the change done
by ZhengLi from Patch Queue and modified the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of now in my workspace, on the master branch, Zheng changes are not available. I actually took the change done by ZhengLi from Patch Queue and modified the file name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed all the outstanding patches, so Zheg's should be there.
import sys | ||
import os | ||
|
||
server_address = "/uds_socket" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Root directory is not the right place for a socket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified and now place it in/var/run/uds_socket
This patch need rebase. It is against an old version of sm |
I will leave specific errors aside for this time because we spent much time on those and I lost the overall picture (but just give a look at udev_mpath, there are a few in the first lines). This implementation does not scale: the 2 most notable problems are:
If you test it quickly you will see that it cannot sustain a heavy load.
Well, you will see that after the 4th the next invocation of udev_mpath will get stuck (so, in real life, udev will get stuck and we don't want that). The daemon implementation had to be designed more carefully. |
Germano, Below is a draft of my next approach. Let me know your views on this. Step1: As of now socket calls used are blocking calls. So need to make it non-blocking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Code change is fine but the title and the content of the commit message should be changed to reflect what this fix is doing.
It would be good to explain that with this change
- mpathcount.py is going back operating without a specific SCSIid
- the udev rule is scheduling mpathcount to run on a different context (so it can not be killed by udev)
- the 2 seconds sleep are a mitigation for storm of events
1. Reverted the changes in mpathcount.py to operate with specific iscsiid. 2. Triggering mpathcount.py from udev rules in async mode. 3. Added sleep in mpathcount.py to be in par with multipathd. Signed-off-by: Swathi Kovvuri <swathi.kovvuri@citrix.com> Reviewed-by: Stefano Panella <stefano.panella@citrix.com>
It all looks good to me now. |
I will fix this line in the commit message when pushing if not fixed earlier
scsid instead of iscsiid |
Superseded by #344 |
Signed-off-by: Swathi Kovvuri swathi.kovvuri@citrix.com