Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PAR-154: Multipath path count misbehaves in port failure/restore in scalability test #336

Closed
wants to merge 1 commit into from

Conversation

swathicitrix
Copy link

Signed-off-by: Swathi Kovvuri swathi.kovvuri@citrix.com

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.682% when pulling 34029a0 on swathicitrix:PAR-154 into aa449fb on xapi-project:master.

Copy link
Contributor

@germanop germanop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not done, yet.

I am wondering, in general, if the udev script, for efficiency, cannot just be a C program.

@@ -42,6 +42,8 @@ SM_LIBS += mpath_cli
SM_LIBS += mpathutil
SM_LIBS += LUNperVDI
SM_LIBS += mpathcount
SM_LIBS += mpathcountdaemon
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mpathcountd is sufficient (no one calls daemons "daemon")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -153,6 +155,8 @@ install: precheck
$(SM_STAGING)/$(MODPROBE_DIR)
install -m 644 etc/logrotate.d/$(SMLOG_CONF) \
$(SM_STAGING)/$(LOGROTATE_DIR)
install -m 644 drivers/mpathcountdaemon.service \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -42,6 +42,7 @@
DEVBYMPPPATH = "/dev/disk/by-mpp"
SYSFS_PATH='/sys/class/scsi_host'
UMPD_PATH='/var/run/updatempppathd.py.pid'
MPATH_COUNT='/var/run/mpathcountdaemon.py.pid'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the daemon is written in a simple way, usually there is no need to track the pid file, systemd takes care of it.
It is surely not needed for the next block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

@@ -245,6 +246,10 @@ def activate():
if not _is_mpp_daemon_running():
cmd = ["service", "updatempppathd", "start"]
util.pread2(cmd)
if not os.path.exists(MPATH_COUNT):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it indented this way? This block is executed only if the previous is is True.
They are independent

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified.

@@ -245,6 +246,10 @@ def activate():
if not _is_mpp_daemon_running():
cmd = ["service", "updatempppathd", "start"]
util.pread2(cmd)
if not os.path.exists(MPATH_COUNT):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should always assume that a daemon is running. The previous block of code is a left over of a practice I don't like match nad it is not even immune from race conditions and uncleaned pid files.
I'd rather do one of the following:

  1. just assume the service is running and raise an exception if it is not
  2. assume it is running and if not, start it in the exception handler (using systemd tools)
  3. check if the daemon is running using the right tools (in this case, systemd tools)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified the code as per your comments.

@@ -36,10 +36,10 @@

util.daemon()
if len(sys.argv) == 3:
match_bySCSIid = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should be left untouched. The only net result is to have a wrong indentation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -373,7 +373,7 @@ def attach(self, sr_uuid, vdi_uuid):
# The SCSIid is already stored inside SR sm_config.
# We need only to trigger mpathcount
try:
cmd = ['/opt/xensource/sm/mpathcount.py', scsi_id]
cmd = ['python','/opt/xensource/sm/udev_mpath.py',scsi_id]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's executable, no need to call it through "python"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -486,7 +486,7 @@ def _setMultipathableFlag(self, SCSIid=''):
self.session.xenapi.SR.set_sm_config(self.sr_ref, sm_config)

if self.mpath == "true" and len(SCSIid):
cmd = ['/opt/xensource/sm/mpathcount.py',SCSIid]
cmd = ['python','/opt/xensource/sm/udev_mpath.py',SCSIid]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -7,5 +7,5 @@ ACTION=="change", PROGRAM!="/sbin/dmsetup info -c --noheadings -j %M -m %m", GOT
PROGRAM!="/sbin/dmsetup info -c -o uuid,name --separator ' ' --noheadings -j %M -m %m", GOTO="end_mpath"
RESULT!="mpath-*", GOTO="end_mpath"
# ENV is necessary otherwise the child process of mpathcount cannot find multipathd executable
ACTION=="change", ENV{PATH}="/sbin:/bin:/usr/sbin:/usr/bin", RUN+="/opt/xensource/sm/mpathcount.py %c{2}"
ACTION=="change", ENV{SCSIID}="%c{2}", ENV{PATH}="/sbin:/bin:/usr/sbin:/usr/bin", RUN+="/opt/xensource/sm/udev_mpath.py $env{SCSIID}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is conflicting with Zheng's patch that is already accepted, even though I did not push it yet here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now in my workspace, on the master branch, Zheng changes are not available. I actually took the change done
by ZhengLi from Patch Queue and modified the file name.

Copy link
Author

@swathicitrix swathicitrix Oct 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now in my workspace, on the master branch, Zheng changes are not available. I actually took the change done by ZhengLi from Patch Queue and modified the file name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed all the outstanding patches, so Zheg's should be there.

import sys
import os

server_address = "/uds_socket"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Root directory is not the right place for a socket

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified and now place it in/var/run/uds_socket

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.682% when pulling c0f7d17 on swathicitrix:PAR-154 into aa449fb on xapi-project:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.682% when pulling a644435 on swathicitrix:PAR-154 into aa449fb on xapi-project:master.

@germanop
Copy link
Contributor

This patch need rebase. It is against an old version of sm

@germanop
Copy link
Contributor

I will leave specific errors aside for this time because we spent much time on those and I lost the overall picture (but just give a look at udev_mpath, there are a few in the first lines).

This implementation does not scale: the 2 most notable problems are:

  • it does use an oriented socket (stream instead of dgram) that is not necessary
  • the daemon has only one thread for listening and serving the requests

If you test it quickly you will see that it cannot sustain a heavy load.
In my test:

  • I used a custom program (instead of mpathcount.py) that just print something and sleep for 20 seconds
  • started mpathcountd manually
  • run from a shell udev_mpath (make sure to pass something because your implementation crashed otherwise) repeatedly.

Well, you will see that after the 4th the next invocation of udev_mpath will get stuck (so, in real life, udev will get stuck and we don't want that).
This is because the daemon is listening (not needed if dgram but this is not the main problem), accepting requests and invoking the worker program in the same thread.
The first few invocations of udev_mpath return immediately just because the system is queuing your requests (even though you passed "1" to listen()) but the daemon is not processing them.
In fact, it is still busy waiting for the first mpathcount to return.
When the fifth arrives, the daemon is not ready, the OS is not queuing any more for you and your call get stuck.

The daemon implementation had to be designed more carefully.

@swathicitrix
Copy link
Author

Germano,
Thanks for your valuable inputs here.
Initially, I was thinking of the limit for requests that can be queued through the socket IPC.
But somehow I missed in analyzing it further.

Below is a draft of my next approach. Let me know your views on this.

Step1: As of now socket calls used are blocking calls. So need to make it non-blocking.
Step2: Stop calling mpathcount.py from mpathcountd, instead store this in any suitable data structure.
Step3: Create another daemon thread which can access the info stored by mpathcountd in specified time intervals and call mpathcount.py accordingly.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.947% when pulling 799cb83 on swathicitrix:PAR-154 into b4aa9df on xapi-project:master.

Copy link
Contributor

@stefanopanella stefanopanella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Code change is fine but the title and the content of the commit message should be changed to reflect what this fix is doing.

It would be good to explain that with this change

  • mpathcount.py is going back operating without a specific SCSIid
  • the udev rule is scheduling mpathcount to run on a different context (so it can not be killed by udev)
  • the 2 seconds sleep are a mitigation for storm of events

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.947% when pulling 0eb9a51 on swathicitrix:PAR-154 into b4aa9df on xapi-project:master.

1. Reverted the changes in mpathcount.py to operate with specific iscsiid.
2. Triggering mpathcount.py from udev rules in async mode.
3. Added sleep in mpathcount.py to be in par with multipathd.

Signed-off-by: Swathi Kovvuri <swathi.kovvuri@citrix.com>
Reviewed-by: Stefano Panella <stefano.panella@citrix.com>
@stefanopanella
Copy link
Contributor

stefanopanella commented Dec 6, 2016

It all looks good to me now.
Reviewed-by: Stefano Panella <stefano.panella@citrix.com>

@germanop
Copy link
Contributor

germanop commented Dec 6, 2016

I will fix this line in the commit message when pushing if not fixed earlier

Reverted the changes in mpathcount.py to operate with specific *iscsiid*

scsid instead of iscsiid

@coveralls
Copy link

Coverage Status

Coverage remained the same at 28.947% when pulling d425daa on swathicitrix:PAR-154 into b4aa9df on xapi-project:master.

@germanop germanop closed this Dec 22, 2016
@germanop
Copy link
Contributor

Superseded by #344

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants