New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete LUN on detach of RawISCSI #355

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
4 participants
@cardoe

cardoe commented Dec 6, 2017

iscsi: Delete LUN on detach of RawISCSI

We need to evict stale cached entry for LUN that's undergoing migration.

Signed-off-by: Rick Mesta rick.mesta@rackspace.com
Signed-off-by: Doug Goldstein doug.goldstein@rackspace.com
Reviewed-by: Doug Goldstein doug.goldstein@rackspace.com

@cardoe

This comment has been minimized.

Show comment
Hide comment
@cardoe

cardoe Dec 6, 2017

We've seen an issue where a volume gets detached via nova and migrated to a new storage backend and then re-attached to the same guest. Because of this code that this PR removes it caches the old info and results in the re-attached volume not working. This causes it to rescan every time there is an attach and not use any cached data.

cardoe commented Dec 6, 2017

We've seen an issue where a volume gets detached via nova and migrated to a new storage backend and then re-attached to the same guest. Because of this code that this PR removes it caches the old info and results in the re-attached volume not working. This causes it to rescan every time there is an attach and not use any cached data.

@cardoe

This comment has been minimized.

Show comment
Hide comment
@cardoe

cardoe Dec 6, 2017

There are concerns about the side effects of this with large installations with a large amount of volumes so if you have any advice on how to best proceed with fixing the issue while retaining the performance that would be most appreciated.

cardoe commented Dec 6, 2017

There are concerns about the side effects of this with large installations with a large amount of volumes so if you have any advice on how to best proceed with fixing the issue while retaining the performance that would be most appreciated.

@coveralls

This comment has been minimized.

Show comment
Hide comment
@coveralls

coveralls Dec 6, 2017

Coverage Status

Coverage decreased (-0.01%) to 33.745% when pulling 9b51ed7 on cardoe:fix-iscsi-rescan-when-attaching into 8274357 on xapi-project:master.

coveralls commented Dec 6, 2017

Coverage Status

Coverage decreased (-0.01%) to 33.745% when pulling 9b51ed7 on cardoe:fix-iscsi-rescan-when-attaching into 8274357 on xapi-project:master.

@cardoe

This comment has been minimized.

Show comment
Hide comment
@cardoe

cardoe Dec 6, 2017

So maybe a better a approach here would be to some how clear the item out of the cache on a detach?

cardoe commented Dec 6, 2017

So maybe a better a approach here would be to some how clear the item out of the cache on a detach?

@BobBall

This comment has been minimized.

Show comment
Hide comment
@BobBall

BobBall Dec 6, 2017

Contributor

This affects the Base SR type which includes all types using iSCSI. Is there a fix that could be made in the RawISCSI sr type?

Contributor

BobBall commented Dec 6, 2017

This affects the Base SR type which includes all types using iSCSI. Is there a fix that could be made in the RawISCSI sr type?

@BobBall

This comment has been minimized.

Show comment
Hide comment
@BobBall

BobBall Dec 14, 2017

Contributor

This appears to be a special case for the RawISCSI SR when a volume is migrated from one host to another. With LVMoISCSI we require the lun to be plugged to all hosts all the time so it doesn't migrated around in this way necessitating running discovery every time we attach, running only if we don't already have a record for the IQN should be sufficient.

As such can we isolate this change to the RawISCSI SR type by having a new method on the BaseISCSI class to discover the map, with the existing behaviour, and override the behaviour in the RawISCSI class?

Contributor

BobBall commented Dec 14, 2017

This appears to be a special case for the RawISCSI SR when a volume is migrated from one host to another. With LVMoISCSI we require the lun to be plugged to all hosts all the time so it doesn't migrated around in this way necessitating running discovery every time we attach, running only if we don't already have a record for the IQN should be sufficient.

As such can we isolate this change to the RawISCSI SR type by having a new method on the BaseISCSI class to discover the map, with the existing behaviour, and override the behaviour in the RawISCSI class?

iscsi: Delete LUN on detach of RawISCSI
We need to evict stale cached entry for LUN that's undergoing migration.

Signed-off-by: Rick Mesta <rick.mesta@rackspace.com>
Signed-off-by: Doug Goldstein <doug.goldstein@rackspace.com>
Reviewed-by: Doug Goldstein <doug.goldstein@rackspace.com>
@cardoe

This comment has been minimized.

Show comment
Hide comment
@cardoe

cardoe Apr 2, 2018

@BobBall Rick Mesta took this and updated it based on the feedback you provided and based on some internal feedback. He has validated it with XenServer 7.1.1 and shown me his test results that I have validated. Please consider this updated version for inclusion.

cardoe commented Apr 2, 2018

@BobBall Rick Mesta took this and updated it based on the feedback you provided and based on some internal feedback. He has validated it with XenServer 7.1.1 and shown me his test results that I have validated. Please consider this updated version for inclusion.

@cardoe cardoe changed the title from always rescan iSCSI when attaching to Delete LUN on detach of RawISCSI Apr 2, 2018

try:
(stdout,stderr) = exn_on_failure(cmd,failuremessage)
except:
raise xs_errors.XenError('ISCSIDelete')

This comment has been minimized.

@BobBall

BobBall Apr 3, 2018

Contributor

As this is trying to clean out a cache and is only(?) an issue on migration, should the delete failing always be considered fatal to the detach operation? I guess the alternative would be to log the failure - but I'm just a little nervous that there might be other scenarios where perhaps the delete does fail and we're now making detach fail in that case?

@BobBall

BobBall Apr 3, 2018

Contributor

As this is trying to clean out a cache and is only(?) an issue on migration, should the delete failing always be considered fatal to the detach operation? I guess the alternative would be to log the failure - but I'm just a little nervous that there might be other scenarios where perhaps the delete does fail and we're now making detach fail in that case?

This comment has been minimized.

@MarkSymsCtx

MarkSymsCtx Apr 3, 2018

Contributor

I think with a small refactoring we can make this entirely the problem of RawISCSISR and be 100% confident that it won't impact the code used by other SR types.

@MarkSymsCtx

MarkSymsCtx Apr 3, 2018

Contributor

I think with a small refactoring we can make this entirely the problem of RawISCSISR and be 100% confident that it won't impact the code used by other SR types.

This comment has been minimized.

@BobBall

BobBall Apr 3, 2018

Contributor

I'm not nervous about other SR types - I think the current code is perfectly safe for LVMoISCSI and friends. My concern is around RawISCSI use cases that don't involve migration and whether a failed delete should always be fatal

@BobBall

BobBall Apr 3, 2018

Contributor

I'm not nervous about other SR types - I think the current code is perfectly safe for LVMoISCSI and friends. My concern is around RawISCSI use cases that don't involve migration and whether a failed delete should always be fatal

@cardoe

This comment has been minimized.

Show comment
Hide comment
@cardoe

cardoe Apr 4, 2018

cardoe commented Apr 4, 2018

@BobBall

This comment has been minimized.

Show comment
Hide comment
@BobBall

BobBall Apr 6, 2018

Contributor

Actually, I think that's a fair point - the chances of a delete failing if the logout had succeeded is very slim.

Contributor

BobBall commented Apr 6, 2018

Actually, I think that's a fair point - the chances of a delete failing if the logout had succeeded is very slim.

@MarkSymsCtx

This comment has been minimized.

Show comment
Hide comment
@MarkSymsCtx

MarkSymsCtx Jun 25, 2018

Contributor

Merged externally in bcbb92b

Contributor

MarkSymsCtx commented Jun 25, 2018

Merged externally in bcbb92b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment