fix: btrfs unmounting + e2e tests #879
Conversation
Looking at the test failures, it seems the mount point is not set as expected with btrfs. Does anyone have any input on that?
@llamerada-jp
@cupnes From what I can see in the test logs, the device verification fails: https://github.com/topolvm/topolvm/actions/runs/8615413549/job/23610904469?pr=879#step:9:570 I'm not sure whether this is really a failure as early as pod deployment.
@jakobmoellerdev Yes. Please wait a moment while I investigate.
@jakobmoellerdev From my investigation, it seems that the minimum device size for btrfs changes dynamically [1]. So I do not think it is necessary to support a strict minimum allocation size in this PR. We can just document it in limitation.md for now, with another PR to deal with it if necessary.
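For illustration only, a minimal sketch of what a static guard at volume creation could look like, assuming a hypothetical `minimumBtrfsSizeBytes` constant and a hypothetical `validateBtrfsSize` helper; the real minimum for mkfs.btrfs varies with metadata profile and node size, which is exactly why the discussion leans towards documenting the limitation instead:

```go
package driver

import "fmt"

// minimumBtrfsSizeBytes is a hypothetical static lower bound; the real
// minimum for mkfs.btrfs varies dynamically, so a fixed constant can
// only ever be a conservative approximation.
const minimumBtrfsSizeBytes = 100 * 1024 * 1024 // assumed 100 MiB

// validateBtrfsSize is an illustrative guard, not TopoLVM's implementation.
func validateBtrfsSize(fsType string, requestedBytes int64) error {
	if fsType == "btrfs" && requestedBytes < minimumBtrfsSizeBytes {
		return fmt.Errorf("requested size %d is below the assumed btrfs minimum of %d bytes",
			requestedBytes, minimumBtrfsSizeBytes)
	}
	return nil
}
```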
I'm not sure I follow this logic. What I understand is that this would make mkfs.btrfs fail. However, it seems that the volume is successfully formatted. I would have expected mkfs.btrfs to bail out during MountVolume in the CSI provisioning. That being said, I'm fine with removing the limit; however, I'm not sure how we would test btrfs then.
Force-pushed from ddc11c8 to 8b0b028
After adjusting the tests, it seems that resizing behaves weirdly: the desired/actual storage for btrfs is off after resizing. Any ideas on what causes that?
@jakobmoellerdev
Force-pushed from 6f5f7b7 to 8321902
It seems that pod deletion for btrfs offline resizing now has an issue. On top of that, the runtime of the tests is pretty much unbearable. We need to figure out how to fix that, otherwise the tests will run 20+ minutes. I really think the tests should be refactored...
It seems NodeUnpublishVolume for btrfs is broken due to "device or resource busy":
It seems the problem here is the
It seems we have a device mapper / mounting issue again. The reason this is not recovering is that for btrfs the following
as you can see, this is under
Force-pushed from 65998e5 to f4c2c4d
@toshipp @cupnes In light of my recent investigation, I have found that we are still using /proc/mounts instead of /proc/1/mountinfo (which would also include Maj:Min and some other neat features). Also, when looking through the code, I do not see why we would need our own unmount logic when there is a well-written unmounter in mount-utils. I'm sure there was a reason, but could you explain it to me again? If we don't really have any reason to keep this around, I would vote for switching to the upstream unmounting process.
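For context, a minimal sketch of what the upstream path could look like, using `CleanupMountPoint` from k8s.io/mount-utils (the helper name and wiring here are placeholders, not TopoLVM's actual code):

```go
package main

import (
	"log"

	mountutils "k8s.io/mount-utils"
)

// unpublish is a hypothetical helper showing the upstream unmount path.
func unpublish(targetPath string) error {
	mounter := mountutils.New("") // default exec-based mounter

	// CleanupMountPoint unmounts the path if it is a mount point and then
	// removes the directory; the final argument enables the more thorough
	// mount-point check that also handles bind mounts.
	if err := mountutils.CleanupMountPoint(targetPath, mounter, true); err != nil {
		return err
	}
	log.Printf("unmounted and removed %s", targetPath)
	return nil
}
```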
Force-pushed from f4c2c4d to d57f8fc
@jakobmoellerdev I checked mountinfo and found that the major number is not accurate for lvm devices: it reports 0 but should be 253.
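To make the field layout concrete, here is a small self-contained sketch that reads the major:minor pair out of /proc/1/mountinfo by hand (field positions per the proc(5) mountinfo format; the helper itself is hypothetical):

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// printMajMin shows where major:minor appears in each mountinfo line; for
// device-mapper (lvm) mounts the reported major can be 0 instead of the
// expected 253.
func printMajMin(mountPoint string) error {
	f, err := os.Open("/proc/1/mountinfo")
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// fields: 0=mount ID, 1=parent ID, 2=major:minor, 3=root, 4=mount point, ...
		if len(fields) > 4 && fields[4] == mountPoint {
			fmt.Printf("%s -> maj:min %s\n", mountPoint, fields[2])
		}
	}
	return scanner.Err()
}
```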
Force-pushed from 456e4f7 to a067506
I will comment on the points I noticed. The review is not yet complete and will continue tomorrow.
Force-pushed from 7834a1f to e51791d
Force-pushed from 19be4e0 to f6f4b9d
I was able to reduce the test runtime per filesystem by about 2 minutes, thanks to some missing --now flags when deleting pods, which caused a 30-second grace-period wait. The reason the resize tests still take quite a bit of time is actually the check with df in the pod. TBH I have no idea why it takes so long and I will have to do some tracing to find where it is slow, but I assume it has to do with the async nature of CSI, because the volume expansion on lvm was extremely quick.
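For reference, `kubectl delete --now` signals immediate shutdown with a one-second grace period instead of the default 30 seconds; the client-go equivalent used in an e2e helper could look roughly like this (the function name and wiring are assumptions):

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// deletePodNow deletes a pod with a one-second grace period, mirroring
// `kubectl delete pod --now`; without this the default 30-second grace
// period applies and test cleanup waits it out.
func deletePodNow(ctx context.Context, client kubernetes.Interface, ns, name string) error {
	grace := int64(1)
	return client.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{
		GracePeriodSeconds: &grace,
	})
}
```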
Thank you. LGTM
I also want to give an update on my long-running but interesting investigation into the long duration of the resizing and its impact on our test suite (6+ minutes). This is in fact caused by ControllerExpandVolume always specifying that NodeExpandVolume is needed (topolvm/internal/driver/controller.go, line 690 at a967c95).
Why does this matter? Because NodeExpandVolume is not immediately called, but instead queued by the kubelet's volume manager. When investigating, I found that we are in fact not delaying the call in TopoLVM at all; rather, the kubelet does not call NodeExpandVolume for minutes, causing the test to stall until the kubelet invokes the CSI call. Digging deeper, I found that https://github.com/kubernetes/enhancements/blob/0e4d5df19d396511fe41ed0860b0ab9b96f46a2d/keps/sig-storage/556-csi-volume-resizing/README.md#expansion-on-kubelet is the design reason for this. Note, however, that in theory the volume manager should update every 100ms, which for some reason does not happen. I know this is where it's stuck because of the PVC's conditions while waiting in the df eventually check:
I still need to verify that there is not an issue in the kubelet causing delays we don't want, but this is definitely the reason we see the delay; nothing in TopoLVM is doing any work while we are in this state, so there is nothing we can do to speed it up. How could we fix this without fixing the kubelet delay? We always specify NodeExpandVolume because NodeExpandVolume is our only code path where we resize the filesystem. This is because we do not implement NodeStageVolume and NodeUnstageVolume, which could handle mounting and unmounting separately from NodePublishVolume/NodeUnpublishVolume, which only run when the pod is scheduled. This is important because changing it would allow us to run the filesystem resize in the lv controller's resize method without needing to return NodeExpansionRequired: true, and thus we would be able to significantly speed up mounting and resizing behavior.
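To make the knob concrete: the decision lives in the `NodeExpansionRequired` flag of the CSI response. A hedged sketch (not TopoLVM's actual controller code) of how a controller-side resize could skip the kubelet round-trip entirely:

```go
package driver

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// controllerExpand sketches the decision point: if the filesystem resize
// already happened on the controller side, NodeExpansionRequired can be
// false and the kubelet never needs to queue a NodeExpandVolume call.
func controllerExpand(newCapacityBytes int64, fsResizedOnController bool) *csi.ControllerExpandVolumeResponse {
	return &csi.ControllerExpandVolumeResponse{
		CapacityBytes:         newCapacityBytes,
		NodeExpansionRequired: !fsResizedOnController,
	}
}
```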
@jakobmoellerdev
I think the main behavior of StageVolume is to prepare the volume for use by pods. There is no mention of it being intended just for mount/umount in https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume, AFAIK. Also, I've seen drivers where Stage/Unstage does the necessary work, such as activation of volumes, and then Publish does the actual mount with read or read-write.
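As a sketch of that split (hypothetical helpers built on k8s.io/mount-utils, not TopoLVM code): NodeStageVolume would mount the device once at a staging path, and NodePublishVolume would only bind-mount it into each pod's target path:

```go
package driver

import (
	mountutils "k8s.io/mount-utils"
)

// stage mounts the real device once per node at the staging path; this is
// where per-device work (activation, fsck, filesystem resize) could live.
func stage(mounter mountutils.Interface, devicePath, stagingPath, fsType string) error {
	return mounter.Mount(devicePath, stagingPath, fsType, nil)
}

// publish bind-mounts the staged filesystem into the pod's target path;
// repeated pod scheduling only repeats this cheap bind mount.
func publish(mounter mountutils.Interface, stagingPath, targetPath string) error {
	return mounter.Mount(stagingPath, targetPath, "", []string{"bind"})
}
```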
Okay, this investigation aside, I believe this PR can be merged without issues. @llamerada-jp any concerns?
@jakobmoellerdev
Agreed, I will take care of bringing this to the k8s community. Just as a follow-up: in https://github.com/topolvm/topolvm/actions/runs/8829012081/job/24239136537?pr=898#step:9:621 you can see that when I implement a controller-based resize without NodeExpandVolume, the test is as fast as the others, further solidifying my findings.
Sorry for the delay in reviewing this; I have been busy with other tasks. Good for the most part, and I'm still checking the details.
I just noticed this got merged without a squash. Maybe we should introduce a GitHub Action to block merges unless there is a single squashed commit?
fix #877
partially address #876
Adds table-driven tests for the xfs scenarios and runs them for a btrfs StorageClass as well. Also adds a minimum size for btrfs.