
fix: btrfs unmounting + e2e tests #879

Merged — 10 commits merged into topolvm:main from btrfs-testing on May 1, 2024

Conversation

@jakobmoellerdev (Contributor) commented Apr 9, 2024

fix #877
partially addresses #876

Adds table-driven tests for the xfs scenarios and runs them against a btrfs StorageClass as well. Also adds a minimum allocation size for btrfs.
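For illustration, a minimal sketch of the table-driven layout using plain Go `testing` (the actual e2e suite uses its own test harness, and the btrfs StorageClass name below is a hypothetical placeholder):

```go
package e2e

import "testing"

// fsTestCase describes one filesystem scenario; the same body runs for every
// entry, so the xfs and btrfs cases share a single table.
type fsTestCase struct {
	name         string
	storageClass string
}

var fsTestCases = []fsTestCase{
	{name: "xfs", storageClass: "topolvm-provisioner"},
	{name: "btrfs", storageClass: "topolvm-provisioner-btrfs"}, // placeholder name
}

func TestFilesystemScenarios(t *testing.T) {
	for _, tc := range fsTestCases {
		tc := tc
		t.Run(tc.name, func(t *testing.T) {
			// Create a PVC and pod with tc.storageClass, verify the mount,
			// resize the volume, and delete the pod; identical for each entry.
			_ = tc
		})
	}
}
```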

@jakobmoellerdev (Contributor Author):

Looking at the test failures, it seems the mount point is not set up as expected with btrfs. Does anyone have any input on that?

@llamerada-jp (Contributor) commented Apr 10, 2024

@cupnes
Is it the same situation as the following issue?
#877

@cupnes (Contributor) commented Apr 10, 2024

@llamerada-jp
I think the situation is different from #877. What is failing now is the pod deployment, whereas #877 is an issue where the pod cannot be deleted.

@jakobmoellerdev (Contributor Author):

@cupnes From what I can see in the test logs, the device verification fails: https://github.com/topolvm/topolvm/actions/runs/8615413549/job/23610904469?pr=879#step:9:570 I'm not sure this is really a failure as early as pod deployment.

@cupnes (Contributor) commented Apr 10, 2024

@jakobmoellerdev Yes. Please wait a moment while I investigate.

@cupnes (Contributor) commented Apr 11, 2024

@jakobmoellerdev
I changed DefaultMinimumAllocationSizeBtrfs to 110MiB and the problem area passed.

From my investigation, it seems that the minimum device size for btrfs changes dynamically [1]. So I do not think it is necessary to enforce a strict minimum allocation size in this PR. We can just note it in limitation.md for now, with another PR to deal with it if necessary.
[1] https://patchwork.kernel.org/project/linux-btrfs/patch/1403148338-6584-1-git-send-email-quwenruo@cn.fujitsu.com/

@jakobmoellerdev (Contributor Author):

I'm not sure I follow this logic. My understanding is that this would make mkfs.btrfs fail. However, it seems the volume is successfully formatted; I would have expected mkfs.btrfs to bail out during MountVolume in the CSI provisioning.

That being said, I'm fine with removing the limit; however, I'm not sure how we would test btrfs then.

@jakobmoellerdev (Contributor Author):

After adjusting the tests, it seems that resizing behaves oddly: the desired/actual storage for btrfs is off after resizing. Any ideas on what causes that?

@cupnes (Contributor) commented Apr 12, 2024

@jakobmoellerdev
There is a slight difference between filesystems in how large the actual filesystem ends up relative to the size requested by the PVC.
The fifth argument of Sprintf in this line was "topolvm-provisioner" yesterday. I think the "btrfs filesystem" test was actually run with xfs, which is why it didn't fail like this before.

@jakobmoellerdev force-pushed the btrfs-testing branch 2 times, most recently from 6f5f7b7 to 8321902 on April 15, 2024 09:41
@jakobmoellerdev (Contributor Author):

It seems that pod deletion for btrfs during offline resizing now has an issue. On top of that, the runtime of the tests is pretty much unbearable. We need to figure out how to fix that, otherwise the tests will run 20+ minutes.

I really think the tests should be refactored...

@jakobmoellerdev (Contributor Author):

It seems NodeUnpublishVolume for btrfs is broken due to "device or resource busy":

 {"level":"info","ts":"2024-04-15T11:41:37Z","logger":"driver.node","msg":"NodeUnpublishVolume called","volume_id":"52e43313-4f6b-4bdf-b3df-973f666ae983","target_path":"/var/lib/kubelet/pods/3fb3f6af-89f0-4fc4-8007-6b8263048f13/volum \u2502
\u2502 es/kubernetes.io~csi/pvc-411430a5-75d7-4aab-ae38-657b31fbe109/mount"}                                                                                                                                                                    \u2502
\u2502 {"level":"error","ts":"2024-04-15T11:41:37Z","msg":"error on grpc call","method":"/csi.v1.Node/NodeUnpublishVolume","error":"rpc error: code = Internal desc = remove dir failed for /var/lib/kubelet/pods/3fb3f6af-89f0-4fc4-8007-6b826 \u2502
\u2502 3048f13/volumes/kubernetes.io~csi/pvc-411430a5-75d7-4aab-ae38-657b31fbe109/mount: error=unlinkat /var/lib/kubelet/pods/3fb3f6af-89f0-4fc4-8007-6b8263048f13/volumes/kubernetes.io~csi/pvc-411430a5-75d7-4aab-ae38-657b31fbe109/mount: de \u2502
\u2502 vice or resource busy","stacktrace":"github.com/topolvm/topolvm/cmd/topolvm-node/app.ErrorLoggingInterceptor\n\t/home/jmoller/topolvm/cmd/topolvm-node/app/run.go:193\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_Node \u2502
\u2502 UnpublishVolume_Handler\n\t/home/jmoller/go/pkg/mod/github.com/container-storage-interface/spec@v1.6.0/lib/go/csi/csi.pb.go:6168\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/home/jmoller/go/pkg/mod/google.golang.org/grpc@v1 \u2502
\u2502 .58.3/server.go:1374\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/home/jmoller/go/pkg/mod/google.golang.org/grpc@v1.58.3/server.go:1751\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.1\n\t/home/jmoller/go/pkg/mod/google. \u2502
\u2502 golang.org/grpc@v1.58.3/server.go:986"}

@jakobmoellerdev (Contributor Author):

It seems the problem here is that the umount call does not go through. When I debug into the node and run umount /var/lib/kubelet/pods/3fb3f6af-89f0-4fc4-8007-6b8263048f13/volumes/kubernetes.io~csi/pvc-411430a5-75d7-4aab-ae38-657b31fbe109/mount manually, the test recovers.

@jakobmoellerdev (Contributor Author) commented Apr 15, 2024

It seems we have a device-mapper / mounting issue again. The reason this is not recovering is that for btrfs the following /proc/mounts entry exists and makes this check think we do not have a mount:

/dev/dm-3 /var/lib/kubelet/pods/2be71706-9a37-41fe-91c2-f21220f5b1be/volumes/kubernetes.io~csi/pvc-411430a5-75d7-4aab-ae38-657b31fbe109/mount btrfs rw,relatime,discard=async,space_cache=v2,subvolid=5,subvol=/ 0 0

As you can see, the device appears as /dev/dm-3; I assume this is the same root cause as #877.
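For illustration, a minimal stdlib-only sketch (not the PR's code) of why a plain string comparison against /proc/mounts can miss the mount: TopoLVM knows the LV by its /dev/mapper symlink, while /proc/mounts records the resolved /dev/dm-N name, so resolving symlinks before comparing (or comparing device numbers) avoids the false negative. The volume group and LV names below are placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sameBlockDevice reports whether two device paths resolve to the same node,
// e.g. the /dev/mapper/<vg>-<lv> symlink and the /dev/dm-N name it points to.
func sameBlockDevice(a, b string) (bool, error) {
	ra, err := filepath.EvalSymlinks(a)
	if err != nil {
		return false, err
	}
	rb, err := filepath.EvalSymlinks(b)
	if err != nil {
		return false, err
	}
	return ra == rb, nil
}

func main() {
	// Placeholder names: on most systems /dev/mapper/myvg-myvol is a symlink
	// to ../dm-N, so this returns true even though the strings differ.
	ok, err := sameBlockDevice("/dev/mapper/myvg-myvol", "/dev/dm-3")
	fmt.Println(ok, err)
}
```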

@jakobmoellerdev (Contributor Author):

@toshipp @cupnes In light of my recent investigation, I found that we are still using /proc/mounts instead of /proc/1/mountinfo (which would also include the major/minor numbers and some other useful features). Also, when looking through the code, I do not see why we need our own unmount logic when there is a well-written unmounter in mount-utils. I'm sure there was a reason, but could you explain it to me again? If we don't really have any reason to keep this around, I would vote for switching to the upstream unmounting process.
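A hedged sketch of what delegating to the upstream helper could look like, assuming k8s.io/mount-utils and its CleanupMountPoint helper; this illustrates the suggestion rather than the code that landed in the PR.

```go
package main

import (
	"log"

	mount "k8s.io/mount-utils"
)

// unpublish unmounts targetPath if it is a mount point and removes the
// directory, using the upstream helper instead of custom unmount logic.
func unpublish(targetPath string) error {
	mounter := mount.New("") // default mount/umount binaries
	// The last argument selects the more expensive mount-point check that also
	// handles bind mounts; false keeps the cheaper heuristic.
	return mount.CleanupMountPoint(targetPath, mounter, false)
}

func main() {
	if err := unpublish("/mnt/example-target"); err != nil {
		log.Fatal(err)
	}
}
```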

@toshipp (Contributor) commented Apr 18, 2024

@jakobmoellerdev
mount-utils had not yet been exported during the early development stage of TopoLVM. We later replaced our mount logic with it but missed the unmount part, so I have no objection to using it.

I checked mountinfo and found that the major number is not accurate for LVM devices: it reports 0 but should be 253.
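For illustration, a small stdlib-only sketch (not part of the PR) that looks up the major:minor recorded in /proc/1/mountinfo for a given mount point, which is how the observation above can be reproduced; the target path is a placeholder.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	target := "/mnt/example-target" // placeholder mount point to look up

	f, err := os.Open("/proc/1/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// mountinfo fields: mountID parentID major:minor root mountPoint options ...
		fields := strings.Fields(sc.Text())
		if len(fields) >= 5 && fields[4] == target {
			fmt.Printf("%s is mounted from device %s\n", target, fields[2])
		}
	}
}
```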

@cupnes (Contributor) left a review:

I will comment on the points I noticed. The review is not yet complete and will continue tomorrow.

Review comments on internal/driver/node.go and internal/filesystem/util.go (resolved).
@jakobmoellerdev force-pushed the btrfs-testing branch 3 times, most recently from 19be4e0 to f6f4b9d on April 22, 2024 17:05
@jakobmoellerdev (Contributor Author):

I was able to reduce the test runtime per filesystem by about 2 minutes, thanks to some missing --now flags when deleting pods, which had caused a 30-second grace-period wait. The reason the resize tests still take quite a bit of time is actually related to the check with df in the pod. TBH I have no idea why it's taking so long and I will have to do some tracing to see where it is slow, but I'm assuming it has to do with the asynchronous nature of CSI, because the volume expansion on LVM itself was extremely quick.
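For illustration, a hypothetical helper showing the effect of the flag: kubectl delete --now requests immediate shutdown (a one-second grace period) instead of waiting out the default 30-second termination grace period for each test pod.

```go
package main

import (
	"fmt"
	"os/exec"
)

// deletePodNow deletes a pod with --now so the test does not wait out the
// default termination grace period.
func deletePodNow(namespace, name string) error {
	out, err := exec.Command("kubectl", "-n", namespace, "delete", "pod", name, "--now").CombinedOutput()
	if err != nil {
		return fmt.Errorf("kubectl delete pod %s failed: %w: %s", name, err, string(out))
	}
	return nil
}

func main() {
	if err := deletePodNow("default", "example-pod"); err != nil {
		fmt.Println(err)
	}
}
```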

@cupnes (Contributor) left a review:

Thank you. LGTM

@jakobmoellerdev (Contributor Author):

I also want to give an update on my long-running but interesting investigation into the long duration of the resizing and its impact on our test suite (6+ minutes).

This is in fact caused by ControllerExpandVolume always reporting that node expansion is needed:

NodeExpansionRequired: true,

Why does this matter? Because NodeExpandVolume is not called immediately; instead it is queued by the kubelet's volume manager. When investigating, I found that we are in fact not delaying the call in TopoLVM at all; rather, the kubelet does not call NodeExpandVolume for minutes, causing the test to stall until the kubelet issues the CSI call. Digging deeper, I found that https://github.com/kubernetes/enhancements/blob/0e4d5df19d396511fe41ed0860b0ab9b96f46a2d/keps/sig-storage/556-csi-volume-resizing/README.md#expansion-on-kubelet is the design reason for this. Note, however, that in theory the volume manager should reconcile every 100ms, which for some reason does not happen here. I know this is where it gets stuck because of the PVC's conditions while waiting in the df Eventually check:

  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-04-23T14:45:33Z"
    message: Waiting for user to (re-)start a pod to finish file system resize of
      volume on node.
    status: "True"
    type: FileSystemResizePending

I still need to verify whether there is an issue in the kubelet that causes delays we don't want, but this is definitely the reason we see the delay. Nothing in TopoLVM is doing any work while we are in this state, so there is nothing we can do to speed it up.

How could we fix this without fixing the kubelet delay? We always require NodeExpandVolume because it is the only code path where we resize the filesystem. This is because we do not implement NodeStageVolume and NodeUnstageVolume, which could handle mounting and unmounting separately from NodePublishVolume/NodeUnpublishVolume, which only run when the pod is scheduled.

This is important because changing it would allow us to run the filesystem resize in the LV controller's resize method without setting NodeExpansionRequired to true, and thus significantly speed up mounting and resizing behavior.
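A simplified sketch of the trade-off (not the driver's actual code), assuming the standard CSI Go bindings: with NodeExpansionRequired true, today's behavior, the kubelet must still schedule NodeExpandVolume, which is where the wait appears; if the filesystem were grown as part of the LV resize, the response could report false and the kubelet round-trip would be skipped.

```go
package main

import (
	"github.com/container-storage-interface/spec/lib/go/csi"
)

// expandResponse builds a ControllerExpandVolume response after the logical
// volume has been resized to newSizeBytes.
func expandResponse(newSizeBytes int64, fsAlreadyGrown bool) *csi.ControllerExpandVolumeResponse {
	return &csi.ControllerExpandVolumeResponse{
		CapacityBytes: newSizeBytes,
		// true  -> the kubelet must still call NodeExpandVolume to grow the filesystem
		// false -> no node-side expansion step is needed
		NodeExpansionRequired: !fsAlreadyGrown,
	}
}

func main() {
	_ = expandResponse(2<<30, false)
}
```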

@cupnes (Contributor) commented Apr 24, 2024

@jakobmoellerdev
Understood. But I was wondering: is that correct according to the CSI specification? I think NodeStageVolume and NodeUnstageVolume are intended for mounting and unmounting volumes.

@jakobmoellerdev (Contributor Author):

I think the main purpose of NodeStageVolume is to prepare the volume for use by pods. There is no mention of it being intended just for mount/umount in https://github.com/container-storage-interface/spec/blob/master/spec.md#nodestagevolume, as far as I know. I've also seen drivers where Stage/Unstage does the necessary work, such as activating volumes, and Publish then does the actual mount as read-only or read-write.

@jakobmoellerdev (Contributor Author):

Okay, this investigation aside, I believe this PR can be merged without issues. @llamerada-jp, any concerns?

@cupnes (Contributor) commented Apr 25, 2024

@jakobmoellerdev
On the other hand, I think it would be good to report to the Kubernetes project that the kubelet does not call NodeExpandVolume for minutes.

@jakobmoellerdev (Contributor Author):

Agreed, I will make sure to raise this with the Kubernetes community. As a follow-up, in https://github.com/topolvm/topolvm/actions/runs/8829012081/job/24239136537?pr=898#step:9:621 you can see that when I implement a controller-based resize without NodeExpandVolume, the test is as fast as the others, further supporting my findings.

@llamerada-jp (Contributor):

Sorry for the delay in reviewing this; I have been busy with other tasks. It looks good for the most part, and I'm still checking the details.

@llamerada-jp merged commit 9a916b2 into topolvm:main on May 1, 2024. 24 checks passed.
@jakobmoellerdev (Contributor Author):

I just noticed this got merged without a squash. Maybe we should introduce a GitHub Action to block merges unless there is a single squashed commit?


Successfully merging this pull request may close these issues.

topolvm-node does not unmount btrfs volumes after running parted -l