Kernel 5.1 has datto_snap_cow0 100% for Raid1 or Raid5 partition #265
One thing I think may help investigate is that
I will keep trying to print useful information for you.
Best Regards,
"bio_iter_len(bio, iter) is 0, so strange"
I have some vague memory that you get that when the bio operation is a discard or secure erase or something like that. I haven't looked at dattobd in a long time, but it's possible that there's some new bio operation layout in newer kernels that sends zero lengths that dattobd has never encountered before and isn't handling correctly.
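To make the hypothesis above concrete, here is a minimal userspace sketch of why a zero segment length can turn a processing loop into an endless one. It is not dattobd or kernel code: `struct fake_iter`, `walk_segments`, and `run_examples` are hypothetical names invented for this illustration, and the struct only models "bytes remaining" and "length reported per step".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio iterator (NOT the kernel's bvec_iter). */
struct fake_iter {
    size_t remaining; /* bytes still to be processed */
    size_t seg_len;   /* length reported for each step; 0 models a payload-less op */
};

/*
 * Walk the iterator one segment at a time.
 * Returns the number of segments processed, or -1 if a zero-length
 * segment is reported while bytes are still remaining.
 */
long walk_segments(struct fake_iter it)
{
    long steps = 0;

    while (it.remaining > 0) {
        size_t len = it.seg_len < it.remaining ? it.seg_len : it.remaining;

        /* Without this guard, len == 0 never reduces `remaining`,
         * so the loop would spin forever at 100% CPU. */
        if (len == 0)
            return -1;

        it.remaining -= len;
        steps++;
    }
    return steps;
}

/* Usage example: a normal two-segment walk and a zero-length walk. */
void run_examples(void)
{
    assert(walk_segments((struct fake_iter){ 8192, 4096 }) == 2);
    assert(walk_segments((struct fake_iter){ 8192, 0 }) == -1);
}
```

The guard is only one possible way to handle the case; whether a zero-length bio should be skipped, passed through, or treated as an error in dattobd is a design decision this sketch does not settle.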
Hi @nickchen-cpu
You may have run into this issue. I reported it a long time ago, but it hasn't been fixed yet.
Hi @nixomose
Hi @yito24
Best Regards,
Yeah, this is probably the newer kernel passing new flags on the bio that it never did before.
Hi @nixomose The following is in 5.4.0-91 kernel (my env)
I will try to see if there are behaviors more than
and should be discarded
Just an update: I haven't forgotten about this, and I'm looking into it to find a root cause.
At first I thought it might have to do with REQ_OP_DRV_{IN,OUT} conflicting with our DATTOBD_PASSTHROUGH flag, since it requires another bit. NVMe makes use of these reqs, but they're still disjoint: the ops would use bits 29-24 and our flag is on bit 30, so that rules that out.
Similarly, I looked into the weird world of zoning. Your drives don't support zoning; zoned NVMe is part of the NVMe 2.0 spec, which barely has any hardware support at the time of this writing.
After some debugging with a similar drive array, I'm getting the same behavior on read ops, before even a single write has been requested. So I'm doubly sure it's not a zoning op problem (not saying that zoning isn't a problem, it's just not this problem). It's probably something to do with md; I'll keep digging.
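The bit arithmetic in the comment above can be checked with a short sketch. The bit positions (29-24 for the ops, 30 for the flag) are taken from that comment, not from kernel or dattobd headers, and the names `OP_BIT_LOW`, `OP_BIT_HIGH`, `PASSTHROUGH_BIT`, `bit_range_mask`, and `masks_overlap` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the comment; names are made up for this sketch. */
#define OP_BIT_LOW      24u
#define OP_BIT_HIGH     29u
#define PASSTHROUGH_BIT 30u

/* Build a 32-bit mask with bits low..high (inclusive) set. */
uint32_t bit_range_mask(unsigned low, unsigned high)
{
    assert(low <= high && high < 32);

    uint32_t width = high - low + 1;
    uint32_t ones = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    return ones << low;
}

/* Two masks conflict if they share at least one set bit. */
int masks_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}
```

With these definitions, `bit_range_mask(OP_BIT_LOW, OP_BIT_HIGH)` is `0x3F000000` and the flag mask `1u << PASSTHROUGH_BIT` is `0x40000000`, so `masks_overlap` returns 0 for the pair, which matches the conclusion that the two are disjoint.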
Hi @dakotarwilliams
I agree with your point, because my other
Not sure if this helps, but whenever this issue happened, the sequential read ops
After testing more times, I found
I appreciate you pointing to the problem caused by
2022/1/14 Testing Configuration:
Testing script:
(Note: on a bad kernel version the COW thread usually hangs in the first loop; retrying 100 times is only to see which kernel versions are good enough.)
Testing disk layout:
Testing mdstat:
2022/1/14 Testing Result in RAID1:
Good so far:
Failed:
I will keep narrowing down the issue between 5.0.21-050021 and 5.1.0-050100rc1!
Best Regards,
Hi @nixomose @dakotarwilliams
Since which kernel version this issue emerged:
After the above tests, I am 99% sure that this issue first appeared in 5.1.0-050100rc1.
Call Trace in specific kernel version:
5.1.0-050100rc1 and 5.1.0-050100 sometimes (around 30% of the time) produced the following Call Trace when running my testing script (5.4.0-88 did not); it may be a separate issue:
Best Regards,
I am seeing the same issue, with datto_snap_cow0 going into an infinite loop. Our situation is as follows: we have an NVMe device with two LVs created on top of it.
When we transition into snapshot mode, we see bio_iter_len returning 0. We do not see this in incremental mode. What is the workaround for this?
Hi
Symptom: the datto_snap_cow0 thread usually hangs (consuming 100% CPU) when snapshotting /boot.
Linux distribution: Ubuntu 20.04
Linux Kernel: 5.4.0-91-generic
Disk layout: (8 x 3.8TB) Raid 5 as below
Memory layout:
When I turned on DEBUG mode
I found that the COW thread (datto_snap_cow0) went into an endless loop in kernel function
within
or
The / and /bigdata partitions seemed to work well in snapshot mode.
Is there any way to investigate (debug) this symptom more deeply?
Not sure if it's the same as this issue.
Reproduce method 1 (using write):
Reproduce method 2 (using read):
Best Regards,
Nick