
Kernel 5.1 has datto_snap_cow0 100% for Raid1 or Raid5 partition #265

Open
nickchen-cpu opened this issue Dec 22, 2021 · 10 comments

@nickchen-cpu

nickchen-cpu commented Dec 22, 2021

Hi

Symptom:
The datto_snap_cow0 thread usually hangs (consuming 100% CPU) when taking a snapshot of /boot.

Linux distribution: Ubuntu 20.04
Linux Kernel: 5.4.0-91-generic
Disk layout: (8 x 3.8 TB) RAID 5, as below:


nvme1n1                      259:7    0  3.5T  0 disk
└─md126                        9:126  0 23.2T  0 raid5
  ├─md126p1                  259:8    0  512M  0 part  /boot/efi
  ├─md126p2                  259:9    0    1G  0 part  /boot
  └─md126p3                  259:10   0 23.2T  0 part
    ├─ubuntu--vg-ubuntu--lv  253:0    0  200G  0 lvm   /
    └─ubuntu--vg-bigdata--lv 253:1    0   22T  0 lvm   /bigdata

Memory layout:

              total        used        free      shared  buff/cache   available
Mem:          754Gi       2.4Gi       751Gi        17Mi       755Mi       748Gi
Swap:         8.0Gi          0B       8.0Gi

When I turned on DEBUG mode,
I found that the COW thread (datto_snap_cow0) went into an endless loop in the kernel function

bio_for_each_segment(bvec, bio, iter)

within

snap_handle_write_bio(const struct snap_device *dev, struct bio *bio)

or

snap_handle_read_bio(const struct snap_device *dev, struct bio *bio)

The / and /bigdata partitions seemed to work well in snapshot mode.
Is there any way to investigate (debug) this symptom more deeply?

Not sure if it's the same as this issue.

Reproduction method 1 (using write):

dbdctl setup-snapshot /dev/md126p2 /boot/.datto0 0
dd if=/dev/zero of=/boot/test.img bs=1M count=150

Reproduction method 2 (using read):

dbdctl setup-snapshot /dev/md126p2 /boot/.datto0 0
dd if=/dev/datto0 of=/root/test.img

Best Regards,
Nick

@nickchen-cpu
Author

nickchen-cpu commented Dec 22, 2021

One thing that may help the investigation:
I tried to print bio_iter_len(bio, iter) within snap_handle_write_bio(...) (or snap_handle_read_bio(...)) when this issue happened:

bio_iter_len(bio, iter) is 0, which is strange.
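
For reference, a minimal debug sketch for catching this case (assuming it is pasted inside the bio_for_each_segment() loop of snap_handle_read_bio()/snap_handle_write_bio(); the warning format is arbitrary, the fields are standard 5.4-era struct bio members):

/* Debug sketch: dump the bio when a zero-length segment shows up, then
 * bail out instead of letting the COW thread spin at 100%. */
if (!bio_iter_len(bio, iter)) {
	printk(KERN_WARNING
	       "dattobd: zero-length segment: op=%d opf=0x%x sector=%llu remaining=%u\n",
	       bio_op(bio), bio->bi_opf,
	       (unsigned long long)bio->bi_iter.bi_sector,
	       bio->bi_iter.bi_size);
	break;
}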

I will keep trying to print useful information for you.

Best Regards,
Nick

@nixomose
Contributor

bio_iter_len(bio, iter) is 0, which is strange

I have some vague memory that you get that when the bio operation is a discard or secure erase or something like that.

I haven't looked at dattobd in a long time, but it's possible that newer kernels have some new bio operation layout that sends zero lengths, which dattobd has never encountered before and isn't handling correctly.
Without looking, I can easily imagine it adding a length of zero, getting nowhere, and looping forever.
We probably just need to figure out what the new operation is and avoid/ignore it or handle it correctly.
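
For reference, that matches how the iterator is defined. bio_for_each_segment() expands roughly as follows in 5.x kernels (paraphrased from include/linux/bio.h, so treat it as a sketch of the upstream macro rather than an exact copy):

/* Roughly how bio_for_each_segment() expands in 5.x kernels
 * (paraphrased from include/linux/bio.h): */
#define __bio_for_each_segment(bvl, bio, iter, start)			\
	for (iter = (start);						\
	     (iter).bi_size &&						\
		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
	     bio_advance_iter((bio), &(iter), (bvl).bv_len))

#define bio_for_each_segment(bvl, bio, iter)				\
	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

If (bvl).bv_len is 0, the advance step is a no-op, (iter).bi_size never shrinks, and the loop spins forever, which would explain datto_snap_cow0 sitting at 100%.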

@yito24

yito24 commented Dec 23, 2021

Hi nickchen-cpu

You may have encountered this issue.
Currently, the dattobd code cannot work on kernel 5.4 or later.
#252

I reported this issue a long time ago, but it hasn't been fixed yet.

@nickchen-cpu
Author

nickchen-cpu commented Dec 23, 2021

Hi @nixomose
Thank you for replying!
Besides bio_iter_len(bio, iter) being 0, I also have other weird BIO information, but I think I need to clarify it myself first,
so that the information I give you will be useful.

Hi @yito24
Thank you for replying!
I've tried cloning the latest elastio-snap master, but the issue still happened.

Best Regards,
Nick

@nixomose
Contributor

Yeah, this is probably the newer kernel passing new flags on the bio that it never did before.
I think the elastio-snap fork just fixes the new multi-queue interface problems.

@nickchen-cpu
Author

nickchen-cpu commented Dec 24, 2021

Hi @nixomose

The following is from the 5.4.0-91 kernel (my environment):

enum req_opf {
	/* read sectors from the device */
	REQ_OP_READ		= 0,
	/* write sectors to the device */
	REQ_OP_WRITE		= 1,
	/* flush the volatile write cache */
	REQ_OP_FLUSH		= 2,
	/* discard sectors */
	REQ_OP_DISCARD		= 3,
	/* securely erase sectors */
	REQ_OP_SECURE_ERASE	= 5,
	/* reset a zone write pointer */
	REQ_OP_ZONE_RESET	= 6,
	/* write the same sector many times */
	REQ_OP_WRITE_SAME	= 7,
	/* reset all the zone present on the device */
	REQ_OP_ZONE_RESET_ALL	= 8,
	/* write the zero filled sector many times */
	REQ_OP_WRITE_ZEROES	= 9,

	/* SCSI passthrough using struct scsi_request */
	REQ_OP_SCSI_IN		= 32,
	REQ_OP_SCSI_OUT		= 33,
	/* Driver private requests */
	REQ_OP_DRV_IN		= 34,
	REQ_OP_DRV_OUT		= 35,

	REQ_OP_LAST,
};

I will try to see if there are operations other than

discard or secure erase

that should be skipped.
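
For what it's worth, the kind of guard nixomose describes might look roughly like the following (a sketch only: the helper name snap_bio_needs_cow() and where it would be called from in dattobd's make-request path are assumptions, not existing dattobd code):

#include <linux/bio.h>

/* Sketch of a payload filter: skip bios that carry no data to copy
 * (discard, secure erase, write-zeroes, empty flushes) before walking
 * their segments. Helper name and call site are hypothetical. */
static inline bool snap_bio_needs_cow(struct bio *bio)
{
	switch (bio_op(bio)) {
	case REQ_OP_DISCARD:
	case REQ_OP_SECURE_ERASE:
	case REQ_OP_WRITE_ZEROES:
		return false;
	default:
		/* bio_has_data() also filters flush-only bios with no payload */
		return bio_has_data(bio);
	}
}

Whether any of the other ops in the enum above (write-same, the zone ops, the SCSI/driver passthrough ops) would also need filtering is exactly the open question.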

@dakotarwilliams
Contributor

Just an update so you know I haven't forgotten about this: I'm looking into it to find a root cause.

At first I thought it might have to do with REQ_OP_DRV_{IN,OUT} conflicting with our DATTOBD_PASSTHROUGH flag since it requires another bit. NVMe makes use of these reqs, but they're still disjoint. The ops would use bits 29-24 and our flag is on bit 30... so that rules that out.

Similarly, I looked into the weird world of zoning: your drives don't support zoning, and zoned NVMe is part of the NVMe 2.0 spec, which barely has any hardware support at the time of this writing.

After some debugging with a similar drive array, I'm getting the same behavior on read ops before even a single write has been requested, so I'm doubly sure it's not a zoning-op problem (not saying that zoning isn't a problem; it's just not this problem).

It's probably something to do with md, I'll keep digging.

@nickchen-cpu
Author

nickchen-cpu commented Jan 9, 2022

Hi @dakotarwilliams
Thanks for replying!

I agree with your point, because my other md devices also have the same issue.
By the way, all of my md devices use software RAID (mdadm), so hardware RAID may or may not have this issue.

Not sure if this helps, but whenever this issue happened, the sequential-read dd always ended up stuck at the same sector number.

After more testing, I found the / partition also had this issue (so, not only the /boot partition).

I appreciate you pointing to md as the likely cause! I will try to help find more clues.

2022/1/14 Testing Configuration:

Testing script:

root@nick:~# cat dattobd/app/test.sh
#!/bin/bash


for i in $(seq 1 1 100);
do
        echo $i
        ./dbdctl setup-snapshot -f 1500 /dev/md0p1 ~/cow 0
        dd if=/dev/zero of=/qq bs=1M count=1024 &
        dd if=/dev/datto0 of=/dev/null
        ./dbdctl destroy 0
done

(Note: on a bad kernel version the COW thread usually hangs in the first iteration; retrying 100 times is just to confirm which kernel versions are good.)

Testing disk layout:

root@nick:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda           8:0    0    8G  0 disk
├─sda1        8:1    0    1M  0 part
└─sda2        8:2    0    8G  0 part
  └─md0       9:0    0    8G  0 raid1
    └─md0p1 259:0    0    8G  0 md    /
sdb           8:16   0    8G  0 disk
└─sdb1        8:17   0    8G  0 part
  └─md0       9:0    0    8G  0 raid1
    └─md0p1 259:0    0    8G  0 md    /
sr0          11:0    1 1024M  0 rom

Testing mdstat:

root@nick:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[0] sdb1[1]
      8380416 blocks super 1.2 [2/2] [UU]

unused devices: <none>

2022/1/14 Testing results on RAID1:

Good so far:
3.2.0-23-generic
4.4.0-186-generic
4.15.0-166-generic
5.0.0-050000-generic
5.0.0-65-generic
5.0.21-050021-generic

Failed:
5.1.0-050100rc1-generic
5.1.0-050100-generic
5.4.0-88-generic

I will keep narrowing down the issue between 5.0.21-050021 and 5.1.0-050100rc1!

Best Regards,
Nick

@nickchen-cpu
Author

nickchen-cpu commented Jan 14, 2022

Hi @nixomose @dakotarwilliams
There are two pieces of information I think may help:

Which kernel version this issue emerged in:

After the above tests, I am 99% sure that this issue has been present since 5.1.0-050100rc1.
(Note: I have only tested Ubuntu kernels so far; not sure if it's the same for vanilla.)

Call trace in specific kernel versions:

5.1.0-050100rc1 and 5.1.0-050100 sometimes (around 30% of the time) produced the following call trace when running my testing script (but 5.4.0-88 didn't); maybe it's a different issue:

[ 2222.686278] general protection fault: 0000 [#1] SMP PTI
[ 2222.686841] CPU: 1 PID: 32312 Comm: datto_snap_cow0 Tainted: G           OE     5.1.0-050100rc1-generic #201903172231
[ 2222.687839] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 2222.688911] RIP: 0010:memcpy_orig+0x16/0x110
[ 2222.689440] Code: 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89 07
[ 2222.691119] RSP: 0018:ffffa6564239bb80 EFLAGS: 00010206
[ 2222.691695] RAX: ffff916a7d073000 RBX: 0000000000001000 RCX: 00000000fffffff2
[ 2222.694119] RDX: 0000000000000fc0 RSI: 04e1008c92daf000 RDI: ffff916a7d073000
[ 2222.694720] RBP: ffffa6564239bbb8 R08: 0000000000001000 R09: ffff916a7d073000
[ 2222.695331] R10: ffffa6564239bd18 R11: 0000000000000000 R12: 0000000000001000
[ 2222.695925] R13: ffffa6564239bd08 R14: 0000000000001000 R15: ffffffffb4244ac0
[ 2222.696512] FS:  0000000000000000(0000) GS:ffff916abfd00000(0000) knlGS:0000000000000000
[ 2222.697113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2222.697723] CR2: 00007fe4bb20d8c8 CR3: 0000000074f6c000 CR4: 00000000000006e0
[ 2222.698372] Call Trace:
[ 2222.699004]  ? iov_iter_copy_from_user_atomic+0x26e/0x380
[ 2222.699668]  generic_perform_write+0x116/0x1b0
[ 2222.700313]  __generic_file_write_iter+0x101/0x1f0
[ 2222.700966]  ext4_file_write_iter+0xcd/0x3f0
[ 2222.701639]  ? _cond_resched+0x19/0x30
[ 2222.702283]  new_sync_write+0x114/0x180
[ 2222.704346]  __vfs_write+0x29/0x40
[ 2222.704970]  vfs_write+0xab/0x1b0
[ 2222.705578]  kernel_write+0x31/0x50
[ 2222.706197]  file_io+0x41/0xe0 [dattobd]
[ 2222.706795]  snap_cow_thread+0x6a1/0xab0 [dattobd]
[ 2222.707403]  ? invalidate_bh_lrus+0x30/0x30
[ 2222.708011]  ? wait_woken+0x80/0x80
[ 2222.708626]  kthread+0x120/0x140
[ 2222.710082]  ? snap_read_bio_get_mode.isra.42+0x2a0/0x2a0 [dattobd]
[ 2222.710687]  ? __kthread_parkme+0x70/0x70
[ 2222.711294]  ret_from_fork+0x35/0x40
[ 2222.711905] Modules linked in: dattobd(OE) vmw_balloon joydev input_leds serio_raw mac_hid vmw_vsock_vmci_transport vsock vmw_vmci sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse mptspi vmwgfx ttm drm_kms_helper ahci libahci syscopyarea mptscsih sysfillrect mptbase sysimgblt fb_sys_fops scsi_transport_spi drm vmxnet3 i2c_piix4 pata_acpi
[ 2222.718535] ---[ end trace cdfd700d6afd5720 ]---
[ 2222.719169] RIP: 0010:memcpy_orig+0x16/0x110
[ 2222.719801] Code: 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89 07
[ 2222.723476] RSP: 0018:ffffa6564239bb80 EFLAGS: 00010206
[ 2222.724163] RAX: ffff916a7d073000 RBX: 0000000000001000 RCX: 00000000fffffff2
[ 2222.724876] RDX: 0000000000000fc0 RSI: 04e1008c92daf000 RDI: ffff916a7d073000
[ 2222.725556] RBP: ffffa6564239bbb8 R08: 0000000000001000 R09: ffff916a7d073000
[ 2222.726226] R10: ffffa6564239bd18 R11: 0000000000000000 R12: 0000000000001000
[ 2222.726901] R13: ffffa6564239bd08 R14: 0000000000001000 R15: ffffffffb4244ac0
[ 2222.727551] FS:  0000000000000000(0000) GS:ffff916abfd00000(0000) knlGS:0000000000000000
[ 2222.728233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2222.728905] CR2: 00007fe4bb20d8c8 CR3: 0000000074f6c000 CR4: 00000000000006e0

Best Regards,
Nick

@nickchen-cpu nickchen-cpu changed the title from "Snapshot for /boot hung on Raid 5 in Ubuntu 20.04" to "Kernel 5.1 has datto_snap_cow0 100% for Raid1 or Raid5 partition" on Apr 28, 2022
@udaytessell

I am seeing the same issue of datto_snap_cow0 going into an infinite loop. We have the following situation:

We have an NVMe drive with two LVs created on top of it:

  1. the first LV, protected by dattobd
  2. the second LV, used for some other data store

When we transition into snapshot mode we see bio_iter_len returning 0; we do not see this in incremental mode.

What is the workaround for this?

  • As mentioned by @nickchen-cpu above, should we have a single LV on the NVMe drive and not share it?
    Will this solve the above problem?
  • Is there any other workaround?
