
Kernel 5.1 has datto_snap_cow0 100% for Raid1 or Raid5 partition #265

Open
nickchen-cpu opened this issue Dec 22, 2021 · 10 comments

@nickchen-cpu

nickchen-cpu commented Dec 22, 2021

Hi

Symptom:
The datto_snap_cow0 thread usually hangs (consuming 100% CPU) when taking a snapshot of /boot.

Linux distribution: Ubuntu 20.04
Linux Kernel: 5.4.0-91-generic
Disk layout: (8 x 3.8 TB) RAID 5, as below:


nvme1n1                      259:7    0  3.5T  0 disk
└─md126                        9:126  0 23.2T  0 raid5
  ├─md126p1                  259:8    0  512M  0 part  /boot/efi
  ├─md126p2                  259:9    0    1G  0 part  /boot
  └─md126p3                  259:10   0 23.2T  0 part
    ├─ubuntu--vg-ubuntu--lv  253:0    0  200G  0 lvm   /
    └─ubuntu--vg-bigdata--lv 253:1    0   22T  0 lvm   /bigdata

Memory layout:

              total        used        free      shared  buff/cache   available
Mem:          754Gi       2.4Gi       751Gi        17Mi       755Mi       748Gi
Swap:         8.0Gi          0B       8.0Gi

When I turned on DEBUG mode,
I found that the COW thread (datto_snap_cow0) went into an endless loop in the kernel function

bio_for_each_segment(bvec, bio, iter)

within

snap_handle_write_bio(const struct snap_device *dev, struct bio *bio)

or

snap_handle_read_bio(const struct snap_device *dev, struct bio *bio)

The / and /bigdata partitions seemed to work well in snapshot mode.
Is there any way to investigate (debug) this symptom more deeply?

Not sure if it's the same as this issue.

Reproduction method 1 (using write):

dbdctl setup-snapshot /dev/md126p2 /boot/.datto0 0
dd if=/dev/zero of=/boot/test.img bs=1M count=150

Reproduction method 2 (using read):

dbdctl setup-snapshot /dev/md126p2 /boot/.datto0 0
dd if=/dev/datto0 of=/root/test.img

Best Regards,
Nick

@nickchen-cpu
Author

nickchen-cpu commented Dec 22, 2021

One thing that may help the investigation:
I tried to print bio_iter_len(bio, iter) within snap_handle_write_bio(...) (or snap_handle_read_bio(...)) when this issue happened:

bio_iter_len(bio, iter) is 0, which is strange.
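
For reference, a minimal debug sketch for catching this case (assuming it is pasted inside the bio_for_each_segment() loop of snap_handle_read_bio()/snap_handle_write_bio(); the warning format is arbitrary, the fields are standard 5.4-era struct bio members):

/* Debug sketch: dump the bio when a zero-length segment shows up, then
 * bail out instead of letting the COW thread spin at 100%. */
if (!bio_iter_len(bio, iter)) {
	printk(KERN_WARNING
	       "dattobd: zero-length segment: op=%d opf=0x%x sector=%llu remaining=%u\n",
	       bio_op(bio), bio->bi_opf,
	       (unsigned long long)bio->bi_iter.bi_sector,
	       bio->bi_iter.bi_size);
	break;
}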

I will keep trying to print useful information for you.

Best Regards,
Nick

@nixomose
Contributor

bio_iter_len(bio, iter) is 0, which is strange

I have some vague memory that you get that when the bio operation is a discard or secure erase or something like that.

I haven't looked at dattobd in a long time, but it's possible that newer kernels have some new bio operation layout that sends zero lengths, which dattobd has never encountered before and isn't handling correctly.
Without looking, I can easily imagine it adding a length of zero, getting nowhere, and looping forever.
We probably just need to figure out what the new operation is and avoid/ignore it or handle it correctly.
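
For reference, that matches how the iterator is defined. bio_for_each_segment() expands roughly as follows in 5.x kernels (paraphrased from include/linux/bio.h, so treat it as a sketch of the upstream macro rather than an exact copy):

/* Roughly how bio_for_each_segment() expands in 5.x kernels
 * (paraphrased from include/linux/bio.h): */
#define __bio_for_each_segment(bvl, bio, iter, start)			\
	for (iter = (start);						\
	     (iter).bi_size &&						\
		((bvl = bio_iter_iovec((bio), (iter))), 1);		\
	     bio_advance_iter((bio), &(iter), (bvl).bv_len))

#define bio_for_each_segment(bvl, bio, iter)				\
	__bio_for_each_segment(bvl, bio, iter, (bio)->bi_iter)

If (bvl).bv_len is 0, the advance step is a no-op, (iter).bi_size never shrinks, and the loop spins forever, which would explain datto_snap_cow0 sitting at 100%.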

@yito24

yito24 commented Dec 23, 2021

Hi nickchen-cpu

You may have encountered this issue.
Currently, the dattobd code cannot work on kernel 5.4 or later.
#252

I reported this issue a long time ago, but it hasn't been fixed yet.

@nickchen-cpu
Author

nickchen-cpu commented Dec 23, 2021

Hi @nixomose
Thank you for replying!
Besides bio_iter_len(bio, iter) being 0, I also have other weird BIO information, but I think I need to clarify it myself first,
so that the information I give you will be useful.

Hi @yito24
Thank you for replying!
I've tried cloning the latest elastio-snap master, but the issue still happened.

Best Regards,
Nick

@nixomose
Contributor

Yeah, this is probably the newer kernel passing new flags on the bio that it never did before.
I think the elastio-snap fork just fixes the new multi-queue interface problems.

@nickchen-cpu
Author

nickchen-cpu commented Dec 24, 2021

Hi @nixomose

The following is from the 5.4.0-91 kernel (my environment):

enum req_opf {
	/* read sectors from the device */
	REQ_OP_READ		= 0,
	/* write sectors to the device */
	REQ_OP_WRITE		= 1,
	/* flush the volatile write cache */
	REQ_OP_FLUSH		= 2,
	/* discard sectors */
	REQ_OP_DISCARD		= 3,
	/* securely erase sectors */
	REQ_OP_SECURE_ERASE	= 5,
	/* reset a zone write pointer */
	REQ_OP_ZONE_RESET	= 6,
	/* write the same sector many times */
	REQ_OP_WRITE_SAME	= 7,
	/* reset all the zone present on the device */
	REQ_OP_ZONE_RESET_ALL	= 8,
	/* write the zero filled sector many times */
	REQ_OP_WRITE_ZEROES	= 9,

	/* SCSI passthrough using struct scsi_request */
	REQ_OP_SCSI_IN		= 32,
	REQ_OP_SCSI_OUT		= 33,
	/* Driver private requests */
	REQ_OP_DRV_IN		= 34,
	REQ_OP_DRV_OUT		= 35,

	REQ_OP_LAST,
};

I will try to see if there are operations other than

discard or secure erase

that should be skipped.
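
For what it's worth, the kind of guard nixomose describes might look roughly like the following (a sketch only: the helper name snap_bio_needs_cow() and where it would be called from in dattobd's make-request path are assumptions, not existing dattobd code):

#include <linux/bio.h>

/* Sketch of a payload filter: skip bios that carry no data to copy
 * (discard, secure erase, write-zeroes, empty flushes) before walking
 * their segments. Helper name and call site are hypothetical. */
static inline bool snap_bio_needs_cow(struct bio *bio)
{
	switch (bio_op(bio)) {
	case REQ_OP_DISCARD:
	case REQ_OP_SECURE_ERASE:
	case REQ_OP_WRITE_ZEROES:
		return false;
	default:
		/* bio_has_data() also filters flush-only bios with no payload */
		return bio_has_data(bio);
	}
}

Whether any of the other ops in the enum above (write-same, the zone ops, the SCSI/driver passthrough ops) would also need filtering is exactly the open question.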

@dakotarwilliams
Contributor

Just an update so you know I haven't forgotten about this: I'm looking into it to find a root cause.

At first I thought it might have to do with REQ_OP_DRV_{IN,OUT} conflicting with our DATTOBD_PASSTHROUGH flag since it requires another bit. NVMe makes use of these reqs, but they're still disjoint. The ops would use bits 29-24 and our flag is on bit 30... so that rules that out.

Similarly, I looked into the weird world of zoning: your drives don't support zoning, and zoned NVMe is part of the NVMe 2.0 spec, which barely has any hardware support at the time of this writing.

After some debugging with a similar drive array, I'm getting the same behavior on read ops before even a single write has been requested, so I'm doubly sure it's not a zoning-op problem (not saying that zoning isn't a problem; it's just not this problem).

It's probably something to do with md, I'll keep digging.

@nickchen-cpu
Author

nickchen-cpu commented Jan 9, 2022

Hi @dakotarwilliams
Thanks for replying!

I agree with your point, because my other md devices also have the same issue.
By the way, all of my md devices use software RAID (mdadm), so hardware RAID may or may not have this issue.

Not sure if this helps, but whenever this issue happened, the sequential-read dd always ended up stuck at the same sector number.

After more testing, I found the / partition also had this issue (so, not only the /boot partition).

I appreciate you pointing to md as the likely cause! I will try to help find more clues.

2022/1/14 Testing Configuration:

Testing script:

root@nick:~# cat dattobd/app/test.sh
#!/bin/bash


for i in $(seq 1 1 100);
do
        echo $i
        ./dbdctl setup-snapshot -f 1500 /dev/md0p1 ~/cow 0
        dd if=/dev/zero of=/qq bs=1M count=1024 &
        dd if=/dev/datto0 of=/dev/null
        ./dbdctl destroy 0
done

(Note: on a bad kernel version the COW thread usually hangs in the first iteration; retrying 100 times is just to confirm which kernel versions are good.)

Testing disk layout:

root@nick:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sda           8:0    0    8G  0 disk
├─sda1        8:1    0    1M  0 part
└─sda2        8:2    0    8G  0 part
  └─md0       9:0    0    8G  0 raid1
    └─md0p1 259:0    0    8G  0 md    /
sdb           8:16   0    8G  0 disk
└─sdb1        8:17   0    8G  0 part
  └─md0       9:0    0    8G  0 raid1
    └─md0p1 259:0    0    8G  0 md    /
sr0          11:0    1 1024M  0 rom

Testing mdstat:

root@nick:~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sda2[0] sdb1[1]
      8380416 blocks super 1.2 [2/2] [UU]

unused devices: <none>

2022/1/14 Testing results on RAID1:

Good so far:
3.2.0-23-generic
4.4.0-186-generic
4.15.0-166-generic
5.0.0-050000-generic
5.0.0-65-generic
5.0.21-050021-generic

Failed:
5.1.0-050100rc1-generic
5.1.0-050100-generic
5.4.0-88-generic

I will keep narrowing down the issue between 5.0.21-050021 and 5.1.0-050100rc1!

Best Regards,
Nick

@nickchen-cpu
Author

nickchen-cpu commented Jan 14, 2022

Hi @nixomose @dakotarwilliams
There are two pieces of information I think may help:

Which kernel version this issue emerged in:

After the above tests, I am 99% sure that this issue has been present since 5.1.0-050100rc1.
(Note: I have only tested Ubuntu kernels so far; not sure if it's the same for vanilla.)

Call trace in specific kernel versions:

5.1.0-050100rc1 and 5.1.0-050100 sometimes (around 30% of the time) produced the following call trace when running my testing script (but 5.4.0-88 didn't); maybe it's a different issue:

[ 2222.686278] general protection fault: 0000 [#1] SMP PTI
[ 2222.686841] CPU: 1 PID: 32312 Comm: datto_snap_cow0 Tainted: G           OE     5.1.0-050100rc1-generic #201903172231
[ 2222.687839] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 2222.688911] RIP: 0010:memcpy_orig+0x16/0x110
[ 2222.689440] Code: 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89 07
[ 2222.691119] RSP: 0018:ffffa6564239bb80 EFLAGS: 00010206
[ 2222.691695] RAX: ffff916a7d073000 RBX: 0000000000001000 RCX: 00000000fffffff2
[ 2222.694119] RDX: 0000000000000fc0 RSI: 04e1008c92daf000 RDI: ffff916a7d073000
[ 2222.694720] RBP: ffffa6564239bbb8 R08: 0000000000001000 R09: ffff916a7d073000
[ 2222.695331] R10: ffffa6564239bd18 R11: 0000000000000000 R12: 0000000000001000
[ 2222.695925] R13: ffffa6564239bd08 R14: 0000000000001000 R15: ffffffffb4244ac0
[ 2222.696512] FS:  0000000000000000(0000) GS:ffff916abfd00000(0000) knlGS:0000000000000000
[ 2222.697113] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2222.697723] CR2: 00007fe4bb20d8c8 CR3: 0000000074f6c000 CR4: 00000000000006e0
[ 2222.698372] Call Trace:
[ 2222.699004]  ? iov_iter_copy_from_user_atomic+0x26e/0x380
[ 2222.699668]  generic_perform_write+0x116/0x1b0
[ 2222.700313]  __generic_file_write_iter+0x101/0x1f0
[ 2222.700966]  ext4_file_write_iter+0xcd/0x3f0
[ 2222.701639]  ? _cond_resched+0x19/0x30
[ 2222.702283]  new_sync_write+0x114/0x180
[ 2222.704346]  __vfs_write+0x29/0x40
[ 2222.704970]  vfs_write+0xab/0x1b0
[ 2222.705578]  kernel_write+0x31/0x50
[ 2222.706197]  file_io+0x41/0xe0 [dattobd]
[ 2222.706795]  snap_cow_thread+0x6a1/0xab0 [dattobd]
[ 2222.707403]  ? invalidate_bh_lrus+0x30/0x30
[ 2222.708011]  ? wait_woken+0x80/0x80
[ 2222.708626]  kthread+0x120/0x140
[ 2222.710082]  ? snap_read_bio_get_mode.isra.42+0x2a0/0x2a0 [dattobd]
[ 2222.710687]  ? __kthread_parkme+0x70/0x70
[ 2222.711294]  ret_from_fork+0x35/0x40
[ 2222.711905] Modules linked in: dattobd(OE) vmw_balloon joydev input_leds serio_raw mac_hid vmw_vsock_vmci_transport vsock vmw_vmci sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear raid1 hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper psmouse mptspi vmwgfx ttm drm_kms_helper ahci libahci syscopyarea mptscsih sysfillrect mptbase sysimgblt fb_sys_fops scsi_transport_spi drm vmxnet3 i2c_piix4 pata_acpi
[ 2222.718535] ---[ end trace cdfd700d6afd5720 ]---
[ 2222.719169] RIP: 0010:memcpy_orig+0x16/0x110
[ 2222.719801] Code: 1f 44 00 00 48 89 f8 48 89 d1 f3 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 4c 89 07
[ 2222.723476] RSP: 0018:ffffa6564239bb80 EFLAGS: 00010206
[ 2222.724163] RAX: ffff916a7d073000 RBX: 0000000000001000 RCX: 00000000fffffff2
[ 2222.724876] RDX: 0000000000000fc0 RSI: 04e1008c92daf000 RDI: ffff916a7d073000
[ 2222.725556] RBP: ffffa6564239bbb8 R08: 0000000000001000 R09: ffff916a7d073000
[ 2222.726226] R10: ffffa6564239bd18 R11: 0000000000000000 R12: 0000000000001000
[ 2222.726901] R13: ffffa6564239bd08 R14: 0000000000001000 R15: ffffffffb4244ac0
[ 2222.727551] FS:  0000000000000000(0000) GS:ffff916abfd00000(0000) knlGS:0000000000000000
[ 2222.728233] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2222.728905] CR2: 00007fe4bb20d8c8 CR3: 0000000074f6c000 CR4: 00000000000006e0

Best Regards,
Nick

@nickchen-cpu nickchen-cpu changed the title from "Snapshot for /boot hung on Raid 5 in Ubuntu 20.04" to "Kernel 5.1 has datto_snap_cow0 100% for Raid1 or Raid5 partition" on Apr 28, 2022
@udaytessell

I am seeing the same issue of datto_snap_cow0 going into an infinite loop. We have the following situation:

We have an NVMe drive with two LVs created on top of it:

  1. the first LV, protected by dattobd
  2. the second LV, used for some other data store

When we transition into snapshot mode we see bio_iter_len returning 0; we do not see this in incremental mode.

What is the workaround for this?

  • As mentioned by @nickchen-cpu above, should we have a single LV on the NVMe drive and not share it?
    Will this solve the above problem?
  • Is there any other workaround?
