
Fix race in dnode_check_slots_free() #8005

Closed
behlendorf wants to merge 1 commit into zfs-0.7-release

Conversation

behlendorf
Contributor

Motivation and Context

Backport of edc1e71 for the 0.7 release branch. See issue #7997.

Description

Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the user accounting code
is run, during which the type is set to DMU_OT_NONE but the dnode
is not yet evictable, leading to crashes. This patch adds the
ability for dnodes to track which txg they were last dirtied in,
and adds a check for this before performing the reclaim.
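
For illustration, here is a rough sketch in C of the shape of that fix.
It is a simplified stand-in, not the verbatim patch: the field and macro
names (dn_dirty_txg, DNODE_IS_DIRTY) follow the master commit, but the
helper below is hypothetical and only condenses the check performed
inside dnode_check_slots_free().

/*
 * Simplified sketch (assumes the upstream approach): the dnode records
 * the last txg in which it was dirtied, and a slot only counts as free
 * once that txg has been synced out.
 */
#define	DNODE_IS_DIRTY(_dn)	\
	((_dn)->dn_dirty_txg >= spa_syncing_txg((_dn)->dn_objset->os_spa))

static boolean_t			/* hypothetical helper */
dnode_slot_reclaimable(dnode_t *dn)
{
	/*
	 * Checking dn_type alone races with dnode_free_sync(): the type
	 * is already DMU_OT_NONE while the dnode is still dirty and
	 * therefore not yet evictable.
	 */
	return (dn->dn_type == DMU_OT_NONE && !DNODE_IS_DIRTY(dn));
}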

This patch also corrects several instances where dn_dirty_link was
treated as a list_node_t when it is actually a multilist_node_t.
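
As a sketch of what that correction looks like (illustrative, not the
verbatim diff; multilist_link_active() is the multilist counterpart of
list_link_active() in the ZFS code):

/* Before: the multilist link was probed with the plain list API. */
if (list_link_active(&dn->dn_dirty_link[i]))
	/* ... */;

/* After: dn_dirty_link is a multilist_node_t, so the multilist API applies. */
if (multilist_link_active(&dn->dn_dirty_link[i]))
	/* ... */;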

How Has This Been Tested?

The same issue was observed in master and the fix was integrated
there in April; this is a clean backport of that change. I've built
it locally, but I'm pushing it to the bots for additional testing.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Requires-spl: spl-0.7-release
Issue openzfs#7147
Issue openzfs#7388
Issue openzfs#7997

codecov bot commented Oct 10, 2018

Codecov Report

Merging #8005 into zfs-0.7-release will increase coverage by 0.4%.
The diff coverage is 95.23%.

@@                Coverage Diff                 @@
##           zfs-0.7-release    #8005     +/-   ##
==================================================
+ Coverage            72.49%   72.89%   +0.4%     
==================================================
  Files                  289      289             
  Lines                89767    89780     +13     
==================================================
+ Hits                 65077    65448    +371     
+ Misses               24690    24332    -358

@tonyhutter tonyhutter added this to To do in 0.7.12 Oct 10, 2018
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 10, 2018

@janlam7 janlam7 left a comment

Seems to work for me. Without this patch my glusterfs nodes stop responding within an hour of booting. With this patch, no hangs for over a day now. Nice.

@behlendorf
Contributor Author

Closing. Will be included in 0.7.12.

@behlendorf behlendorf closed this Oct 24, 2018
@tonyhutter tonyhutter moved this from To do to In progress in 0.7.12 Oct 26, 2018
@blind-oracle

I don't see this fix in the release notes of 0.7.12.
Should I apply it on top manually?

@bunder2015
Contributor

This patch was included via commit b32f127, which is in the 0.7.12 branch.

@blind-oracle

@bunder2015 Thanks!

@philwo

philwo commented May 18, 2019

I still see this happening with Ubuntu 19.04 and the included zfs v0.7.12-1ubuntu5. Is there any known workaround for this?

[37240.129894] general protection fault: 0000 [#1] SMP PTI
[37240.135331] CPU: 22 PID: 1728 Comm: dp_sync_taskq Tainted: P           O      5.0.0-1006-gcp #6-Ubuntu
[37240.144760] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[37240.154172] RIP: 0010:multilist_sublist_remove+0x11/0x40 [zfs]
[37240.160135] Code: 89 70 08 48 89 06 48 89 56 08 48 89 32 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 03 77 38 55 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 e5 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89
[37240.179025] RSP: 0000:ffffbc52d866bd10 EFLAGS: 00010286
[37240.184373] RAX: ffff9ff00b1da9c0 RBX: ffff9ff0f1919000 RCX: 0000000000000000
[37240.191623] RDX: dead000000000100 RSI: ffff9fef1ee3aeb8 RDI: ffff9ff00b1da980
[37240.198896] RBP: ffffbc52d866bdd8 R08: ffffbc52d866bbc3 R09: 0000000000000000
[37240.206153] R10: ffffffff9345dea0 R11: ffffbc52d866bcdf R12: ffff9fef1ee3aee8
[37240.213408] R13: ffff9fef1ee3af08 R14: ffff9ff00b1da980 R15: ffff9fef1ee3adf0
[37240.220685] FS:  0000000000000000(0000) GS:ffff9ff207980000(0000) knlGS:0000000000000000
[37240.228900] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37240.234778] CR2: 0000000591e93068 CR3: 0000000f9575c004 CR4: 00000000001606e0
[37240.242042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37240.249300] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37240.256560] Call Trace:
[37240.259184]  ? userquota_updates_task+0x107/0x540 [zfs]
[37240.264587]  ? dmu_objset_userobjspace_upgradable+0x60/0x60 [zfs]
[37240.270855]  ? dmu_objset_userobjspace_upgradable+0x60/0x60 [zfs]
[37240.277082]  taskq_thread+0x2ec/0x4d0 [spl]
[37240.281388]  ? wake_up_q+0x80/0x80
[37240.284914]  kthread+0x120/0x140
[37240.288259]  ? task_done+0xb0/0xb0 [spl]
[37240.292289]  ? __kthread_parkme+0x70/0x70
[37240.296409]  ret_from_fork+0x35/0x40
[37240.300104] Modules linked in: ipt_MASQUERADE xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat br_netfilter bridge stp llc nfsv3 nfs_acl aufs nf_tables_set nf_tables rpcsec_gss_krb5 auth_rpcgss overlay nfsv4 nfs lockd grace fscache zfs(PO) zunicode(PO) zavl(PO) icp(PO) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc zcommon(PO) scsi_dh_alua znvpair(PO) spl(O) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd virtio_net glue_helper net_failover input_leds nvme psmouse failover nvme_core serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables
[37240.365824] ---[ end trace 53c80689fe3d8421 ]---
[37240.370625] RIP: 0010:multilist_sublist_remove+0x11/0x40 [zfs]
[37240.376610] Code: 89 70 08 48 89 06 48 89 56 08 48 89 32 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 03 77 38 55 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 e5 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89
[37240.396897] RSP: 0000:ffffbc52d866bd10 EFLAGS: 00010286
[37240.402240] RAX: ffff9ff00b1da9c0 RBX: ffff9ff0f1919000 RCX: 0000000000000000
[37240.410903] RDX: dead000000000100 RSI: ffff9fef1ee3aeb8 RDI: ffff9ff00b1da980
[37240.418164] RBP: ffffbc52d866bdd8 R08: ffffbc52d866bbc3 R09: 0000000000000000
[37240.426851] R10: ffffffff9345dea0 R11: ffffbc52d866bcdf R12: ffff9fef1ee3aee8
[37240.435506] R13: ffff9fef1ee3af08 R14: ffff9ff00b1da980 R15: ffff9fef1ee3adf0
[37240.444150] FS:  0000000000000000(0000) GS:ffff9ff207980000(0000) knlGS:0000000000000000
[37240.452368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37240.459619] CR2: 0000000591e93068 CR3: 0000000f9575c004 CR4: 00000000001606e0
[37240.468258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37240.476896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37240.485534] Kernel panic - not syncing: Fatal exception
[37240.492695] Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
