
Fix race in dnode_check_slots_free() #8005

Closed
behlendorf wants to merge 1 commit into zfs-0.7-release

Conversation

behlendorf
Contributor

Motivation and Context

Backport of edc1e71 for the 0.7 release branch. See issue #7997.

Description

Currently, dnode_check_slots_free() works by checking dn->dn_type
in the dnode to determine if the dnode is reclaimable. However,
there is a small window of time between dnode_free_sync() in the
first call to dsl_dataset_sync() and when the user accounting code
is run, during which the type is set to DMU_OT_NONE but the dnode
is not yet evictable, leading to crashes. This patch adds the
ability for dnodes to track which txg they were last dirtied in,
and adds a check for this before performing the reclaim.
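
For illustration, here is a rough sketch in C of the shape of that fix.
It is a simplified stand-in, not the verbatim patch: the field and macro
names (dn_dirty_txg, DNODE_IS_DIRTY) follow the master commit, but the
helper below is hypothetical and only condenses the check performed
inside dnode_check_slots_free().

/*
 * Simplified sketch (assumes the upstream approach): the dnode records
 * the last txg in which it was dirtied, and a slot only counts as free
 * once that txg has been synced out.
 */
#define	DNODE_IS_DIRTY(_dn)	\
	((_dn)->dn_dirty_txg >= spa_syncing_txg((_dn)->dn_objset->os_spa))

static boolean_t			/* hypothetical helper */
dnode_slot_reclaimable(dnode_t *dn)
{
	/*
	 * Checking dn_type alone races with dnode_free_sync(): the type
	 * is already DMU_OT_NONE while the dnode is still dirty and
	 * therefore not yet evictable.
	 */
	return (dn->dn_type == DMU_OT_NONE && !DNODE_IS_DIRTY(dn));
}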

This patch also corrects several instances where dn_dirty_link was
treated as a list_node_t when it is actually a multilist_node_t.
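
As a sketch of what that correction looks like (illustrative, not the
verbatim diff; multilist_link_active() is the multilist counterpart of
list_link_active() in the ZFS code):

/* Before: the multilist link was probed with the plain list API. */
if (list_link_active(&dn->dn_dirty_link[i]))
	/* ... */;

/* After: dn_dirty_link is a multilist_node_t, so the multilist API applies. */
if (multilist_link_active(&dn->dn_dirty_link[i]))
	/* ... */;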

How Has This Been Tested?

The same issue was observed in master and the fix was integrated
there in April; this is a clean backport of that change. I've built
it locally, but I'm pushing it to the bots for additional testing.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the contributing document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Requires-spl: spl-0.7-release
Issue openzfs#7147
Issue openzfs#7388
Issue openzfs#7997

codecov bot commented Oct 10, 2018

Codecov Report

Merging #8005 into zfs-0.7-release will increase coverage by 0.4%.
The diff coverage is 95.23%.

@@                Coverage Diff                 @@
##           zfs-0.7-release    #8005     +/-   ##
==================================================
+ Coverage            72.49%   72.89%   +0.4%     
==================================================
  Files                  289      289             
  Lines                89767    89780     +13     
==================================================
+ Hits                 65077    65448    +371     
+ Misses               24690    24332    -358

@tonyhutter tonyhutter added this to To do in 0.7.12 Oct 10, 2018
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 10, 2018

@janlam7 janlam7 left a comment

Seems to work for me. Without this patch my glusterfs nodes stop responding within an hour of booting. With this patch, no hangs for over a day now. Nice.

@behlendorf
Contributor Author

Closing. Will be included in 0.7.12.

@behlendorf behlendorf closed this Oct 24, 2018
@tonyhutter tonyhutter moved this from To do to In progress in 0.7.12 Oct 26, 2018
@blind-oracle

I don't see this fix in the release notes of 0.7.12.
Should I apply it on top manually?

@bunder2015
Contributor

This patch was included via commit b32f127, which is in the 0.7.12 branch.

@blind-oracle

@bunder2015 Thanks!

@philwo

philwo commented May 18, 2019

I still see this happening with Ubuntu 19.04 and the included zfs v0.7.12-1ubuntu5. Is there any known workaround for this?

[37240.129894] general protection fault: 0000 [#1] SMP PTI
[37240.135331] CPU: 22 PID: 1728 Comm: dp_sync_taskq Tainted: P           O      5.0.0-1006-gcp #6-Ubuntu
[37240.144760] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[37240.154172] RIP: 0010:multilist_sublist_remove+0x11/0x40 [zfs]
[37240.160135] Code: 89 70 08 48 89 06 48 89 56 08 48 89 32 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 03 77 38 55 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 e5 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89
[37240.179025] RSP: 0000:ffffbc52d866bd10 EFLAGS: 00010286
[37240.184373] RAX: ffff9ff00b1da9c0 RBX: ffff9ff0f1919000 RCX: 0000000000000000
[37240.191623] RDX: dead000000000100 RSI: ffff9fef1ee3aeb8 RDI: ffff9ff00b1da980
[37240.198896] RBP: ffffbc52d866bdd8 R08: ffffbc52d866bbc3 R09: 0000000000000000
[37240.206153] R10: ffffffff9345dea0 R11: ffffbc52d866bcdf R12: ffff9fef1ee3aee8
[37240.213408] R13: ffff9fef1ee3af08 R14: ffff9ff00b1da980 R15: ffff9fef1ee3adf0
[37240.220685] FS:  0000000000000000(0000) GS:ffff9ff207980000(0000) knlGS:0000000000000000
[37240.228900] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37240.234778] CR2: 0000000591e93068 CR3: 0000000f9575c004 CR4: 00000000001606e0
[37240.242042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37240.249300] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37240.256560] Call Trace:
[37240.259184]  ? userquota_updates_task+0x107/0x540 [zfs]
[37240.264587]  ? dmu_objset_userobjspace_upgradable+0x60/0x60 [zfs]
[37240.270855]  ? dmu_objset_userobjspace_upgradable+0x60/0x60 [zfs]
[37240.277082]  taskq_thread+0x2ec/0x4d0 [spl]
[37240.281388]  ? wake_up_q+0x80/0x80
[37240.284914]  kthread+0x120/0x140
[37240.288259]  ? task_done+0xb0/0xb0 [spl]
[37240.292289]  ? __kthread_parkme+0x70/0x70
[37240.296409]  ret_from_fork+0x35/0x40
[37240.300104] Modules linked in: ipt_MASQUERADE xfrm_user xfrm_algo iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter bpfilter xt_conntrack nf_nat br_netfilter bridge stp llc nfsv3 nfs_acl aufs nf_tables_set nf_tables rpcsec_gss_krb5 auth_rpcgss overlay nfsv4 nfs lockd grace fscache zfs(PO) zunicode(PO) zavl(PO) icp(PO) nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc zcommon(PO) scsi_dh_alua znvpair(PO) spl(O) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd virtio_net glue_helper net_failover input_leds nvme psmouse failover nvme_core serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables
[37240.365824] ---[ end trace 53c80689fe3d8421 ]---
[37240.370625] RIP: 0010:multilist_sublist_remove+0x11/0x40 [zfs]
[37240.376610] Code: 89 70 08 48 89 06 48 89 56 08 48 89 32 5d c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 03 77 38 55 48 8b 46 08 48 8b 16 <48> 89 42 08 48 89 e5 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89
[37240.396897] RSP: 0000:ffffbc52d866bd10 EFLAGS: 00010286
[37240.402240] RAX: ffff9ff00b1da9c0 RBX: ffff9ff0f1919000 RCX: 0000000000000000
[37240.410903] RDX: dead000000000100 RSI: ffff9fef1ee3aeb8 RDI: ffff9ff00b1da980
[37240.418164] RBP: ffffbc52d866bdd8 R08: ffffbc52d866bbc3 R09: 0000000000000000
[37240.426851] R10: ffffffff9345dea0 R11: ffffbc52d866bcdf R12: ffff9fef1ee3aee8
[37240.435506] R13: ffff9fef1ee3af08 R14: ffff9ff00b1da980 R15: ffff9fef1ee3adf0
[37240.444150] FS:  0000000000000000(0000) GS:ffff9ff207980000(0000) knlGS:0000000000000000
[37240.452368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37240.459619] CR2: 0000000591e93068 CR3: 0000000f9575c004 CR4: 00000000001606e0
[37240.468258] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37240.476896] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37240.485534] Kernel panic - not syncing: Fatal exception
[37240.492695] Kernel Offset: 0x11400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
