Oops during zfs send / recv on 3.17.4-gentoo #2946
Comments
I tested again with a different kernel (3.15.10-gentoo) where the zfs-kmod did not contain the patches from zfs-kmod-0.6.3-r1.ebuild; there the same action works. So I guess it's one of the zfs kernel module patches from the 0.6.3-r1 ebuild. The userland was not changed between tests. Best regards,
I am on Gentoo as well and I see something similar during boot:

[ 375.282116] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

Packages are the same as above: sys-fs/zfs-0.6.3-r2

Kernel is a different one: Linux cens-backup 3.16.5-gentoo #1 SMP Tue Nov 11 09:20:39 EET 2014 x86_64 Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz GenuineIntel GNU/Linux

Best wishes, Marko
And I can confirm that by masking the new revisions in /etc/portage/package.mask I can revert to a system that boots without the Oops. Installed packages are sys-fs/zfs-0.6.3-r2. Marko
Similar Oops here as well after upgrading to sys-fs/zfs-kmod-0.6.3-r1 (version 0.6.3 with additional repository patches applied). The issue manifested after issuing zfs destroy. It didn't halt the system but degraded it over time until a reboot was required. I couldn't restore the pool until I reverted the kernel modules like Marko did.

Linux nas1 3.16.5-gentoo #1 SMP Mon Dec 8 15:39:19 MST 2014 x86_64 Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz GenuineIntel GNU/Linux
gcc 4.8.3

[285511.517924] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018

Let me know if I can provide any further info.
Same error here with 3.14.25-hardened-r1 and 0.6.3-r1 while deleting a tree of filesystems recursively. I guess one of the backported patches is doing harm? I will retest with 0.6.3 and the same kernel.
I was travelling when this issue was filed. These reports suggest that a regression might have occurred when I did my backports. It is possible that head is also affected, given that the backports were strictly changes from head. I will look into this on the weekend.
Anyone affected by this should downgrade to sys-kernel/spl-0.6.3, sys-fs/zfs-kmod-0.6.3 and sys-fs/zfs-0.6.3 while I work on a fix. My apologies for the inconvenience.
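On Gentoo, the downgrade can be pinned with a package mask. The following is a sketch, assuming the patched -r1/-r2 revisions are the ones to avoid (exact atoms may vary with your tree):

```
# /etc/portage/package.mask
# Mask the patched revisions so portage falls back to plain 0.6.3
>=sys-kernel/spl-0.6.3-r1
>=sys-fs/zfs-kmod-0.6.3-r1
>=sys-fs/zfs-0.6.3-r1
```

After masking, re-emerge spl, zfs-kmod and zfs, and rebuild the modules against the running kernel.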
In my case this is only happening on a dual-CPU machine. On my hexacore single-CPU AMD system it is not happening. The workload is surely smaller on the second system.
Any updates on this issue? This is a blocker for us to move forward to kernels > 3.17. 3.18.x is already 5 releases old, and I want to skip the now-EOL'ed 3.17.x series and move to 3.18.x, but I can't because of this bug and compilation errors on 3.18.x. The kernel series that I am on right now (3.16.x) is also EOL'ed upstream at 3.16.7. So I need to move on. Please help.
@devsk can you verify whether this is a regression in the ZoL master branch or just in the Gentoo packages?
Installed the current master and running some tests... hope it works.
Brian: do you suggest I update my setup to master? Is it reasonably stable? 0.6.3 is super stable for me, and the only reason I have to move is to get current with a supported kernel. I know Turbo argues that master is more stable than 0.6.3, but I want to hear it from you...:) Also, I know there is no guarantee of stability with any piece of software; it changes from install to install. So maybe this is all moot and I should just bite the bullet on master. As for your question, I only wanted to stay on 0.6.3, and yes, there, the 0.6.3-r1 Gentoo release with extra patches on top of 0.6.3 was unstable.
I can confirm that master is not stable on my system. After 10 hours of heavy reading, the system became "slow": the load average went through the roof (over 60), and services started to become unresponsive.
If Brian releases today, that version won't work for you either, because you've already tried it (basically, 'master' will be exactly like the 'released'). Or 'stable', as YOU call it. And this is why I really, REALLY don't like to use the word 'stable' and why it should NEVER be used (unless it's actually PROVEN to be stable over the course of several months)!!
I fully agree. Nothing is really stable, but some versions of software are more stable than others.
Not at the time of release. 0.6.3 is just more tested NOW because more people have been running it. But if you want 0.6.4 to work for you (and others), you should really do some more testing.
@alexanderhaensch: did you grab stacks of all the tasks when that happened? Maybe you ran into a known issue (like #3050). I am reluctant to try master right now because I am very tight on time on the work side and haven't got the cycles to troubleshoot and debug. I feel guilty about it, but there are only 24 hours in a day...:(
Unfortunately I didn't catch any stacks.
@devsk my suggestion would be to stick with the released version for your distribution for your normal usage, particularly if it's stable for your workload. What I was trying to determine by having you try the master branch is whether this is a Gentoo-specific regression or something which is also in the master branch. Normally I'll assume that it's in master too until shown otherwise. But if I'm reading the comment thread correctly, everyone who has reported this issue is running Gentoo, and that's a little suspicious. So my question is: has anyone observed this on a non-Gentoo platform?
The problem seems to be in the patch The following patch should fix this problem in
@ryao are you still working on an update to the Gentoo packaging to avoid this (pretty serious) bug?
@ivecera Thanks. I will correct this in the next ebuild revision. |
@ryao I can report that the 9999 ebuild has been working for 2 weeks now. A new ebuild would be great!
@ryao Likewise, I was about to move from Gentoo to ZFS on a better-supported Linux, but the 9999 ebuild has worked well. I think it does ZFS on Gentoo a disservice to have a fundamental part of it broken for such a long time. Perhaps all users should be advised that only 9999 is usable at this time for recent kernels?
@ari what distro is better supported? I was using 2 machines with Ubuntu, but after some problems with recent updates, I am testing FreeBSD now :(
Well, actually I'm a veteran (15 years and more) FreeBSD user. I have exactly one Linux box in all my server farms, and I picked Gentoo since it looked a little bit like FreeBSD :-) All this stuff "just works" on FreeBSD, partially because the ZFS implementation there is much older and more tested. It is also part of the core OS distribution, so it gets more eyes on it. I'm still struggling to get Gentoo booting from ZFS, but I hope to get there eventually so that I can create and destroy ZFS pools with version 9999.
As a 'veteran' (since 2.6.3, 11 years) Gentoo user I would not encourage you to use an out-of-kernel filesystem on your root/boot. The main issue is the licensing, as you might know.
"I don't want to brag" (well, I guess I do after all :), but on Debian GNU/Linux Wheezy, this has "just worked" for quite some time, WITH a dedicated, proper installer with ZFS support. And then, if Wheezy feels a little old, one could quite easily upgrade to Jessie, which was released yesterday…
If you are running on FreeBSD I would recommend writing to the freebsd-fs mailing list. Just a comment, but I still haven't been able to install ZoL on CentOS after days of trying. It's a piece of cake on Debian/Ubuntu; on FreeBSD or Omni, it's just there.
As the discussion is fully off topic now, I would encourage @ari to use the master branch. It works well at the moment.
In fact, let me close this issue out entirely. This issue never impacted the upstream ZFS on Linux source code from GitHub. It was a Gentoo-specific issue which can be tracked downstream with Gentoo, and it will be resolved when the Gentoo repositories are updated. To avoid this sort of issue in the future, I strongly suggest that the version of ZoL in Gentoo track the official upstream zfs-x.y.z-release branch. The sole intention of this branch is to provide a current version of ZFS on Linux with only build fixes for new kernels, critical security fixes, and stability fixes.
Thanks everyone for your help. I'd just like to confirm that version 9999 on Gentoo (which is hard masked and tracks the master branch) works perfectly for my use case; in particular, I can now destroy pools. I advise all Gentoo users finding this thread to try that ebuild. It would be nice if Gentoo tracked the stable tags rather than master, but this will do for now. Thanks
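For Gentoo users wanting to try this, unmasking the live ebuilds usually takes entries along these lines (a sketch; file names and keyword syntax assume a standard portage layout):

```
# /etc/portage/package.accept_keywords
=sys-kernel/spl-9999 **
=sys-fs/zfs-kmod-9999 **
=sys-fs/zfs-9999 **

# /etc/portage/package.unmask
=sys-kernel/spl-9999
=sys-fs/zfs-kmod-9999
=sys-fs/zfs-9999
```

The `**` keyword accepts the live (unkeyworded) ebuilds, and the unmask entries lift the hard mask on them; keep spl and zfs at matching versions.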
Hi,
While trying to migrate one data pool to a different pool via zfs send / receive, I get the following Oops:
[17649.786174] Oops: 0000 [#1] SMP
[17649.786202] Modules linked in: uas w83627ehf hwmon_vid ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat xt_CHECKSUM iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables bonding ftdi_sio stv6110x(O) lnbp21(O) zfs(PO) zunicode(PO) zavl(PO) zcommon(PO) znvpair(PO) spl(O) stv090x(O) x86_pkg_temp_thermal coretemp lpc_ich mfd_core ddbridge(O) cxd2099(O) dvb_core(O) fuse
[17649.786501] CPU: 1 PID: 8477 Comm: txg_sync Tainted: P O 3.17.4-gentoo #1
[17649.786543] Hardware name: /DQ77MK, BIOS MKQ7710H.86A.0064.2013.1003.1058 10/03/2013
[17649.786592] task: ffff88079cfe6d60 ti: ffff8807a1804000 task.ti: ffff8807a1804000
[17649.786659] RIP: 0010:[] [] zap_create_claim+0x4b/0x2d0 [zfs]
[17649.786728] RSP: 0018:ffff8807a1807b68 EFLAGS: 00010282
[17649.786762] RAX: 000000000000001d RBX: ffff8807de0ec800 RCX: 000000000000001e
[17649.786806] RDX: 000000000000001d RSI: 000000000000001c RDI: ffff8807de0ec800
[17649.786850] RBP: ffff8807a1807bd8 R08: 0000000000000000 R09: 0000000000000002
[17649.786894] R10: ffff8807a1807978 R11: ffffffffa00c6b71 R12: 000000000000001c
[17649.786937] R13: 000000000000001d R14: ffff88076474c000 R15: 0000000000000000
[17649.786981] FS: 0000000000000000(0000) GS:ffff88081e240000(0000) knlGS:0000000000000000
[17649.787031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17649.787067] CR2: 0000000000000018 CR3: 0000000002014000 CR4: 00000000001427e0
[17649.787110] Stack:
[17649.787126] ffff8803145677a0 ffff8807a1807c00 ffff880110ff3140 0000000000000000
[17649.787179] ffff8807a1807be8 ffffffffa0162804 ffff8801ad946720 0000000000000001
[17649.787233] ffff8807a1807bd8 ffff88076474c000 0000000000000000 ffff880358fa5240
[17649.787287] Call Trace:
[17649.787317] [] ? dmu_bonus_hold+0xe4/0x960 [zfs]
[17649.787375] [] spa_feature_decr+0x4b/0xc0 [zfs]
[17649.787426] [] ? bptree_is_empty+0x82/0x90 [zfs]
[17649.787488] [] dsl_scan_sync+0x7ea/0xa10 [zfs]
[17649.787524] [] ? spl_kmem_cache_free+0x11f/0x430 [spl]
[17649.787572] [] spa_sync+0x4e7/0xb30 [zfs]
[17649.787616] [] ? autoremove_wake_function+0x11/0x40
[17649.787658] [] ? __wake_up_common+0x55/0x90
[17649.787698] [] ? ktime_get_ts64+0x49/0xf0
[17649.787760] [] txg_sync_start+0x6be/0x8c0 [zfs]
[17649.787812] [] ? txg_sync_start+0x3b0/0x8c0 [zfs]
[17649.787861] [] spl_kmem_fini+0xa3/0xc0 [spl]
[17649.787900] [] ? spl_kmem_fini+0x30/0xc0 [spl]
[17649.787940] [] kthread+0xc4/0xe0
[17649.787973] [] ? kthread_worker_fn+0x100/0x100
[17649.788013] [] ret_from_fork+0x7c/0xb0
[17649.788049] [] ? kthread_worker_fn+0x100/0x100
[17649.788086] Code: 55 48 89 d0 48 89 e5 48 83 ec 70 48 89 5d d8 48 89 fb 4c 89 65 e0 49 89 f4 4c 89 6d e8 49 89 d5 4c 89 7d f8 4d 89 c7 4c 89 75 f0 <41> 8b 70 18 48 89 4d b8 b9 08 00 00 00 49 8b 50 08 44 89 4d ac
[17649.788334] RIP [] zap_create_claim+0x4b/0x2d0 [zfs]
[17649.788388] RSP
[17649.788411] CR2: 0000000000000018
[17649.799934] ---[ end trace e90bc72ab65cd506 ]---
The following versions are used:
sys-fs/zfs-0.6.3-r2
sys-fs/zfs-kmod-0.6.3-r1
sys-kernel/spl-0.6.3-r1
Kernel version is
Linux playstation 3.17.4-gentoo #1 SMP Tue Dec 2 14:05:11 CET 2014 x86_64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz GenuineIntel GNU/Linux
Is there any further info I can provide?
Thanks and best regards,
Jochen