PANIC at arc.c:5154:arc_read() when importing existing pool #5711

Closed
nakamuray opened this issue Jan 31, 2017 · 5 comments
Labels
Status: Inactive (not being actively updated)

Comments

@nakamuray

System information

Type                  Version/Name
Distribution Name     exherbo
Distribution Version
Linux Kernel          4.9.6
Architecture          x86_64
ZFS Version           fa603f8
SPL Version           0.7.0-rc3

Describe the problem you're observing

ZFS panics when importing an existing pool that could be imported by an older version.

Describe how to reproduce the problem

zpool import tank
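
(A hedged aside, not part of the original report: since the panic below fires from the txg_sync thread during a scan/destroy pass, a read-only import is a common way to inspect such a pool without letting that pass start. The pool name tank is taken from the command above.)

# hedged sketch: import read-only so txg_sync cannot start writing, then inspect
zpool import -o readonly=on tank
zpool status -v tank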

Include any warning/errors/backtraces from the system logs

Jan 31 22:06:30 localhost kernel: VERIFY3(0 == arc_buf_alloc_impl(hdr, private, compressed_read, B_TRUE, &buf)) failed (0 == 0)
Jan 31 22:06:30 localhost kernel: PANIC at arc.c:5154:arc_read()
Jan 31 22:06:30 localhost kernel: Showing stack for process 2751
Jan 31 22:06:30 localhost kernel: CPU: 3 PID: 2751 Comm: txg_sync Tainted: P           O    4.9.6 #1
Jan 31 22:06:30 localhost kernel: Hardware name: System manufacturer System Product Name/F1A75-M PRO, BIOS 0802 07/01/2011
Jan 31 22:06:30 localhost kernel:  0000000000000000 ffffffff81276e0f ffffffffa143e47c ffffc9000d4cb380
Jan 31 22:06:30 localhost kernel:  ffffffffa04101c6 ffff8807f9187000 ffff880700000030 ffffc9000d4cb390
Jan 31 22:06:31 localhost kernel:  ffffc9000d4cb330 2833594649524556 637261203d3d2030 6c6c615f6675625f
Jan 31 22:06:31 localhost kernel: Call Trace:
Jan 31 22:06:31 localhost kernel:  [<ffffffff81276e0f>] ? dump_stack+0x46/0x67
Jan 31 22:06:31 localhost kernel:  [<ffffffffa04101c6>] ? spl_panic+0xb6/0xe0 [spl]
Jan 31 22:06:31 localhost kernel:  [<ffffffff81141e37>] ? kmem_cache_alloc+0xf7/0x160
Jan 31 22:06:31 localhost kernel:  [<ffffffffa040c7eb>] ? spl_kmem_cache_alloc+0x5b/0x730 [spl]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130707d>] ? abd_return_buf+0x5d/0x90 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130a302>] ? arc_get_data_impl.isra.15+0x1b2/0x380 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130ac14>] ? arc_buf_fill+0x134/0x2a0 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130bec1>] ? arc_read+0x8e1/0x910 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130ec30>] ? arc_buf_destroy+0x100/0x100 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132ae30>] ? traverse_visitbp+0x4c0/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130ba09>] ? arc_read+0x429/0x910 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132b85e>] ? traverse_dnode+0x8e/0x190 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132b05c>] ? traverse_visitbp+0x6ec/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132af0d>] ? traverse_visitbp+0x59d/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa130ba09>] ? arc_read+0x429/0x910 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132b85e>] ? traverse_dnode+0x8e/0x190 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132b1da>] ? traverse_visitbp+0x86a/0x990 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa138768a>] ? zap_lookup+0x2a/0x30 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132b4aa>] ? traverse_impl+0x1aa/0x3e0 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa131dbd5>] ? dmu_read_impl+0x105/0x130 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa13300ea>] ? dnode_rele_and_unlock+0x4a/0xa0 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa132bbae>] ? traverse_dataset_destroyed+0x2e/0x40 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa1318cd0>] ? dbuf_stats_destroy+0x50/0x50 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa131933c>] ? bptree_iterate+0x1bc/0x2d0 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa13484c0>] ? dsl_scan_zil_record+0x100/0x100 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa134b03c>] ? dsl_scan_sync+0x75c/0xbf0 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa135e623>] ? spa_sync+0x433/0xd00 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffff81086c8c>] ? __wake_up+0x3c/0x60
Jan 31 22:06:31 localhost kernel:  [<ffffffffa136e5dc>] ? txg_sync_thread+0x2cc/0x440 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa136e310>] ? txg_delay+0x160/0x160 [zfs]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa040d243>] ? thread_generic_wrapper+0x73/0x90 [spl]
Jan 31 22:06:31 localhost kernel:  [<ffffffffa040d1d0>] ? __thread_exit+0x10/0x10 [spl]
Jan 31 22:06:31 localhost kernel:  [<ffffffff8106a7f9>] ? kthread+0xb9/0xd0
Jan 31 22:06:31 localhost kernel:  [<ffffffff8106a740>] ? kthread_park+0x70/0x70
Jan 31 22:06:31 localhost kernel:  [<ffffffff8157faa2>] ? ret_from_fork+0x22/0x30
@kernelOfTruth
Contributor

Any details about the pool?

What kernel was used before?

What zfs/spl version was used before?

What changed compared to the "older version" that could still import the pool?
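
(A hedged sketch of commands that would gather the requested details; the pool name tank is assumed from the report above.)

# hedged sketch: collect pool layout, properties, recent history, and loaded module versions
zpool status -v tank
zpool get all tank
zpool history tank | tail
uname -r
dmesg | grep -E 'SPL: Loaded|ZFS: Loaded'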

@nakamuray
Author

The pool was created some years ago and has been in use since then.
I periodically update to the latest stable kernel and the git master branch of zfs/spl.
At least, it could still be imported with kernel 4.8.15 and zfs c443487. (I'm going back to that combination for the time being.)
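
(A hedged sketch of the rollback described above, assuming an out-of-tree build from a local git checkout; the checkout path and the usual autotools steps are assumptions, and spl needs the matching treatment first.)

# hedged sketch: pin zfs to the last-known-good revision and rebuild against the running kernel
cd ~/src/zfs          # assumed checkout location
git checkout c443487
./autogen.sh && ./configure && make -j"$(nproc)"
sudo make install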

@krichter722

I'm seeing

[38045.780109] VERIFY3(0 == arc_buf_alloc_impl(hdr, private, compressed_read, B_TRUE, &buf)) failed (0 == 0)
[38045.780115] PANIC at arc.c:5194:arc_read()
[38045.780117] Showing stack for process 12518
[38045.780119] CPU: 1 PID: 12518 Comm: AioMgr0-F Tainted: P           OE   4.10.0-22-generic #24-Ubuntu
[38045.780120] Hardware name: LENOVO 20221/INVALID, BIOS 71CN51WW(V1.21) 07/12/2013
[38045.780121] Call Trace:
[38045.780128]  dump_stack+0x63/0x81
[38045.780139]  spl_dumpstack+0x42/0x50 [spl]
[38045.780145]  spl_panic+0xbb/0xf0 [spl]
[38045.780202]  ? buf_cons+0x6a/0x70 [zfs]
[38045.780208]  ? spl_kmem_cache_alloc+0x116/0x8d0 [spl]
[38045.780251]  ? zio_decompress_data+0x4c/0xa0 [zfs]
[38045.780285]  ? arc_buf_fill+0x169/0x2d0 [zfs]
[38045.780330]  ? zio_data_buf_alloc+0x55/0x60 [zfs]
[38045.780363]  arc_read+0x9ec/0xa20 [zfs]
[38045.780394]  ? dbuf_rele_and_unlock+0x27b/0x4c0 [zfs]
[38045.780426]  ? dbuf_rele_and_unlock+0x4c0/0x4c0 [zfs]
[38045.780458]  dbuf_read.part.13+0x84a/0x8f0 [zfs]
[38045.780491]  dbuf_read+0x1d/0x20 [zfs]
[38045.780528]  dmu_tx_check_ioerr+0x71/0xa0 [zfs]
[38045.780565]  dmu_tx_count_write+0xc5/0x190 [zfs]
[38045.780600]  dmu_tx_hold_write+0x41/0x60 [zfs]
[38045.780644]  zfs_write+0x5ed/0xd40 [zfs]
[38045.780649]  ? dequeue_task_fair+0x5ab/0xaa0
[38045.780651]  ? default_send_IPI_single+0x39/0x40
[38045.780654]  ? finish_wait+0x56/0x70
[38045.780697]  zpl_write_common_iovec+0x8c/0xe0 [zfs]
[38045.780739]  zpl_iter_write+0xb7/0xf0 [zfs]
[38045.780741]  new_sync_write+0xd5/0x130
[38045.780743]  __vfs_write+0x26/0x40
[38045.780744]  vfs_write+0xb5/0x1a0
[38045.780746]  SyS_write+0x55/0xc0
[38045.780749]  entry_SYSCALL_64_fastpath+0x1e/0xad
[38045.780750] RIP: 0033:0x7f59d0f4068d
[38045.780751] RSP: 002b:00007f595c1e2de0 EFLAGS: 00000293 ORIG_RAX: 0000000000000001
[38045.780753] RAX: ffffffffffffffda RBX: 00007f596c98d5c0 RCX: 00007f59d0f4068d
[38045.780754] RDX: 00000000000a8000 RSI: 00007f58d78f7000 RDI: 0000000000000033
[38045.780755] RBP: 00007f595c1e2e90 R08: 0000000000000000 R09: 0000000000000001
[38045.780756] R10: 00004bf89ca7bf82 R11: 0000000000000293 R12: 00007f592400e310
[38045.780757] R13: 0000000000000000 R14: 00007f595c1e2e54 R15: 00007f59240091c0

This is with zfs-0.7.0-rc4-49-g82644107c4 on Ubuntu 17.04 with Linux 4.10.0-22-generic, on a pool with one HDD plus one SSD cache device; I naively assume it is the same issue in a different context.

The issue occurs in conjunction with a freeze of the display manager. I experienced it some weeks ago as well, but was not able to capture the stack trace or the ZFS/SPL version in use at the time, and I have updated in the meantime.
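
(A hedged aside: on a systemd distribution such as Ubuntu 17.04, a stack trace from a boot that ended in a freeze can usually be recovered afterwards with journalctl, assuming persistent journaling is enabled via Storage=persistent in /etc/systemd/journald.conf.)

# hedged sketch: kernel messages from the previous boot, around the ZFS panic
journalctl -k -b -1 | grep -B 2 -A 40 'PANIC at arc.c'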

@corporategoth

corporategoth commented Jan 26, 2018

I am getting this error, and I believe it (or its cousins) is causing my computer to hard-lock on occasion (or possibly it is a symptom of failing hardware, who knows?!)

I am not getting this on import; I usually see it on reboot (when the computer doesn't lock hard), so this is not import-specific.

Linux temple 4.14.14-1-ARCH #1 SMP PREEMPT Fri Jan 19 18:42:04 UTC 2018 x86_64 GNU/Linux

[    1.431403] SPL: Loaded module v0.7.5-1
[    3.055628] ZFS: Loaded module v0.7.5-1, ZFS pool version 5000, ZFS filesystem version 5

Kernel Log:

Jan 24 22:35:30 temple kernel: VERIFY3(0 == arc_buf_alloc_impl(hdr, private, compressed_read, B_TRUE, &buf)) failed (0 == 0)
Jan 24 22:35:30 temple kernel: PANIC at arc.c:5245:arc_read()
Jan 24 22:35:30 temple kernel: Showing stack for process 12605
Jan 24 22:35:30 temple kernel: CPU: 1 PID: 12605 Comm: NetworkManager Tainted: P           O    4.14.14-1-ARCH #1
Jan 24 22:35:30 temple kernel: Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./990FXA-UD3, BIOS F3 05/28/2015
Jan 24 22:35:30 temple kernel: Call Trace:
Jan 24 22:35:30 temple kernel:  dump_stack+0x5c/0x85
Jan 24 22:35:30 temple kernel:  spl_panic+0xc8/0x110 [spl]
Jan 24 22:35:30 temple kernel:  ? dnode_setdirty+0x4d/0xf0 [zfs]
Jan 24 22:35:30 temple kernel:  ? getrawmonotonic64+0x3e/0xd0
Jan 24 22:35:30 temple kernel:  ? kmem_cache_alloc+0x94/0x1a0
Jan 24 22:35:30 temple kernel:  ? buf_cons+0x66/0x70 [zfs]
Jan 24 22:35:30 temple kernel:  ? spl_kmem_cache_alloc+0x10c/0x750 [spl]
Jan 24 22:35:30 temple kernel:  ? arc_get_data_impl.isra.26+0x18f/0x380 [zfs]
Jan 24 22:35:30 temple kernel:  arc_read+0xa1b/0xa70 [zfs]
Jan 24 22:35:30 temple kernel:  ? dbuf_rele_and_unlock+0x4a0/0x4a0 [zfs]
Jan 24 22:35:30 temple kernel:  dbuf_read+0x231/0x910 [zfs]
Jan 24 22:35:30 temple kernel:  ? spl_kmem_zalloc+0xc7/0x180 [spl]
Jan 24 22:35:30 temple kernel:  __dbuf_hold_impl+0x539/0x5f0 [zfs]
Jan 24 22:35:30 temple kernel:  dbuf_hold_impl+0x9b/0xc0 [zfs]
Jan 24 22:35:30 temple kernel:  dbuf_hold+0x2c/0x60 [zfs]
Jan 24 22:35:30 temple kernel:  dmu_buf_hold_array_by_dnode+0xda/0x470 [zfs]
Jan 24 22:35:30 temple kernel:  dmu_read_impl+0xa3/0x170 [zfs]
Jan 24 22:35:30 temple kernel:  dmu_read+0x56/0x90 [zfs]
Jan 24 22:35:30 temple kernel:  zfs_getpage+0xe4/0x180 [zfs]
Jan 24 22:35:30 temple kernel:  zpl_readpage+0x53/0xb0 [zfs]
Jan 24 22:35:30 temple kernel:  filemap_fault+0x37d/0x6a0
Jan 24 22:35:30 temple kernel:  ? filemap_map_pages+0x1a3/0x3a0
Jan 24 22:35:30 temple kernel:  __do_fault+0x20/0xd0
Jan 24 22:35:30 temple kernel:  __handle_mm_fault+0xcbf/0x1180
Jan 24 22:35:30 temple kernel:  handle_mm_fault+0xb1/0x1f0
Jan 24 22:35:30 temple kernel:  __do_page_fault+0x27f/0x530
Jan 24 22:35:30 temple kernel:  ? page_fault+0x36/0x60
Jan 24 22:35:30 temple kernel:  page_fault+0x4c/0x60
Jan 24 22:35:30 temple kernel: RIP: 0033:0x7f56369b69a0

ZPool Status:

  pool: fast
 state: ONLINE
  scan: scrub repaired 0B in 31h27m with 0 errors on Tue Jan 23 22:58:59 2018
config:

	NAME                                                     STATE     READ WRITE CKSUM
	fast                                                     ONLINE       0     0     0
	  mirror-0                                               ONLINE       0     0     0
	    ata-Samsung_SSD_850_EVO_120GB_S21TNXAG526443X-part1  ONLINE       0     0     0
	    ata-Samsung_SSD_850_EVO_120GB_S21TNXAG720086B-part1  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: none requested
config:

	NAME                                                     STATE     READ WRITE CKSUM
	tank                                                     ONLINE       0     0     0
	  raidz1-0                                               ONLINE       0     0     0
	    dm-name-tank01                                       ONLINE       0     0     0
	    dm-name-tank02                                       ONLINE       0     0     0
	    dm-name-tank03                                       ONLINE       0     0     0
	logs
	  mirror-1                                               ONLINE       0     0     0
	    ata-Samsung_SSD_850_EVO_120GB_S21TNXAG526443X-part4  ONLINE       0     0     0
	    ata-Samsung_SSD_850_EVO_120GB_S21TNXAG720086B-part4  ONLINE       0     0     0
	cache
	  ata-Samsung_SSD_850_EVO_120GB_S21TNXAG526443X-part5    ONLINE       0     0     0
	  ata-Samsung_SSD_850_EVO_120GB_S21TNXAG720086B-part5    ONLINE       0     0     0

errors: No known data errors

FYI, the tank drives are all full-disk encrypted using cryptsetup. The SSDs use their native full-disk encryption (OPAL), so the OS does not need to be aware of it.
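
(A hedged sketch of how such a layout is typically assembled before import; the device path placeholder is an assumption, not taken from the report.)

# hedged sketch: open the dm-crypt containers, then import by the /dev/mapper names shown in zpool status
cryptsetup open /dev/disk/by-id/<tank01-disk> tank01   # repeat for tank02 and tank03
zpool import -d /dev/mapper tank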
