0.6.5.6 - I/O timeout during disk spin up #4638

Open
flo82 opened this Issue May 12, 2016 · 10 comments

Comments

flo82 commented May 12, 2016

see #3856 for further details. Bug is still present:

I'm still experiencing this bug with 0.6.5.6 on Ubuntu 16.04 with the same chipset (SAS2008).

This is what I did:
I created a zpool on the device and sent it to sleep (hdparm -y), then started writing a file to it. The result was:
[59526.359997] sd 0:0:1:0: [sda] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sda] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sda, sector 824769380

I created an ext3 filesystem on the same device and sent it to sleep, then started the file copy again. Result: no messages in dmesg.

I also compared the original file with the copied one; they are identical. So this bug is related to ZFS and is not fixed. Any ideas? Do you need further information, @behlendorf?
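
For reference, a minimal sketch of the reproduction steps above (device and pool names are placeholders):

# 1. Create a pool on the drive (assumed device /dev/sdX, pool name "tank")
zpool create tank /dev/sdX
# 2. Put the drive into standby
hdparm -y /dev/sdX
# 3. Write to the pool while the drive has to spin up again
dd if=/dev/zero of=/tank/testfile bs=1M count=100
# 4. Check the kernel log for blk_update_request I/O errors
dmesg | tail -n 20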

hoppel118 commented Oct 6, 2016

Hey guys,

I also see this error for my pool, and only in combination with my ZFS HDDs in the syslog. It always happens when my HDDs have to wake up after a spindown (127). There are no errors in the HDDs' SMART information.

Oct  6 07:42:11 omv kernel: [21313.092935] sd 1:0:1:0: [sdc] tag#2 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.092939] sd 1:0:1:0: [sdc] tag#2 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.092941] blk_update_request: I/O error, dev sdc, sector 3645713904
Oct  6 07:42:11 omv kernel: [21313.092989] sd 1:0:2:0: [sdd] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.092990] sd 1:0:2:0: [sdd] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.092991] blk_update_request: I/O error, dev sdd, sector 3645713904
Oct  6 07:42:11 omv kernel: [21313.093036] sd 1:0:7:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:11 omv kernel: [21313.093037] sd 1:0:7:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:11 omv kernel: [21313.093038] blk_update_request: I/O error, dev sdi, sector 3645713856
Oct  6 07:42:11 omv zed: eid=11 class=io pool=mediatank
Oct  6 07:42:11 omv zed: eid=12 class=io pool=mediatank
Oct  6 07:42:29 omv kernel: [21330.548739] sd 1:0:5:0: [sdg] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:29 omv kernel: [21330.548743] sd 1:0:5:0: [sdg] tag#3 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:29 omv kernel: [21330.548745] blk_update_request: I/O error, dev sdg, sector 3645713904
Oct  6 07:42:29 omv kernel: [21330.548788] sd 1:0:6:0: [sdh] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:29 omv kernel: [21330.548790] sd 1:0:6:0: [sdh] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d f0 00 00 00 08 00 00
Oct  6 07:42:29 omv kernel: [21330.548791] blk_update_request: I/O error, dev sdh, sector 3645713904
Oct  6 07:42:29 omv zed: eid=13 class=io pool=mediatank
Oct  6 07:42:38 omv kernel: [21339.463139] sd 1:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:38 omv kernel: [21339.463143] sd 1:0:0:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d e8 00 00 00 08 00 00
Oct  6 07:42:38 omv kernel: [21339.463145] blk_update_request: I/O error, dev sdb, sector 3645713896
Oct  6 07:42:55 omv kernel: [21356.397858] sd 1:0:3:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:55 omv kernel: [21356.397861] sd 1:0:3:0: [sde] tag#0 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:55 omv kernel: [21356.397863] blk_update_request: I/O error, dev sde, sector 3645713856
Oct  6 07:42:55 omv kernel: [21356.397905] sd 1:0:4:0: [sdf] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct  6 07:42:55 omv kernel: [21356.397907] sd 1:0:4:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 d9 4d 2d c0 00 00 00 08 00 00
Oct  6 07:42:55 omv kernel: [21356.397908] blk_update_request: I/O error, dev sdf, sector 3645713856
Oct  6 07:42:55 omv zed: eid=14 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=15 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=16 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=17 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=18 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=19 class=io pool=mediatank
Oct  6 07:42:55 omv zed: eid=20 class=io pool=mediatank

My hardware specs:

Mainboard: Supermicro X11SSH-CTF
CPU: Intel Xeon E3-1240L v5, 4x 2.10 GHz, Socket 1151, tray
HBA: LSI SAS3008 onboard
HDDs: 8x 4TB WD Red in RAID-Z2

My HBA is PCI passed through to the KVM guest. The mpt3sas modules are blacklisted on the host system.

Host-OS - "Proxmox":

root@proxmox:~# uname -a
Linux proxmox 4.4.19-1-pve #1 SMP Wed Sep 14 14:33:50 CEST 2016 x86_64 GNU/Linux

root@proxmox:~# cat /etc/debian_version
8.6

root@proxmox:~# pveversion -v
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-1 (running version: 4.3-1/e7cdc165)
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-88
pve-firmware: 1.1-9
libpve-common-perl: 4.0-73
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-61
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-6
pve-container: 1.0-75
pve-firewall: 2.0-29
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8

Guest OS (KVM) - openmediavault 3.0.41:

root@omv:~# uname -a
Linux omv 4.4.19-1-pve #1 SMP Wed Sep 14 14:33:50 CEST 2016 x86_64 GNU/Linux

root@omv:~# cat /etc/debian_version
8.6

As you can see, I also use the Proxmox kernel in the KVM guest.

I use the following ZFS packages, which are dependencies of the openmediavault plugin "openmediavault-zfs":

root@omv:~# apt-cache depends --important openmediavault-zfs
openmediavault-zfs
  Depends: openmediavault
  Depends: debian-zfs
  Depends: build-essential
root@omv:~# apt-cache depends --important policy openmediavault-zfs
openmediavault-zfs
  Depends: openmediavault
  Depends: debian-zfs
  Depends: build-essential
root@omv:~# apt-cache depends --important debian-zfs
debian-zfs
  Depends: spl
  Depends: spl-dkms
    spl-modules-3.16.0-4-amd64
  Depends: zfs-dkms
  Depends: zfsutils
    zfsutils-linux
root@omv:~# apt-cache policy debian-zfs
debian-zfs:
  Installed: 7~jessie
  Candidate: 7~jessie
  Version table:
 *** 7~jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy zfs-dkms
zfs-dkms:
  Installed: 0.6.5.7-8-jessie
  Candidate: 0.6.5.7-8-jessie
  Version table:
     0.6.5.8-1~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
 *** 0.6.5.7-8-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy zfsutils
zfsutils:
  Installed: 0.6.5.7-8-jessie
  Candidate: 0.6.5.7-8-jessie
  Version table:
 *** 0.6.5.7-8-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy zfs-dkms
zfs-dkms:
  Installed: 0.6.5.7-8-jessie
  Candidate: 0.6.5.7-8-jessie
  Version table:
     0.6.5.8-1~bpo8+1 0
        100 http://httpredir.debian.org/debian/ jessie-backports/contrib amd64 Packages
 *** 0.6.5.7-8-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy spl
spl:
  Installed: 0.6.5.7-5-jessie
  Candidate: 0.6.5.7-5-jessie
  Version table:
     0.6.5.8-2~bpo8+2 0
        100 http://httpredir.debian.org/debian/ jessie-backports/main amd64 Packages
 *** 0.6.5.7-5-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~# apt-cache policy spl-dkms
spl-dkms:
  Installed: 0.6.5.7-5-jessie
  Candidate: 0.6.5.7-5-jessie
  Version table:
     0.6.5.8-2~bpo8+2 0
        100 http://httpredir.debian.org/debian/ jessie-backports/main amd64 Packages
 *** 0.6.5.7-5-jessie 0
        500 http://archive.zfsonlinux.org/debian/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status
root@omv:~#

If you need any other information, please tell me what you need.

Thanks and greetings Hoppel

hoppel118 commented Oct 7, 2016

After disabling spindown and rebooting the KVM guest I don't see these messages anymore. But I want to spin down my HDDs.
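
For reference, a sketch of how the drive spindown timer can be disabled or checked per drive with hdparm (device name is a placeholder):

# Disable the standby (spindown) timer on a drive
hdparm -S 0 /dev/sdX
# Check the current power state (active/idle vs standby)
hdparm -C /dev/sdX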

hoppel118 commented Oct 10, 2016

OK, I tried another thing. I use 8x 4TB WD Red HDDs behind my LSI SAS3008 controller. I read that there is a tool to deactivate the automatic spindown in the HDDs' firmware.

So I downloaded "idle3-tools" to my openmediavault (Debian Jessie) KVM guest.

The default value for my disks was:

root@omv:~# idle3ctl -g /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)

So I decided to deactivate the default spindown with the following command for all 8 disks:

root@omv:~# idle3ctl -d /dev/sd[b-i]
Idle3 timer is disabled
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!

I power cycled the server completely, started again and had a look at the result with the following command:

root@omv:~# idle3ctl -g105 /dev/sd[b-i]
Idle3 timer is disabled

At this stage of the configuration I don't see any errors in the syslog while opening a Samba share backed by a ZFS file system.

After that, I configured "/etc/hdparm.conf" via the openmediavault web UI as follows:

/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2XXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7HXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5EXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3NXXXXX {
    spindown_time = 240
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXX {
    spindown_time = 240
    write_cache = off
}

This way openmediavault is configured to spin the disks down after 20 minutes.
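
As a reference, a sketch of the equivalent manual hdparm call; -S values from 1 to 240 are multiples of 5 seconds, so 240 corresponds to the 20 minutes mentioned above:

# Equivalent manual standby timeout for one drive (device name is a placeholder)
hdparm -S 240 /dev/sdX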

Now I see the following on the command line:

root@omv:~# hdparm -C /dev/sd[bcdefghi]

/dev/sdb:
 drive state is:  active/idle

/dev/sdc:
 drive state is:  active/idle

/dev/sdd:
 drive state is:  active/idle

/dev/sde:
 drive state is:  active/idle

/dev/sdf:
 drive state is:  active/idle

/dev/sdg:
 drive state is:  active/idle

/dev/sdh:
 drive state is:  active/idle

/dev/sdi:
 drive state is:  active/idle

20 minutes later I see the following on the command line:

root@omv:~# hdparm -C /dev/sd[bcdefghi]

/dev/sdb:
 drive state is:  standby

/dev/sdc:
 drive state is:  standby

/dev/sdd:
 drive state is:  standby

/dev/sde:
 drive state is:  standby

/dev/sdf:
 drive state is:  standby

/dev/sdg:
 drive state is:  standby

/dev/sdh:
 drive state is:  standby

/dev/sdi:
 drive state is:  standby

So the spindown controlled by openmediavault works fine. Then I opened a file from one of my Samba shares backed by ZFS. I can see the disks spinning up with hdparm, and I see the following messages in the logfile again:

complete syslog: http://pastebin.com/9A300u3R

Oct 10 17:27:01 omv kernel: [ 8733.047909] sd 7:0:5:0: [sdg] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:01 omv kernel: [ 8733.047912] sd 7:0:5:0: [sdg] tag#0 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:01 omv kernel: [ 8733.047914] blk_update_request: I/O error, dev sdg, sector 2569756800
Oct 10 17:27:01 omv zed: eid=11 class=io pool=mediatank
Oct 10 17:27:18 omv kernel: [ 8750.314209] sd 7:0:2:0: [sdd] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:18 omv kernel: [ 8750.314212] sd 7:0:2:0: [sdd] tag#0 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:18 omv kernel: [ 8750.314214] blk_update_request: I/O error, dev sdd, sector 2569756800
Oct 10 17:27:18 omv kernel: [ 8750.314259] sd 7:0:6:0: [sdh] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:18 omv kernel: [ 8750.314260] sd 7:0:6:0: [sdh] tag#1 CDB: Read(16) 88 00 00 00 00 00 99 2b 60 80 00 00 00 08 00 00
Oct 10 17:27:18 omv kernel: [ 8750.314261] blk_update_request: I/O error, dev sdh, sector 2569756800
Oct 10 17:27:18 omv zed: eid=12 class=io pool=mediatank
Oct 10 17:27:18 omv zed: eid=13 class=io pool=mediatank
Oct 10 17:27:18 omv zed: eid=14 class=io pool=mediatank
Oct 10 17:27:35 omv kernel: [ 8767.469326] sd 7:0:4:0: [sdf] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:35 omv kernel: [ 8767.469330] sd 7:0:4:0: [sdf] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:35 omv kernel: [ 8767.469332] blk_update_request: I/O error, dev sdf, sector 2550654984
Oct 10 17:27:35 omv kernel: [ 8767.469378] sd 7:0:7:0: [sdi] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:35 omv kernel: [ 8767.469379] sd 7:0:7:0: [sdi] tag#0 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:35 omv kernel: [ 8767.469380] blk_update_request: I/O error, dev sdi, sector 2550654984
Oct 10 17:27:35 omv zed: eid=15 class=io pool=mediatank
Oct 10 17:27:36 omv zed: eid=16 class=io pool=mediatank
Oct 10 17:27:52 omv kernel: [ 8784.531993] sd 7:0:1:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:52 omv kernel: [ 8784.531997] sd 7:0:1:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:52 omv kernel: [ 8784.531999] blk_update_request: I/O error, dev sdc, sector 2550654984
Oct 10 17:27:52 omv kernel: [ 8784.532040] sd 7:0:3:0: [sde] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:27:52 omv kernel: [ 8784.532041] sd 7:0:3:0: [sde] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 07 e8 08 00 00 00 08 00 00
Oct 10 17:27:52 omv kernel: [ 8784.532042] blk_update_request: I/O error, dev sde, sector 2550654984
Oct 10 17:27:53 omv zed: eid=17 class=io pool=mediatank
Oct 10 17:27:53 omv zed: eid=18 class=io pool=mediatank
Oct 10 17:27:53 omv zed: eid=19 class=io pool=mediatank
Oct 10 17:28:02 omv kernel: [ 8793.895994] sd 7:0:0:0: [sdb] tag#1 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
Oct 10 17:28:02 omv kernel: [ 8793.895998] sd 7:0:0:0: [sdb] tag#1 CDB: Read(16) 88 00 00 00 00 00 98 02 94 48 00 00 00 18 00 00
Oct 10 17:28:02 omv kernel: [ 8793.896000] blk_update_request: I/O error, dev sdb, sector 2550305864
Oct 10 17:28:02 omv zed: eid=20 class=io pool=mediatank

So that didn't help at all, and I brought everything back to the default values:

root@omv:~# idle3ctl -s 138 /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)
Please power cycle your drive off and on for the new setting to be taken into account. A reboot will not be enough!
root@omv:~# idle3ctl -g /dev/sd[b-i]
Idle3 timer set to 138 (0x8a)
root@omv:~# nano /etc/hdparm.conf
quiet
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E2XXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6LXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7HXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5EXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E3NXXXXX {
    write_cache = off
}
/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E7XXXXXX {
    write_cache = off
}

What do you think about this?

A last check could be to clone the KVM guest to bare metal and test the whole thing again. Maybe it has something to do with KVM or with PCI passthrough. But for this I need some time.

Greetings Hoppel

@hoppel118

These issues describe the same thing:

#4713
#3785
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1504909?comments=all

Greetings Hoppel

luxflow commented Jan 2, 2017

I also encountered this issue.
I did several tests:
Intel SATA controller + SATA HDD + ZFS + writing during spin-up = no issue
SAS2008 controller + SATA HDD + ext4 + writing during spin-up = no issue
SAS2008 controller + SATA HDD + ZFS + writing during spin-up = I/O issue
Maybe a SATA disk behind a SAS controller is the problem.
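
To check which controller and kernel driver a given disk actually sits behind (e.g. ahci vs. mpt2sas/mpt3sas), something like the following can help; a sketch, adjust the grep pattern as needed:

# List storage controllers together with the kernel driver in use
lspci -k | grep -i -A 3 'sas\|sata\|raid'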

hoppel118 commented Jan 2, 2017

There might be a problem between ZFS and the "mpt3sas" driver.

Is it possible for you to reduce the spin-up time in your controller BIOS? Maybe it's possible to stagger spin-up so that only two or three disks spin up at a time. This should be possible if your controller is flashed to IT mode and if your PSU is powerful enough.

For me it's not possible to check this at the moment, because I use a beta firmware from Supermicro where the option to spin up only some disks at a time is not available.

How many disks do you use behind your SAS2008 controller for ZFS? How long do you have to wait until all disks are up?

Greetings Hoppel

luxflow commented Jan 3, 2017

I can't test the BIOS settings because the server would have to be rebooted.
4 disks; I don't know exactly how long it takes, but they spin up serially.

red-scorp commented Mar 27, 2018

Same problem on a Z87 Extreme11/ac -> 22x SATA3 (16x SAS3 12.0 Gb/s + 6x SATA3 6.0 Gb/s) from an LSI SAS 3008 controller + 3X24R expander

OS: Ubuntu 18.04 dev

$ cat /etc/issue
Ubuntu Bionic Beaver (development branch) \n \l
$ uname -a
Linux AGVault 4.15.0-12-generic #13-Ubuntu SMP Thu Mar 8 06:24:47 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -l zfs*
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
un  zfs            <none>       <none>       (no description available)
un  zfs-dkms       <none>       <none>       (no description available)
un  zfs-dracut     <none>       <none>       (no description available)
un  zfs-fuse       <none>       <none>       (no description available)
un  zfs-initramfs  <none>       <none>       (no description available)
un  zfs-modules    <none>       <none>       (no description available)
ii  zfs-zed        0.7.5-1ubunt amd64        OpenZFS Event Daemon
un  zfsutils       <none>       <none>       (no description available)
ii  zfsutils-linux 0.7.5-1ubunt amd64        command-line tools to manage Open

ZFS hangs on spin-up of the SATA HDDs, so I assume it's a problem between the LSI controller driver and ZFS.
mpt3sas 17.100.00.00

I'll try BIOS updates; let's see if that fixes the problem.

UPDATE: I've updated the motherboard BIOS and flashed the SAS controller to IT mode with the newest firmware available from the 9300 card. This did not help with the disk spin-up problem. Funny enough, it's not only ZFS that freezes but hddtemp and smartctl too. This issue might be related not to ZFS but to misbehavior of mpt3sas itself.

Please let me know if you have found any solution or workaround for the disks freezing on spin-up.
Thanks in advance!

d-helios commented Jul 11, 2018

I have the same issue with SAS drives.

my configuration:

HBA: lsi sas 9300-8e (Symbios Logic SAS3008)
Drives: HUC101818CS4204, PX05SMB040

kernel parameters:

BOOT_IMAGE=/vmlinuz-4.15.0-23-generic root=UUID=4f30713c-5618-4c31-a051-97a9e5acee09 ro console=tty1 console=ttyS0,115200 dm_mod.use_blk_mq=y scsi_mod.use_blk_mq=y transparent_hugepage=never processor.max_cstate=1 udev.children-max=32 mpt3sas.msix_disable=1

Notice:
I have the same configuration on Solaris and it works fine there. The only thing that I changed is the power-condition:false statement in sd.conf:

sd-config-list=
"HGST    HUH", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUS72", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUC10", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUC15", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,physical-block-size:4096",
"HGST    HUSMH", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096",
"HGST    HUSMM", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096",
"TOSHIBA PX", "retries-timeout:1,retries-busy:1,retries-reset:1,retries-victim:2,throttle-max:32,disksort:false,cache-nonvolatile:true,power-condition:false,physical-block-size:4096";

cwedgwood commented Jul 11, 2018

@d-helios one ugly hack to paper over the issue is to tweak the zfs-import-cache.service unit file (in the [Service] section) with something like:

# quirk/hack to make sure all the disks are visible
ExecStartPre=/sbin/modprobe mpt3sas
ExecStartPre=/bin/sleep 13
ExecStartPre=/sbin/udevadm settle

(tweak as appropriate for you).

You probably do not need the modprobe as the module will normally be loaded by this time, but for testing (systemctl stop/start w/ rmmod) it is needed.
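
One way to apply this without editing the packaged unit file in place is a systemd drop-in; a sketch, reusing the lines from the comment above:

# Create a drop-in for zfs-import-cache.service instead of editing the unit itself
mkdir -p /etc/systemd/system/zfs-import-cache.service.d
cat > /etc/systemd/system/zfs-import-cache.service.d/wait-for-disks.conf <<'EOF'
[Service]
# quirk/hack to make sure all the disks are visible before the pool import
ExecStartPre=/sbin/modprobe mpt3sas
ExecStartPre=/bin/sleep 13
ExecStartPre=/sbin/udevadm settle
EOF
# Reload systemd so the drop-in takes effect
systemctl daemon-reload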
