[dmesg] ZFS: Unable to set "noop" scheduler #169
Comments
Are you using the 0.6.0-rc2 release? There was a bug like this which was fixed post-rc1. ZFS should try to set the scheduler, but it should only do so for whole block devices, never for partitions. As for performance testing/results, if you have some I'd love to see them. In theory the noop scheduler should be best for ZFS, since ZFS schedules its own IO; setting noop should just get you front and back merging without any additional overhead. However, this default was chosen based on theory, not practice, so if you have some hard numbers showing which choice is currently best, I'd love to see them. Also, you can have ZFS set the scheduler of your choice when it loads by setting the zfs_vdev_scheduler module option. By default it is noop, but you can set it to any valid scheduler, or to none if you don't want it changing anything.
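For reference, the module option mentioned above can be made persistent with a modprobe configuration fragment (the file name is just a convention; `deadline` here is only an example value):

```
# /etc/modprobe.d/zfs.conf -- scheduler ZFS applies to whole-disk vdevs
# at module load; "noop" is the default, "none" disables the change
options zfs zfs_vdev_scheduler=deadline
```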
I used the current git source. As I understand it, that is 0.6.0-rc2 plus the latest patches. I can redo my performance test later. For now I can just say that I put several copies of the zfs sources and several iso images on the zfs filesystem and measured a "tar c" of them all.
The latest patches seem to introduce a bit more instability, or maybe I stressed zfs less before. With lots of IO it hangs at some point without any notes in dmesg. Deadlock maybe?

```
[root@cherry ~]# cd /pool/
[root@cherry pool]# df -h .
Filesystem            Size  Used Avail Use% Mounted on
pool                  4.0G  3.2G  849M  79% /pool
[root@cherry pool]# find . | wc -l
20089
```

I ran something like "du -shc" and after it my first test deadlocked too, so I needed to reboot (reset).

```
[root@cherry pool]# cat /sys/block/sdc/queue/scheduler
noop anticipatory [deadline] cfq
[root@cherry pool]# tar c . | pv >/dev/null
tar: .: file changed as we read it
3.06GB 0:04:37 [11.3MB/s]
[root@cherry pool]# tar c . | pv >/dev/null
3.06GB 0:04:33 [11.5MB/s]
```

Also I want to note "tar: .: file changed as we read it". There was no other IO on this partition. I think it is related to the unclean umount, but I thought zfs could handle this without problems.

```
[root@cherry pool]# echo noop >/sys/block/sdc/queue/scheduler
[root@cherry pool]# tar c . | pv >/dev/null
tar: .: file changed as we read it
3.06GB 0:04:39 [11.2MB/s]
[root@cherry pool]# tar c . | pv >/dev/null
3.06GB 0:04:35 [11.4MB/s]
```

The change notice again. Speed is the same, maybe a bit slower. Another reboot (with reset). And now the test with cfq:

```
[root@cherry pool]# echo cfq >/sys/block/sdc/queue/scheduler
[root@cherry pool]# tar c . | pv >/dev/null
tar: .: file changed as we read it
3.06GB 0:03:15 [16.1MB/s]
[root@cherry pool]# tar c . | pv >/dev/null
3.06GB 0:03:10 [16.5MB/s]
```

And this is faster than noop and deadline. (I need to reset after this too.)
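As an aside, the bracketed entry in those `/sys/block/<dev>/queue/scheduler` files marks the currently active scheduler; a tiny helper like this (purely illustrative, not part of ZFS) can extract it:

```shell
# Print the active I/O scheduler, i.e. the bracketed entry in a
# /sys/block/<dev>/queue/scheduler line fed on stdin.
current_sched() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example with the line from the transcript above:
echo "noop anticipatory [deadline] cfq" | current_sched   # prints: deadline
```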
There's another problem with CFQ, which is currently also noticeable with software raid on some kernels and seems to be a known bug. As soon as you mix a desktop system with software raid and create high i/o (just try bonnie++, for example), you're deadlocking your desktop. The solution is switching to another scheduler (noop, deadline) or using cgroups; iirc I've seen a kernel option in 2.6.38 which solves this trouble. So, I just wanted to point out that you/some users might have trouble with CFQ + ZFS as soon as you're using a desktop environment on the same machine. Also I remember that Brian said they're currently only working on getting it implemented; performance work will be done later.
I get this at startup: First off, I'm not sure why it tries twice. But second, it's definitely trying to set it for the partition rather than the whole drive. I'm running 0.6.0.5-0ubuntu3~natty1 from the PPA. |
Do you still see this at start-up with the current code? The error here is a little misleading. When you initially configure zfs, if you give it whole devices, it will automatically partition the device and create a GPT partition table. The first partition will be aligned exactly to the 1 MiB boundary, to ensure correct 512/4096 sector alignment and to leave some headroom. In addition, the whole disk property will be set. This property is then used when opening the device to determine whether it should attempt to adjust the elevator for the whole device. If it owns the entire device it attempts to do this, and it will use the correct device name, not the partition name. See vdev_elevator_switch() for the details. Unfortunately, I've seen spurious EFAULT errors in the past when attempting to change the elevator, so it now retries three times and only emits an error on the third failure.
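The retry behaviour described above can be pictured with a rough shell analogue; the real logic is C inside vdev_elevator_switch(), and the function name here is made up for illustration:

```shell
# Try up to three times to set an I/O scheduler on a device, mirroring
# the retry-on-spurious-failure behaviour described above.
# Illustrative sketch only -- not the actual ZFS implementation.
set_elevator() {
    dev="$1"; sched="$2"
    attempt=1
    while [ "$attempt" -le 3 ]; do
        # Redirect stderr first so a failed redirection stays quiet.
        if echo "$sched" 2>/dev/null > "/sys/block/$dev/queue/scheduler"; then
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "unable to set '$sched' scheduler for $dev after 3 attempts" >&2
    return 1
}
```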
I confirm. It does not complain about "unable to set noop" with the current code. And it sets noop on the whole device.
The wholedisk property gets set as a key/value pair in a per-vdev configuration nvlist. If you're interested in how all of this happens, I would start by looking at cmd/zpool/zpool_vdev.c:make_leaf_vdev(). It is here that the passed device is determined to be a whole disk, partition, or file. The ZPOOL_CONFIG_WHOLE_DISK key is then set accordingly in the nvlist for future consumption.
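To give a feel for that disk/partition/file classification, here is a hypothetical shell approximation based on Linux sysfs; the real check in make_leaf_vdev() is more careful and this helper name is invented for the example:

```shell
# Roughly classify a vdev path as a whole disk, a partition, or a
# file-backed vdev. Loosely mirrors the distinction drawn in
# make_leaf_vdev(); illustrative only, not the actual ZFS code.
vdev_kind() {
    name=$(basename "$1")
    if [ -e "/sys/block/$name" ]; then
        echo disk        # whole device, e.g. /dev/sdc
    elif [ -e "/sys/class/block/$name" ]; then
        echo partition   # e.g. /dev/sdc1
    elif [ -f "$1" ]; then
        echo file        # file-backed vdev
    else
        echo unknown
    fi
}
```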
During mount, zfs for some reason tries to set the noop scheduler on the partition, not the whole drive. In this case the whole drive was given to zfs, and it created a partition on it.
The question is: why is it trying to set noop on the partition? Also, I've run a couple of simple tests (moving some files around) while setting different schedulers on the whole drive, and during these tests noop was slower than deadline and cfq. So if this is case-dependent, I think it is preferable for users to set the scheduler on their own.