
Current ZFS git tip constantly creates and destroys taskq kernel threads, especially z_wr_int_N threads #7274

Closed
siebenmann opened this issue Mar 6, 2018 · 6 comments
Labels
Type: Documentation

Comments

@siebenmann
Contributor

System information

Type                  Version/Name
Distribution Name     Fedora
Distribution Version  Fedora 27
Linux Kernel          4.15.6-300.fc27.x86_64
Architecture          x86_64
ZFS Version           0.7.0-352_g2705ebf0a (current git tip)
SPL Version           git tip 3673d03

Describe the problem you're observing

On my 16-core machine, ZFS and/or SPL appears to be constantly creating and destroying kernel threads for taskqs, especially 'z_wr_int_N' threads. I see huge PIDs for these shortly after system boot, for example:

; uptime
 11:26:47 up 43 min,  1 user,  load average: 0.11, 0.36, 1.04
; ps auxgww | fgrep z_wr_int
[...]
root      321029  0.0  0.0      0     0 ?        S<   11:26   0:00 [z_wr_int_6]

This is on a system with generally low disk IO (and certainly low disk IO since boot). The churn is sustained and rapid; I have had Linux's extended PID space wrap around in a day or so on this system.

I have three pools on this system. Two of the three have a single mirrored vdev; the third is a single disk (so it's non-redundant; I use ZFS to notice checksum errors). I have another four-core system with a single ZFS pool with a single mirrored vdev that also appears to be experiencing this, but at a much slower rate; presumably the rate of churn is related to how many cores the system has.

This churn appears to have been going on for some time based on my system logs, but in the past it has been less obvious because my system was restricting itself to 16-bit PIDs and there was not the glaring clue of PIDs with six or more digits.

@behlendorf
Contributor

behlendorf commented Mar 7, 2018

@siebenmann is the PID churn somehow causing problems for the system? This is normal behavior for ZFS, which creates and destroys the I/O pipeline kernel threads as they are needed. If you'd prefer the threads to be long-lived, set the spl_taskq_thread_dynamic module option to 0.
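One way to set that persistently (a sketch using the standard modprobe.d mechanism; the file name is arbitrary, and it takes effect the next time the spl module is loaded) is:

$ echo "options spl spl_taskq_thread_dynamic=0" | sudo tee /etc/modprobe.d/spl.conf
$ cat /sys/module/spl/parameters/spl_taskq_thread_dynamic   # read back the value currently in use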

behlendorf added the Type: Documentation label on Mar 7, 2018
@siebenmann
Contributor Author

It's possible that PID churn or the very large PID numbers it now creates on my machine are causing problems, but I'm going to have to investigate more. I've tried booting with spl_taskq_thread_dynamic set to 0 and indeed the direct ZFS threads wind up staying with low PIDs. However, less than two hours after boot (with some compiles thrown in) I'm up to over PID 450,000, and apparently something is churning kworker threads; a number of them have very high PIDs. Is this likely to be something involving ZFS, or should I be looking elsewhere?
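As a rough way to narrow that down: kernel threads are children of kthreadd (PID 2), so something like the following should show whether the highest-numbered PIDs belong to kernel threads at all, as opposed to short-lived userspace processes:

; ps -eo pid,ppid,comm --sort=-pid | awk '$2 == 2' | head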

@behlendorf
Contributor

behlendorf commented Mar 7, 2018

@siebenmann it may still be caused by ZFS. There are several places in the code where a pool of kthread workers will be dynamically created and then destroyed so that some work can be performed in parallel.
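A rough way to quantify the overall churn, wherever it comes from, is the kernel's cumulative fork counter: the processes line in /proc/stat counts every task created since boot, so sampling it twice gives a creation rate. A minimal sketch:

$ awk '/^processes/ {print $2}' /proc/stat; sleep 60; awk '/^processes/ {print $2}' /proc/stat

The difference between the two numbers is how many tasks (and therefore PIDs) were consumed in that minute.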

@siebenmann
Contributor Author

It turns out that the PID churn I was seeing is not ZFS's fault. Much to my surprise, building Go from source and running its tests goes through over 200,000 PIDs (even on ext4), although the whole process completes in only a few minutes. This has likely been going on for some time, but it was previously masked because my machine used the normal low Linux /proc/sys/kernel/pid_max setting of 32k, which rolled over repeatedly during the build and thus never produced clear evidence of what was happening. The SPL setting has only a small effect (if any) on the number of PIDs churned through during the build and test process.

(I don't know why Go is churning through so many PIDs here, but it's definitely not ZFS's problem.)
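For anyone checking the same thing on their own system: the rollover point is /proc/sys/kernel/pid_max, which defaults to 32768 and can be raised on 64-bit kernels up to 4194304 (2^22). A quick sketch:

; cat /proc/sys/kernel/pid_max
; sudo sysctl -w kernel.pid_max=4194304   # 2^22 is the kernel's upper limit on 64-bit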

@behlendorf
Contributor

@siebenmann ahh that explains it. Then please go ahead and close this out if you agree there isn't an actual problem here.

@lnicola
Contributor

lnicola commented Nov 21, 2019

I think I'm also seeing this; I'm not sure since when.


$ uptime -p
up 16 hours, 41 minutes
$ pidof pidof                                              
442569
