
Current ZFS git tip constantly creates and destroys taskq kernel threads, especially z_wr_int_N threads #7274

Closed
siebenmann opened this issue Mar 6, 2018 · 6 comments
Labels
Type: Documentation

Comments

@siebenmann
Contributor

System information

Type                  Version/Name
Distribution Name     Fedora
Distribution Version  Fedora 27
Linux Kernel          4.15.6-300.fc27.x86_64
Architecture          x86_64
ZFS Version           0.7.0-352_g2705ebf0a (current git tip)
SPL Version           git tip 3673d03

Describe the problem you're observing

On my 16-core machine, ZFS and/or SPL appears to be constantly creating and destroying kernel threads for taskqs, especially 'z_wr_int_N' threads. I see huge PIDs for these shortly after system boot, for example:

; uptime
 11:26:47 up 43 min,  1 user,  load average: 0.11, 0.36, 1.04
; ps auxgww | fgrep z_wr_int
[...]
root      321029  0.0  0.0      0     0 ?        S<   11:26   0:00 [z_wr_int_6]

This is on a system with generally low disk IO (and certainly low disk IO since boot). The churn is sustained and rapid; I have had Linux's extended PID space wrap around in a day or so on this system.

I have three pools on this system. Two of the three have a single mirrored vdev; the third is a single disk (so it's non-redundant; I use ZFS to notice checksum errors). I have another four-core system with a single ZFS pool with a single mirrored vdev that also appears to be experiencing this, but at a much slower rate; presumably the rate of churn is related to how many cores the system has.

This churn appears to have been going on for some time based on my system logs, but in the past it has been less obvious because my system was restricting itself to 16-bit PIDs and there was not the glaring clue of PIDs with six or more digits.

@behlendorf
Contributor

behlendorf commented Mar 7, 2018

@siebenmann is the PID churn somehow causing problems for the system? This is normal behavior for ZFS, which creates and destroys the I/O pipeline kernel threads as they are needed. If you'd prefer the threads to be long-lived, set the spl_taskq_thread_dynamic module option to 0.
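One way to set that persistently (a sketch using the standard modprobe.d mechanism; the file name is arbitrary, and it takes effect the next time the spl module is loaded) is:

$ echo "options spl spl_taskq_thread_dynamic=0" | sudo tee /etc/modprobe.d/spl.conf
$ cat /sys/module/spl/parameters/spl_taskq_thread_dynamic   # read back the value currently in use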

behlendorf added the Type: Documentation label on Mar 7, 2018
@siebenmann
Contributor Author

It's possible that PID churn or the very large PID numbers it now creates on my machine are causing problems, but I'm going to have to investigate more. I've tried booting with spl_taskq_thread_dynamic set to 0 and indeed the direct ZFS threads wind up staying with low PIDs. However, less than two hours after boot (with some compiles thrown in) I'm up to over PID 450,000, and apparently something is churning kworker threads; a number of them have very high PIDs. Is this likely to be something involving ZFS, or should I be looking elsewhere?
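As a rough way to narrow that down: kernel threads are children of kthreadd (PID 2), so something like the following should show whether the highest-numbered PIDs belong to kernel threads at all, as opposed to short-lived userspace processes:

; ps -eo pid,ppid,comm --sort=-pid | awk '$2 == 2' | head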

@behlendorf
Contributor

behlendorf commented Mar 7, 2018

@siebenmann it may still be caused by ZFS. There are several places in the code where a pool of kthread workers will be dynamically created and then destroyed so that some work can be performed in parallel.
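A rough way to quantify the overall churn, wherever it comes from, is the kernel's cumulative fork counter: the processes line in /proc/stat counts every task created since boot, so sampling it twice gives a creation rate. A minimal sketch:

$ awk '/^processes/ {print $2}' /proc/stat; sleep 60; awk '/^processes/ {print $2}' /proc/stat

The difference between the two numbers is how many tasks (and therefore PIDs) were consumed in that minute.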

@siebenmann
Contributor Author

It turns out that the PID churn I was seeing is not ZFS's fault. Much to my surprise, building Go from source and running its tests goes through over 200,000 PIDs (even on ext4), although the whole process completes in only a few minutes. This has likely been going on for some time, but it was previously masked because my machine used the normal low Linux /proc/sys/kernel/pid_max setting of 32k, which rolled over repeatedly during the build and thus never produced clear evidence of what was happening. The SPL setting has only a small effect (if any) on the number of PIDs churned through during the build and test process.

(I don't know why Go is churning through so many PIDs here, but it's definitely not ZFS's problem.)
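For anyone checking the same thing on their own system: the rollover point is /proc/sys/kernel/pid_max, which defaults to 32768 and can be raised on 64-bit kernels up to 4194304 (2^22). A quick sketch:

; cat /proc/sys/kernel/pid_max
; sudo sysctl -w kernel.pid_max=4194304   # 2^22 is the kernel's upper limit on 64-bit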

@behlendorf
Contributor

@siebenmann ahh that explains it. Then please go ahead and close this out if you agree there isn't an actual problem here.

@lnicola
Contributor

lnicola commented Nov 21, 2019

I think I'm also seeing this; I'm not sure since when.


$ uptime -p
up 16 hours, 41 minutes
$ pidof pidof                                              
442569
