zfs send hangs #1638

Hello,

it happens fairly often that zfs send just hangs. Here's a bit of info:

ZoL version - last commit = cd72af9 (Fix 'zpool list -H' error code)
Someone else had a similar problem. It was determined that memory corruption from his use of non-ECC RAM had corrupted his pool. In his situation, a scrub revealed that several metadata objects had been corrupted, although if bad memory is corrupting data, the checksum calculation can be done against the already-corrupt data, so a clean scrub is not conclusive. Have you done a scrub, and are you using ECC memory?
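For reference, a scrub can be started and checked with the standard zpool commands ("tank" here is just a placeholder pool name):

```sh
# start a scrub of the pool (replace "tank" with your pool name)
zpool scrub tank

# watch progress and any checksum/metadata errors it turns up
zpool status -v tank
```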
I'm scrubbing regularly once a week and yes, it's all Xeon E5s w/ ECC.
@snajpa @behlendorf debugged the last report like this. I will leave this in his hands.
And because I can't reboot the machine (it's in production), our system naturally left one zfs destroy hanging on that snapshot (it takes snapshots, sends them to backup storage and then destroys them).

bump :) I really need a fix for this, we badly need ZFS in a production-ready state and this is a pretty serious blocker. Can I send beers or anything to anyone who'd fix it? Please? :)
The zfs send process is in the D state, which usually means the process is waiting for IO. The kernel will complain about blocked processes after a specific period (120s by default) and print out the stack trace. Are there any such logs in your /var/log/messages?
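For reference, a quick way to confirm the hung-task watchdog is armed and to pull any traces it has already logged (standard kernel interfaces):

```sh
# check the watchdog timeout in seconds (0 means the check is disabled)
cat /proc/sys/kernel/hung_task_timeout_secs

# pull any traces it already emitted
dmesg | grep -A 20 'blocked for more than'
```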
nope, there's nothing + gstack hangs, systemtap is useless; I'm out of options for getting the stack trace :(
This is what I'm getting in the syslog (lots of these, all of them seem to be the same), but I doubt it's much related:
Ok, I'll try to get more info next time, which I think will be in a few days from now. I had to reboot the machine, because it completely stopped doing any writes to the zpool. Only I'm afraid I'm seeing more than one issue at once - that ^ dmesg, zfs send hangs, and these complete write lockups... How can I go about debugging this and getting more info for you guys? I've already made sure that hung task traces go to syslog, I'll try zpool events -v next time. What more can I do? :)
Of course the ideal way is a minimized workload or test environment that still triggers the same bug, so we can reproduce it ourselves and find the root cause. But since your servers are in production, maybe that's not easy to do. Anyway, the output of zpool events -v may help to show whether the problem is caused by a faulting device; if so, there will be many zios delayed by the same vdev (which I suspect to be the root cause).
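A quick way to eyeball that, assuming the events machinery is reporting (the grep on vdev fields is illustrative of the verbose payload):

```sh
# dump all queued ZFS events with full detail
zpool events -v

# many delayed-IO events repeating for one vdev would point at that device;
# vdev paths/guids appear in the verbose output
zpool events -v | grep -E 'vdev_(path|guid)' | sort | uniq -c | sort -rn
```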
Ok, so there it is - zfs send is hung again:
So the oopses seem more or less random without revealing the root cause. This time it did not trigger the newly added IO deadman facility (zpool events -v showed no output). @ryao @behlendorf is there any other way to catch the zio that never finished? I noticed the oops logs are all on server node6 - does this bug also happen on other servers (if not, perhaps there are some faulting disks in node6)?
We don't have ZFS on any other nodes yet; this first one is sort of a proof of concept for OpenVZ+ZFS. I don't think there's a hardware issue, the mdraid+ext4+linux combo was working pretty well there (for a few weeks, before it got reinstalled w/ ZFS).
For production use I'd recommend the current stable version, 0.6.1. Can you try this version and see if this bug still happens? If so, we're facing a long-lived issue; otherwise it's a regression and can be analyzed with git bisect.
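For reference, a bisect between those two states would look roughly like this - a sketch, assuming the 0.6.1 release tag is good and current master is bad:

```sh
cd zfs
git bisect start
git bisect bad master          # assumed-broken revision
git bisect good zfs-0.6.1      # assumed-good release tag
# then for each step: build, install, reboot into the new modules,
# run the snapshot/send workload, and report the result:
git bisect good   # or: git bisect bad
```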
Well, that's where I started, and this problem was the reason why I decided to build and install the latest git version at that time.
The system is certainly blocked waiting for something to complete. That could be a slow IO or a whole mess of smaller IOs; there's not quite enough in the stack to say for certain. @snajpa Are there any non-standard zfs module options being used on this system? Have you enabled deduplication? Is there anything else different about it? Assuming there isn't, we still need to know what it's blocked on. To do that we need the full output of sysrq-t from the hung system. That will log to the console all of the active processes and their stacks, which should let us see which threads are waiting on each other.
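For reference, triggering that from a shell looks like this (the standard sysrq interface; needs root):

```sh
# make sure sysrq is enabled and kernel messages aren't filtered out
echo 1 > /proc/sys/kernel/sysrq
dmesg -n 8          # raise the console log level so traces aren't dropped

# dump the stack of every task to the kernel log
echo t > /proc/sysrq-trigger

# then collect the result
dmesg > sysrq-t.txt
```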
@behlendorf I'm not passing zfs any params, no dedup, only compression=on; there's a dataset per OpenVZ container, each of them has refquota set, otherwise nothing too fancy. It is backed up by snap -> send | recv to an illumos-powered backup server. The sysrq-t output doesn't include these processes in D state - btw I have 3 of them for the same dataset today on the machine:
@behlendorf ZFS was stuck for about 15 mins before 7am today. There's nothing in the system log which would say anything about it, except for that trace which keeps filling the syslog (below).

https://prasiatko.vpsfree.cz/munin/prg.vpsfree.cz/node6.prg.vpsfree.cz/

Example of one disk in the pool showing how the IO dropped (and it's the same for all of the disks in the pool):

Zpool config:
And I was talking about these traces, which keep filling the syslog:
The system was doing the same thing again just now (actually not doing anything - no IO); this time I sysrq-t'd all stack traces to the syslog. Let me know if they might be of any help and I'll deliver them to you - it's not a small dump :) Example of a process suffering by not being able to do IO:
The stack trace makes sense. It looks like the php process is trying to write to a file and is waiting for the range lock to become available. But why the range lock is unavailable remains unknown. Would you please CC me when compressing and sending the dumped logs? It may help to locate the root cause.
Ok, there shouldn't be anything sensitive in there, so here is the log: https://vpsfree.cz/download/messages-snajpa-2013-08-25.gz Btw, I upgraded to 0.6.2 yesterday, and today while having my morning tea I discovered two fresh hung zfs send commands; one of them is hanging on the same dataset as before the reboot, I've got no idea if that's in any way relevant. Good news is that the traces in the syslog are finally somewhat more telling - they're about the kernel thread, not the userspace process anymore:
@snajpa Unfortunately I have no permission to access that file :(
Sorry, my bad. Try now, it should work (wrong file permissions). |
Ok, I got it - does 'zpool events -v' show any output this time?
This stack trace is a little weird. The ARC is trying to allocate a zio data buffer from the SPL slab caches but got stuck in the allocation (the process is in the D state). Normally this should not happen - is your system low on free memory when the hang happens?

Aug 25 22:14:01 node6 kernel: [484506.360860] php D ffff881f64eb4c60 0 799788 799702 556 0x00000000
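For reference, a quick way to snapshot the memory and ARC state when a hang hits (these are the standard /proc locations on ZoL; field names may vary slightly between versions):

```sh
# overall memory pressure
grep -E 'MemFree|SwapFree|Dirty' /proc/meminfo

# ARC size and targets (c = target, c_max = ceiling, size = current)
awk '/^(size|c|c_max|c_min) / {print}' /proc/spl/kstat/zfs/arcstats
```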
You can see for yourself here - about 11pm yesterday: I didn't mention that I'm using swappiness = 100, do you think that may play a role in this?
Yes, the higher the value the more swap in/out IO, but it's also triggered by a low-memory situation. According to the collected system stats, the system load suddenly went really high (the result of many processes pending in D state) and there was a lot of swap in/out near 0:00 this morning. If there are no other applications consuming that much memory, I think it's the ARC itself that did. Can you limit the total memory the ARC occupies and see if the hang still happens?
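For reference, swappiness is the standard Linux VM knob (0-100; higher means the kernel swaps more eagerly), and checking or lowering it is a one-liner - though the low-memory pressure itself is the real trigger:

```sh
# current value (100 is the most aggressive setting)
sysctl vm.swappiness

# a more conservative value while testing the ARC limit (60 is the kernel default)
sysctl -w vm.swappiness=60
```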
Is it possible to set the ARC limits w/o reloading the ZFS module? It is still kind of in production, people are getting mad about all of the downtime and I'd rather... well. I shouldn't have put this in production in the first place... :)
I suspect the memory thrashing is caused by memory fragmentation in the current SPL slab implementation, which cannot easily be compacted without limiting the total memory usage first. Also, it's better to limit the ARC's size when there are many kinds of applications on the same server, since the ARC is a real memory hog.
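For reference, the SPL exposes its slab caches under /proc, which gives a rough view of that fragmentation (the exact column layout varies between SPL versions):

```sh
# per-cache object/slab counts; a large gap between allocated objects
# and objects actually in use suggests fragmented slabs
cat /proc/spl/kmem/slab
```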
Ok, I'm trying to figure out what level I should limit the ARC size to. Currently, it looks like this:
So there's 62G in the ARC. The use case for the machine is OpenVZ containers, so there's no way to predict RAM usage on that server. I'm not sure what happens if I limit the ARC to, say, 45G and the system then actually needs that memory for applications. If conditions are such that the ARC needs to go down to e.g. 20G, isn't the problem still going to be there? B/c the limit will be set to 45G, it still creates about the same situation as I'm in now, doesn't it?
If you limit the ARC to 45GB, then it will consume 45GB of memory at most (not counting the potential fragmentation). The rest is left for other applications to use freely. If you limit the size of the ARC to a reasonable value w/o causing swap in/out, firstly you are free from the memory-thrashing situation where the system is busy swapping pages in/out without doing any useful work, and secondly you are free from the potential deadlock caused by the ARC itself, namely when memory pressure forces the ARC to write back dirty ARC buffers while the writeback itself needs more memory.
Ok, thanks. I'll try adding an ARC limit. Here's one thing though: we've been talking about at least two different issues here and I'm not sure whether they're related or not. One is this complete write freeze, which is/was happening from time to time; that should be "solved" by limiting the ARC size, as you say. But the other one is those hanging zfs send processes, and that doesn't seem to me like it's got anything to do with ARC size. And these keep piling up:
@snajpa The two may be related; if memory allocations end up blocking due to low memory, there could be a range of symptoms. @casualfish looks to be on the right trail here - limiting your ARC may help as a medium-term workaround. And to answer your previous question: the ARC limits can now be dynamically tuned, the module parameters have been made writable. Just echo in the new values and they'll take effect.
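Concretely, something like this (32G is just an example value; the parameter lives under /sys/module/zfs/parameters on current builds):

```sh
# cap the ARC at 32 GiB without reloading the module
echo $((32 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max

# confirm it took effect
cat /sys/module/zfs/parameters/zfs_arc_max
```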
So I've set zfs_arc_max to 32G, but that hasn't helped. There is one zfs send hanging again :(
Ok, so here's an update on the current situation: I've updated to current HEAD, but not much has changed - it froze after one night of backups. So I've switched backups from send/recv back to the original old rsync (= effectively stopped any snapshot manipulation on the zpool) and it seems to be running stable & ok.
I got bitten by this using the latest rpm on CentOS 6, zfs-0.6.3-1.el6.x86_64, also on OpenVZ. I'm still in a phase where I'm experimenting manually with zfs send/recv from the command line, and if anything happens to the receiver (i.e. it doesn't receive anything because I've messed up the command line params or left the destination in a state that can't receive snapshots), it just hangs. It also hangs if I try to terminate the sender with SIGINT (i.e. hitting Ctrl-C on the keyboard), no matter what the destination is; even redirecting a stream to a file on the same machine hangs on Ctrl-C.
It looks like it only happens when a deduplicated stream is being sent (zfs send -D). Tested with both OpenVZ and regular kernels, and it is not OpenVZ-specific. Now that I'm looking at the comments above, I see that the send examples are without -D, so my case might be different. Here is my trace:

Jul 11 09:12:06 free-3 kernel: [ 1079.290247] INFO: task spl_system_task:810 blocked for more than 120 seconds.
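For anyone trying to reproduce this without touching real disks, here's a minimal sketch using a throwaway file-backed pool (all names are placeholders):

```sh
# create a throwaway pool backed by a sparse file
truncate -s 1G /tmp/zpool.img
zpool create testpool /tmp/zpool.img

# make a snapshot and send it as a deduplicated stream to a local file
zfs snapshot testpool@snap1
zfs send -D testpool@snap1 > /tmp/stream.bin   # try Ctrl-C mid-send here

# clean up
zpool destroy testpool
```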
I've got the same issue as @snajpa with the same workload on production servers, e.g. while sending incremental backups. I'm not using OpenVZ.
We tried to reproduce the issue using a test stand launching the same workload.
OK, the issue got reproduced on a production host with Linux 3.16.1 and ZFS 0.6.3; stack traces:
After that, all zfs commands got stuck in uninterruptible sleep. It's reproduced on a variety of hosts, so it's not a HW problem. @behlendorf: Can you please tell us what we can do to track down the issue? I'm glad to produce any debug information needed. It feels like a blocker because, you know, randomly crashing hosts in production is not that great...
@behlendorf: If I get this correctly,
Probably that's the issue: #2652.
@seletskiy @snajpa has this been resolved for you guys since #2652 was merged?
@behlendorf: Yes, we've been running 0.6.3 with that patch for about a month on the snap/rename workload and not a single error has happened.
Thanks for the confirmation. Then I'm going to close this as resolved.