Data stuck on holding disk #69

Closed
Hawk777 opened this Issue Feb 23, 2016 · 10 comments

Hawk777 commented Feb 23, 2016

Amanda 3.3.9’s driver is deciding not to use a new tape even though there is data left to flush. I will attach a level-9 debug output from the taper. I have tried to understand the logic in tape_action and see why this is happening, but I don’t understand it enough to know where the problem is coming from.
driver.20160223003644.txt

Member

martineau commented Feb 23, 2016

tape_action is one of the most complex functions in Amanda.

Can you also post the amdump. file?

Hawk777 commented Feb 23, 2016

If you mean amdump.20160223003644.debug, then here you go. It doesn’t say much though. I should be able to reproduce the problem if you want me to enable a higher debugging level there too.

amdump.20160223003644.txt

Member

martineau commented Feb 23, 2016

Yes, retry with:
debug-driver 9
Then post the resulting driver debug file and the amdump log file.

Hawk777 commented Feb 23, 2016

Some background information. I’m pretty sure this didn’t happen with version 3.3.3, which is what I was using before upgrading to 3.3.9. My distro went from 3.3.3 to 3.3.7, but I couldn’t use the latter because amcheck-device segfaulted; that’s fixed in 3.3.9. I can try some other versions if needed (maybe 3.3.8?).

This is a dump of two DLEs, one with an output size of 5M and the other 1290M, for a total of 1294.5M. The problem shows up when I do a level 0 dump. I am writing to vtapes with chg-disk, tape size 1G. Part size doesn’t seem to affect the result. What happens is the first 1G is written to a tape, and then the remaining 266M isn’t written to tape, with the amreport saying “not using all tapes because taperflush criteria not met”—but all 1290M are left in the holding disk, and a subsequent amflush writes the first 1G to another vtape, reporting the same error and still leaving all 1290M in the holding disk.
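
As a rough illustration of the sizes above (not Amanda's own logic), the sketch below computes how many 1G vtapes a run of this size needs. It assumes 1G means 1024M, ignores per-tape and per-part overhead, and assumes a dump may be split across tapes. `tapes_needed` is a hypothetical helper name.

```python
import math


def tapes_needed(total_mb, tape_size_mb):
    """Return the number of fixed-size tapes needed to hold total_mb.

    Assumption: dumps can span tapes and there is no per-tape overhead.
    """
    if tape_size_mb <= 0:
        raise ValueError("tape size must be positive")
    return math.ceil(total_mb / tape_size_mb)


# Figures from this report: 1294.5M of dump output, 1G (1024M) vtapes.
total_mb = 1294.5
tape_size_mb = 1024

print(tapes_needed(total_mb, tape_size_mb))  # prints 2
```

Under these assumptions the run cannot fit on one vtape, so a second tape has to be started for the data to leave the holding disk.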

Finally, sorry, when I said “taper” in the first message I meant “driver”. That attachment is, as its name suggests, driver.20160223003644.debug, not a taper log. It was taken at level 9. I just ran another dump with debug-driver 9, but there’s nothing to see beyond what’s in the original log.

In an immediate sense, the cause seems pretty clear: new_data≠0, but that’s not a criterion for starting a new tape. I don’t understand where all the values come from, though. It looks like some more debug output might have been added since 3.3.9 (I see some more debug prints in master that aren’t in my log). I can arrange to try that out if you think it would help.

Member

martineau commented Feb 23, 2016

You said the error is: “not using all tapes because taperflush criteria not met”.
What is your taperflush setting?
amgetconf CONF taperflush
It should be set to 0, since that is the behaviour you want.

Hawk777 commented Feb 23, 2016

Yes, that’s the message I get. A report is attached. report.txt

To answer your question, and others that seemed relevant as well:

amanda@eland ~ $ amgetconf weekly taperflush
0
amanda@eland ~ $ amgetconf weekly flush-threshold-dumped
0
amanda@eland ~ $ amgetconf weekly flush-threshold-scheduled
0
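
The three queries above can be scripted. The following is a minimal sketch, assuming `amgetconf` is on the PATH and prints the bare value on a single line, as in the output above. The helper names `read_param` and `nonzero_flush_params` are hypothetical and not part of Amanda.

```python
import subprocess

# The three parameters queried by hand above.
FLUSH_PARAMS = (
    "taperflush",
    "flush-threshold-dumped",
    "flush-threshold-scheduled",
)


def read_param(config, name, run=subprocess.run):
    """Run `amgetconf <config> <name>` and return its output as a string."""
    result = run(
        ["amgetconf", config, name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def nonzero_flush_params(config, run=subprocess.run):
    """Return {parameter: value} for every flush parameter that is not "0"."""
    values = {name: read_param(config, name, run) for name in FLUSH_PARAMS}
    return {name: value for name, value in values.items() if value != "0"}
```

Calling `nonzero_flush_params("weekly")` should return an empty dictionary for the configuration shown here, since all three values are 0. The `run` argument exists only so the command runner can be replaced when testing.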

Member

martineau commented Feb 25, 2016

I think I understand the problem: when the last DLE spans two tapes, Amanda refuses to write to the second tape.
The workaround is to increase the size of your vtapes so that both DLEs fit on one tape.
I'm working on a fix, which should be ready next week.

Hawk777 commented Feb 25, 2016

Yes, the small DLE was written to vtape first, and then the large DLE used what was left of the first vtape but it didn’t start a second vtape when it should have. I discovered the workaround you described so I was able to get my level 0 written out. Glad to hear you were able to reproduce the problem.

Member

martineau commented Feb 29, 2016

I committed the attached patch to fix the issue.
stuck-dle-span-2-tapes.diff.txt

martineau closed this Mar 1, 2016

Hawk777 commented Mar 3, 2016

Thank you for the prompt attention. That patch seems to work for me!
