Performing an "ldmadmin restart" in LDM 6.13.13 and previous can result in queue corruption, data loss #89

Closed
sebenste opened this issue Mar 27, 2021 · 11 comments

Comments

@sebenste

OS: Centos 7, fully updated

This bug has actually been around for many years, but I was hoping it was vanquished. I guess not...

We were bitten by this bug this evening. After an "ldmadmin restart", the LDM starts normally, but within hours the queue becomes corrupt and data stops being written to physical media. You must then stop the LDM, remake the queue, and start it again, which fixes the issue. Whenever you delete the queue after stopping the LDM, everything is fine upon restart. But occasionally, if you only do an "ldmadmin restart", the queue becomes corrupt. In my experience, this is more likely to happen if:

You are running a high-volume, high file-size count feed (think Level2 radar or CONDUIT)
You do multiple restarts of the LDM, spaced hours or more apart

This does NOT happen, ever, if the queue is deleted and remade before restarting the LDM, even if you do this:

ldmadmin stop
ldmadmin clean
ldmadmin delqueue
ldmadmin restart

Or this:

ldmadmin stop
ldmadmin delqueue
ldmadmin mkqueue
ldmadmin start

It only happens when doing a straight "ldmadmin restart" command, nothing before or after it. Furthermore, it may not happen until hours after a restart.
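
The second sequence above (stop, delqueue, mkqueue, start) could be wrapped in a small script so that the queue is always remade instead of relying on a bare restart. This is only a minimal sketch: the names `safe_restart`, `run_step`, `LDMADMIN`, and `DRY_RUN` are hypothetical and not part of LDM, and it assumes that `ldmadmin` is on the LDM user's PATH and exits with a non-zero status when a step fails.

```shell
#!/bin/sh
# Sketch of the stop / delqueue / mkqueue / start sequence from the report.
# LDMADMIN, DRY_RUN, run_step and safe_restart are illustrative names only.

# Command to invoke; override to point at a specific ldmadmin (assumption).
LDMADMIN="${LDMADMIN:-ldmadmin}"

# Run one ldmadmin subcommand, or only print it when DRY_RUN=1.
run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$LDMADMIN $1"
    else
        "$LDMADMIN" "$1"
    fi
}

# Stop the LDM, remake the queue, then start it; abort at the first failure.
safe_restart() {
    for step in stop delqueue mkqueue start; do
        run_step "$step" || {
            echo "step '$step' failed; aborting" >&2
            return 1
        }
    done
}
```

With `DRY_RUN=1` set, calling `safe_restart` only prints the four commands, which makes it possible to check the order before running it against a live LDM.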

sebenste changed the title on Mar 27, 2021
From: "ldmadmin restart" in LDM 6.13.13 and previous can result in queue corruption, data loss
To: Performing an "ldmadmin restart" in LDM 6.13.13 and previous can result in queue corruption, data loss
@semmerson
Collaborator

semmerson commented Mar 27, 2021 via email

@sebenste
Author

sebenste commented Mar 29, 2021 via email

@semmerson
Collaborator

Is there any evidence of what the problem might be? Does an "ldmadmin restart" indicate anything?

@sebenste
Author

sebenste commented Mar 29, 2021 via email

@semmerson
Collaborator

How have you determined that the queue becomes corrupt?

@sebenste
Author

sebenste commented Mar 30, 2021 via email

@semmerson
Collaborator

semmerson commented Mar 30, 2021 via email

@sebenste
Author

sebenste commented Mar 30, 2021 via email

@semmerson
Collaborator

semmerson commented Mar 30, 2021 via email

@sebenste
Author

sebenste commented Mar 30, 2021 via email

@sebenste
Author

This was fixed in 6.13.14. Closing ticket.
