Sync Issue in PersistentQueue on I/O Exception? #21

Closed
ebarlas opened this Issue Aug 4, 2010 · 5 comments

ebarlas commented Aug 4, 2010

Upon close inspection of the PersistentQueue class, it occurred to me that if an I/O exception is raised at certain points, the in-memory queue may become out of sync with the journal. For example, this can occur in `add` if `journal.add` throws an I/O exception after the item has already been added to the in-memory queue. Similar behavior exists in `remove`. Is this an accurate reading of the code? If so, what is the reasoning behind it? Thanks.
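
To make the scenario concrete, here is a minimal sketch of the ordering problem. It is written in Python for brevity rather than Kestrel's Scala, and `Journal`, `UnsafeQueue`, `fail_writes`, and `replayed_size` are hypothetical stand-ins, not Kestrel's actual PersistentQueue API. The queue mutates memory first and journals second, so a failed journal write leaves the two disagreeing.

```python
from collections import deque


class Journal:
    """Hypothetical append-only journal; a real one would write to disk."""

    def __init__(self):
        self.entries = []
        self.fail_writes = False  # flip to True to simulate e.g. a full disk

    def add(self, item):
        if self.fail_writes:
            raise OSError("simulated disk full")
        self.entries.append(("add", item))

    def remove(self):
        if self.fail_writes:
            raise OSError("simulated disk full")
        self.entries.append(("remove", None))


class UnsafeQueue:
    """Mutates memory first, then journals: the ordering described above."""

    def __init__(self, journal):
        self.items = deque()
        self.journal = journal

    def add(self, item):
        self.items.append(item)  # in-memory state changes first...
        self.journal.add(item)   # ...so an OSError here leaves them out of sync

    def remove(self):
        item = self.items.popleft()  # item is gone from memory...
        self.journal.remove()        # ...even if this write fails
        return item


def replayed_size(journal):
    """Queue size that replaying the journal after a restart would produce."""
    size = 0
    for op, _ in journal.entries:
        size += 1 if op == "add" else -1
    return size


j = Journal()
q = UnsafeQueue(j)
q.add("a")
j.fail_writes = True
try:
    q.add("b")
except OSError:
    pass
print(len(q.items), replayed_size(j))  # prints "2 1": memory and journal disagree
```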

robey (Contributor) commented Aug 5, 2010

it looks like an i/o exception would bounce out to the handler and possibly disconnect the client. i guess we should catch exceptions when writing the journal, and kill the server if they happen, so that queues don't get into this weird state if the disk fills. does that sound okay?

ebarlas commented Aug 5, 2010

Hmm, possibly. The best approach, I suppose, would be to roll back journal operations, but it seems to me that simply isn't possible with the current system. Another approach is to perform I/O operations before in-memory data structure operations, so that any I/O exception is raised before the queue, transaction table, or other PersistentQueue data have been modified. That should keep the PersistentQueue in a consistent state. Yet another approach is to close and reopen the queue on I/O exceptions; however, this may result in a huge number of journal reads as the journals are replayed. Perhaps this is just something to be aware of and need not be addressed?
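
The second approach (journal first, then memory) can be sketched as follows. This is an illustrative Python sketch under the assumption that a failed write leaves nothing in the journal; `Journal`, `WriteAheadQueue`, and `fail_writes` are hypothetical names, not Kestrel's classes.

```python
from collections import deque


class Journal:
    """Hypothetical stand-in for the on-disk journal (kept in memory here)."""

    def __init__(self):
        self.entries = []
        self.fail_writes = False  # flip to True to simulate an I/O failure

    def append(self, op, item=None):
        if self.fail_writes:
            raise OSError("simulated write failure")
        self.entries.append((op, item))


class WriteAheadQueue:
    """Journals each operation before touching in-memory state."""

    def __init__(self, journal):
        self.items = deque()
        self.journal = journal

    def add(self, item):
        self.journal.append("add", item)  # may raise; memory not yet modified
        self.items.append(item)

    def remove(self):
        if not self.items:
            return None                    # nothing to remove, nothing journaled
        self.journal.append("remove")      # may raise; item stays queued
        return self.items.popleft()
```

If a journal write raises, the exception propagates before the deque is touched, so memory never gets ahead of the journal. A write that partially reaches disk before failing would still need to be detected when the journal is replayed; this sketch does not cover that case.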

ebarlas commented Aug 9, 2010

Thoughts?

robey (Contributor) commented Aug 10, 2010

i think you're right that it shouldn't try to continue as if nothing happened.

i'm leaning toward catching i/o exceptions inside the journal code, writing a fatal log message, and calling `System.exit`. it would be an unambiguous signal that something has gone wrong with the machine, and i think if the machine is hosed, kestrel shouldn't try to paste over it.
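
A rough sketch of that fail-fast policy, again in Python rather than Kestrel's Scala: the journal catches any I/O error on write, logs it at critical level, and exits the process, with `sys.exit(1)` standing in for the JVM's `System.exit`. The `Journal` class, its line format, and the `"journal"` logger name are hypothetical, and the file is reopened on every write purely to keep the example short.

```python
import logging
import sys

log = logging.getLogger("journal")


class Journal:
    """Hypothetical journal that treats any I/O failure as fatal."""

    def __init__(self, path):
        self.path = path

    def _write(self, line):
        try:
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(line + "\n")
                f.flush()
        except OSError as e:
            # Don't let the caller carry on with a queue whose memory and
            # journal may now disagree: log loudly and stop the process.
            log.critical("journal write to %s failed: %s; exiting", self.path, e)
            sys.exit(1)

    def add(self, item):
        self._write("ADD " + item)

    def remove(self):
        self._write("REM")
```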

ebarlas commented Aug 11, 2010

Okay, that does seem reasonable. One problem is that it might adversely affect folks using Kestrel as a library, since the proposed fix would shut down the JVM.

robey closed this Apr 10, 2012