Frequent journal rewrites with large numbers of open transactions #114

aniketschneider · 2012-12-22T06:11:16Z

We are experiencing an issue where Kestrel performance starts to degrade when we allow too many open transactions on a queue. The degradation is accompanied by very high disk I/O and seems to occur primarily when NOT in read-behind mode. We are running Kestrel 2.2.0.

I believe what is happening is as follows:

A large number of transactions are opened, until the queue size hits 0, triggering a journal rewrite since the journal is larger than the defaultJournalSize (in our case 16MB).
The rewritten journal file, due to the large number of open transactions, is still larger than defaultJournalSize.
After a single enqueue/dequeue, the journal rewrite is immediately triggered again.
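
To make the suspected loop concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not Kestrel's actual code: the function name, the constant open-transaction count, and the assumption that a rewritten journal holds exactly one record per open transaction are all illustrative.

```python
DEFAULT_JOURNAL_SIZE = 16 * 1024 * 1024  # 16MB, the defaultJournalSize from this report


def count_rewrites(open_txns, item_size, operations):
    """Count journal rewrites in a toy model of the suspected loop.

    Assumptions (hypothetical, not taken from Kestrel's source):
    - a rewritten journal contains one item_size record per open transaction
    - each operation is one enqueue/dequeue pair that leaves the queue empty
    - a rewrite fires whenever the queue is empty and the journal is
      larger than DEFAULT_JOURNAL_SIZE
    """
    rewritten_size = open_txns * item_size  # what survives a rewrite
    journal_size = rewritten_size
    rewrites = 0
    for _ in range(operations):
        journal_size += item_size  # the enqueue appends a record
        # the matching dequeue empties the queue, so the size check runs
        if journal_size > DEFAULT_JOURNAL_SIZE:
            rewrites += 1
            journal_size = rewritten_size  # rewrite keeps only open transactions
    return rewrites


# 20,000 open transactions of 1 KB each: the rewritten journal is already
# above 16MB, so every enqueue/dequeue pair triggers another rewrite.
print(count_rewrites(open_txns=20_000, item_size=1024, operations=100))  # 100

# 100 open transactions of 1 KB each: the journal stays far below the limit.
print(count_rewrites(open_txns=100, item_size=1024, operations=100))  # 0
```

In this model the rewrite storm starts as soon as open_txns * item_size exceeds defaultJournalSize, which for 1 KB items is a little over 16,000 open transactions.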

I have read about the bug fixed in 2.4.1, and I don't believe our setup falls under those criteria: our items are on the order of at least 0.5-1k each, and we have a 2:1 ratio between maxJournalSize and maxMemorySize.

technoweenie · 2013-05-08T20:47:22Z

We're seeing similar issues. The logs look like this:

INF [20130508-11:42:49.780] kestrel: Rewriting journal file for 'booya' (qsize=0)
INF [20130508-11:42:50.375] kestrel: Rewriting journal file for 'booya' (qsize=0)
INF [20130508-11:42:51.004] kestrel: Rewriting journal file for 'booya' (qsize=0)
INF [20130508-11:42:51.448] kestrel: Rewriting journal file for 'booya' (qsize=0)

(with a lot more entries every second until the event is over)

Here are the open transactions from collectd (graph not preserved):

We do have 16 workers across 6 nodes. So I wonder if we're getting close to the open transaction limit.

The collectd graph for expired items is flat, so that doesn't seem to be an issue.

Ideas:

Increase the default journal size
Increase the open transaction limit

I'm just worried that increasing the journal size will make this problem occur less frequently but last longer each time.

EDIT: We're on Kestrel 2.4.1.

technoweenie · 2013-05-10T15:50:57Z

We fixed this for ourselves with two things:

Increasing the syncJournal setting from 100.milliseconds to 1.second. Not sure why it was set so low. This seemed to reduce the frequency of these issues.
We started compressing the data before inserting into Kestrel.
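
For anyone wanting to try the second change, here is a minimal sketch of compressing a payload before it is enqueued, using Python's standard zlib module. The pack/unpack helpers and the sample message are hypothetical, and the client call that actually sends the bytes to Kestrel is left out. Smaller items mean fewer bytes per journal record, including the records kept for open transactions.

```python
import json
import zlib


def pack(message):
    """Serialize a message to compact JSON and compress it for enqueueing."""
    raw = json.dumps(message, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 6)  # 6 is zlib's usual speed/size trade-off


def unpack(blob):
    """Reverse of pack(): decompress and parse a dequeued item."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical payload; real savings depend on how repetitive your data is.
message = {"event": "push", "lines": ["hello kestrel"] * 200}
raw_size = len(json.dumps(message, separators=(",", ":")).encode("utf-8"))
packed = pack(message)

print(raw_size, len(packed))  # bytes before and after compression
```

Workers then call unpack() on whatever they dequeue. Whether this helps depends on the data: already-compressed or very small payloads may not shrink at all.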
