Commits on Jan 22, 2016
  1. Merge pull request #150 from ReadmeCritic/master

    Update README URLs based on HTTP redirects
    agargenta committed Jan 22, 2016
  2. Merge pull request #147 from alex-hofsteede/patch-1

    s/itmes/items/
    agargenta committed Jan 22, 2016
  3. Update README.md

    agargenta committed Jan 22, 2016
  4. Update README.md

    agargenta committed Jan 22, 2016
Commits on Nov 19, 2015
Commits on Sep 21, 2015
  1. Fix bad Future.sleep substitute

    leighst committed Sep 21, 2015
Commits on Sep 8, 2015
Commits on Sep 7, 2015
Commits on Mar 25, 2015
  1. s/itmes/items/

    typo in docs
    alex-hofsteede committed Mar 25, 2015
Commits on Oct 10, 2014
  1. Remove PeriodicSyncFile

    Aniruddha Laud committed Oct 10, 2014
  2. Kestrel: replace mutable.Queue with java.util.LinkedList for gc reasons
    
        - Fix linkedhashmap by using java.util
        - Kestrel-core changes for durability
        - Kestrel: dont fsync on a closed writer
        - Add sampling stats to persistent queue
        - Kestrel: Close Reader during replay - After replaying a file, close the reader - Local File reader should close the FileChannel as well as the FileInputStream.
        - Kestrel: Trace on NoSuchElementException
        - shutdown sequence fix fix - the state change needs to be transient, since we want the last persisted state to be restored on restart - change gracefulshutdown bit for writeavoid to false since we don't need to wait to transition to this state
        - Kestrel: LogEmptyException in length(). In line with the behavior of AppendOnlyStreamWriter/Reader, when a stream is empty, its length should be treated as zero. Currently this call may throw an exception. This is uncommon, as we never explicitly call .length as the first operation once a stream is created. It happens only as a race condition between background pack and rotate.
        - Kestrel: Change PersistentStream, PersistentStreamWriter, PersistentStreamReader to be traits. This ensures that only implementations of PersistentStreamContainer and PersistentMetadataStore need to provide public constructors. This helps with exception handling and retries, especially if state that spans the entire container has to be reset because of shared data structures.
        - Kestrel: Recover Metadata before accessing the directory The constructor of Journal calls a getStream(queueName). When PersistentQueue is used as a library, this call may precede any call to listStreams or listQueues which would therefore create the stream before the metadata has been recovered. In case of the Kestrel service though, QueueCollection first loads queues, this recovers metadata before any other operation has a chance to access the directory map.
        - Kestrel: Log client description on client errors - IndexOutOfBoundsException indicates that the request was incorrectly formed. Translate that to a CLIENT_ERROR - Log the clientDescription when we encounter a client error
        - add a metric for total fsync time, count some viz changes
        - Kestrel: Support for whitelisting clientIds that are allowed to access queues. The goal of this change is to allow users to specify the client id that is allowed to enqueue to or dequeue from a queue. The purpose of the feature is to prevent incorrectly deployed code or config from enqueuing to or dequeuing from queues in production. This is *NOT* intended as a security feature.
        - Kestrel: Refactor ServerStatus so that the storage can be made pluggable - Introduce PersistentMetadataStore that abstracts the storage of the server status. - Provide an implementation (LocalMetadataStore) that stores the persistent metadata in a local file (similar to current implementation)
        - Kestrel: Gauge for max items and max size Both based on configuration parameters
        - Kestrel: Change to Rewrite sequence and tests The change includes i) Split the delete and rename as two separate steps in the rewrite sequence. ii) Add fail point based tests to verify rewrite behavior in the presence of failures
        - Kestrel: Remove Deprecated methods from tests Remove Deprecated methods on util.Future and use Await.result instead - helps eliminate ~100 warnings that are produced in the build
        - Remove waiters on monitorUntil in KestrelHandler and not just those from get()
        - Use eventually on kestrel flaky handoff test
        - Kestrel: Journal Replay Time. This is particularly useful in debugging slow restarts or misconfigured queues (incorrect journal parameters). I did not add a metric, as replay only runs once per restart and hence there won't be enough samples to get any meaningful statistics
        - Kestrel: Refactor Journal Storage Refactor the Journal Storage as follows a) PersistentStreamContainer - the abstraction that manages all the persistent streams associated with Kestrel queue(s). b) PersistentStream - an instance of an append only durable stream, each Kestrel queue's persistent state is represented as one or more persistent streams c) PersistentStreamReader/Writer - interfaces for reading from and writing to the PersistentStream. Instead of supporting all the semantics of a generic InputStream/OutputStream, these interfaces only provide the methods that are required by Kestrel
        - Kestrel: Improve Discard by minimizing calls to FillReadBehind. For large queues that are in read behind, discarding expired items need not issue fillReadBehind for every item expired. Reducing the calls to the case where the queue in memory is empty, and once after the current discard loop has been completed, should improve efficiency.
        - kestrel: bump util/finagle version to 6.x, since 5.x and 6.x have different interfaces for the Future definition.
        - Fix leaking transactions on kestrel  The future could be canceled before the addPendingRead() is executed. This would result in abortAnyOpenRead() not being aware of the newly added value which would linger around as an open transaction.
        - Kestrel: Implement Open Transaction Timeout In Kestrel.
        - Kestrel: Some cleanup based on canary testing Cleanup the graceful shutdown logic and update the rewrite message so we can differentiate rewrites coming from clean shutdown
        - Kestrel Canary: Turn off the new behavior of aggressive rewrites The new logic to disableAggressiveRewrites should be off by default so that we can canary Zoo
        - Bound number of items to be expired at add/remove/peek. Allow PersistentQueue.discardExpired() to have a configurable limit of items to be discarded. Currently, add/remove/peek will cause an unbounded number of items to expire. This is bad. It should bound the number of items to be expired by config.maxExpireSweep instead.
    Aniruddha Laud committed Oct 10, 2014
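The bounded expiry sweep described in the last bullet above can be sketched roughly as follows. This is a minimal illustration in Python rather than Kestrel's Scala, and it is not the actual PersistentQueue code: the `(expiry, payload)` item layout and the function name `discard_expired` are assumptions, and `max_expire_sweep` merely stands in for the `config.maxExpireSweep` setting named in the commit message.

```python
from collections import deque


def discard_expired(queue, now, max_expire_sweep):
    """Drop expired items from the head of the queue, at most max_expire_sweep per call.

    Each item is a hypothetical (expiry, payload) tuple; expiry None means "never expires".
    Returns the number of items discarded.
    """
    discarded = 0
    while queue and discarded < max_expire_sweep:
        expiry, _payload = queue[0]
        if expiry is None or expiry > now:
            break  # head item is still live, so stop sweeping
        queue.popleft()
        discarded += 1
    return discarded


items = deque([(1, "a"), (2, "b"), (3, "c"), (100, "d")])
first_sweep = discard_expired(items, now=10, max_expire_sweep=2)
second_sweep = discard_expired(items, now=10, max_expire_sweep=2)
```

With a limit of 2, the first call removes two expired items and leaves the third for a later call, so no single add/remove/peek pays for an unbounded sweep.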
  3. Kestrel 2.4.7

    Aniruddha Laud committed Oct 10, 2014
Commits on Apr 19, 2013
  1. Revert update to Finagle 6+ until dependencies can be advanced to the compatible versions
    
    naggati still uses Util version 5.x, which causes memcache clients to behave incorrectly. I am reverting the change to update the finagle (and util) version until I can update naggati.
    
    I will move this to birdcage manually once the repo is ready in birdcage.
    
    RB_ID=141014
    Robin Dhamankar committed Apr 19, 2013
Commits on Apr 11, 2013
  1. Update Kestrel to Finagle 6 and friends (Util and Servo)

    The changes include
     - Using the setInterruptHandler on Promise
     - Getting rid of the onCancellation call on Future - handling the Cancelled exception instead
     - Stop overriding finagle.Service.release(); use close instead.
    
    RB_ID=129914
    Robin Dhamankar committed Apr 11, 2013
Commits on Apr 4, 2013
  1. Additional Tests for aggressive rewrites

    Tests for exercising the following scenarios
    - Enabling aggressive rewrites
    - Ensure that we don't stop rewriting if the journal space is freeing up
    - Ensure that rewrites are delayed for compact delay if rewrite resulted in insufficient compaction
    
    RB_ID=137648
    Robin Dhamankar committed Apr 4, 2013
  2. Additional tests for graceful shutdown

    Adding tests for graceful shutdown
    - with non empty queue
    - open transactions
    
    RB_ID=137623
    Robin Dhamankar committed Apr 4, 2013
Commits on Apr 3, 2013
  1. Rewrite journals on graceful shutdown

    When we take a machine out of service, we first set it readonly to drain the queues. In order to minimize the startup time, we should also attempt to rewrite journals for empty queues.
    In general we want to keep shutdown very light weight so this graceful shutdown option is limited to the case when Kestrel is in readonly and quiescent state (states used for planned outages)
    Added a test to exercise the rewrite
    
    RB_ID=137141
    Robin Dhamankar committed Apr 3, 2013
  2. Disable Aggressive Rewrites by default as we prepare to canary

    RB_ID=137154
    Robin Dhamankar committed Apr 3, 2013
Commits on Mar 17, 2013
  1. Tests the replay fixes for remove tentative

    Added new test that verifies the replay fixes when remove tentative/confirm remove is used in place of unconditional removes. This also exercises the open transactions logic in the checkpoint creation and packing of journals.
    
    RB_ID=132406
    Robin Dhamankar committed Mar 17, 2013
Commits on Mar 15, 2013
  1. Kestrel Replay Fix Part IV: Concurrency between pack and rotate

    Pack and Rotations can happen concurrently. They both by definition deal with separate parts of the queue - pack for an old checkpoint while rotate will operate on the latest journal file.
    
    They both share the logic that does the clean-up of .pack files (called through calculateArchiveSize) created by the checkpoint. If the clean-up runs concurrently from two separate threads, then we can have the following race between the two threads
    
    List of files before the pack
    journal.1
    journal.2
    journal.2.pack
    
    Thread 1 reads the set journal.1 and journal.2 for cleanup
    Thread 2 cleans up journal.1 and journal.2 and renames journal.2.pack to journal.2
    Thread 1 uses the list it generates to delete the newly packed journal.2 => data loss
    
    This change addresses the synchronization
    
    Note:
    1. Pack and Rewrite never run concurrently (rewrite *does not* apply to queues that are in read behind)
    2. Rotate and Rewrite are never issued simultaneously. They are also synchronized by virtue of being inline in the queue operations, which synchronize in the persistent queue implementation (only one operation at a time per queue)
    
    RB_ID=132415
    Robin Dhamankar committed Mar 15, 2013
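The race described above, and one way to close it, can be sketched as follows. This is an illustrative Python model and not Kestrel's implementation: `JournalDir`, `cleanup_packed`, and the in-memory set of file names are all hypothetical stand-ins. The point is only that listing the files and acting on that list happen under one lock, so a second thread can never delete from a stale listing.

```python
import threading


def _seq(name):
    # "journal.2" and "journal.2.pack" both have sequence number 2
    return int(name.split(".")[1])


class JournalDir:
    """Hypothetical in-memory model of a journal directory."""

    def __init__(self, names):
        self._names = set(names)
        self._lock = threading.Lock()

    def files(self):
        with self._lock:
            return sorted(self._names)

    def cleanup_packed(self):
        """Delete journals superseded by a .pack file, then rename the pack into place.

        The listing and the deletes share one critical section, so the list
        can never be stale by the time it is used.
        """
        with self._lock:
            removed = []
            for pack in sorted(n for n in self._names if n.endswith(".pack")):
                target = pack[: -len(".pack")]
                stale = [n for n in self._names
                         if not n.endswith(".pack") and _seq(n) <= _seq(target)]
                for name in stale:
                    self._names.discard(name)
                    removed.append(name)
                self._names.discard(pack)  # "rename" journal.N.pack ...
                self._names.add(target)    # ... to journal.N
            return sorted(removed)


journal_dir = JournalDir(["journal.1", "journal.2", "journal.2.pack"])
threads = [threading.Thread(target=journal_dir.cleanup_packed) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
remaining = journal_dir.files()
```

Whichever thread runs second finds no `.pack` file left and deletes nothing, so the newly packed `journal.2` survives.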
  2. Fix ThriftHandlerSpec test by avoiding mocking of InetSocketAddress

    RB_ID=132616
    Alan Liang committed Mar 15, 2013
Commits on Mar 14, 2013
  1. KEST-402: Kestrel replay fix Part III: Recovering journal contents after pack
    
    - Provide a method that journal replay can use to keep the removesSinceReadBehind count in sync.
    - Update the count when replaying remove/confirm-remove/continue journal records (items) if the replaying queue is in read-behind.
    - Add a test that exercises the scenario by ensuring that repeated restarts correctly restore the count of items
    
    RB_ID=130998
    Robin Dhamankar committed Mar 14, 2013
Commits on Mar 13, 2013
  1. Update foreground.sh script to handle mesos launch.

    Mike Lindsey committed Mar 13, 2013
Commits on Mar 12, 2013
  1. KEST-402: Kestrel replay fix - Part II

    Part II: Transactional Add (Continue) should update removesSinceReadBehind only if the queue is in read behind.
    
    While this logic doesn't cause incorrect behavior, as startReadBehind resets removesSinceReadBehind unconditionally, it makes it hard to reason about and verify the value of the counter.
    
    RB_ID=130997
    Robin Dhamankar committed Mar 12, 2013
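The counter rule from this commit can be pictured with a small sketch. It is written in Python for brevity and is not the Kestrel source: the class `ReadBehindState` and its method names are hypothetical, and only the counter name mirrors `removesSinceReadBehind` from the commit message.

```python
class ReadBehindState:
    """Hypothetical model of the read-behind counter described above."""

    def __init__(self):
        self.in_read_behind = False
        self.removes_since_read_behind = 0

    def start_read_behind(self):
        # Entering read-behind resets the counter unconditionally.
        self.in_read_behind = True
        self.removes_since_read_behind = 0

    def record_continue(self):
        # A transactional add (continue) only counts while in read-behind,
        # which keeps the counter at zero whenever read-behind is off.
        if self.in_read_behind:
            self.removes_since_read_behind += 1


state = ReadBehindState()
state.record_continue()          # ignored: not in read-behind
count_before = state.removes_since_read_behind
state.start_read_behind()
state.record_continue()          # counted
count_after = state.removes_since_read_behind
```

Guarding the increment gives the counter a simple invariant (it is zero outside read-behind), which is what makes its value easy to verify in tests.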
Commits on Mar 8, 2013
  1. Remove the reverse lookup in the log message

    Use getHostString instead of getHostName
    
    RB_ID=130001
    Robin Dhamankar committed Mar 8, 2013
Commits on Feb 28, 2013
  1. Kestrel replay fix

    Kestrel replay fix - Part 1:
    
    Problem: The pack request is triggered by fillReadBehind, whereas journal.remove updates removesSinceReadBehind after the fillReadBehind call. This ordering leaves two windows in which we will either lose an item or deliver an extra (potentially dummy/invalid) item.
    
    This change addresses the problem as follows
    1. In the normal mode - fillReadBehind is called after the remove has already been logged in the journal - this makes sure that the removesSinceReadBehind count has already been updated to reflect the fact that the current item has been removed from the in-memory queue which is used as the source of the pack. The same is done for the discard/expiration workflows
    2. In the replay mode - Since journal removes are already logged (or are not being newly logged) the fillReadBehind is done as before.
    3. The pack logic calls a fsync on the journal file before the packed file replaces the old journal files on disk to ensure that the removes that were accounted for in the pack have been indeed logged in the journal.
    4. The change to PersistentQueue#setup is just for the tests
    
    This doesn't fully address KEST-402. However this is necessary for the second part of the fix.
    
    RB_ID=121133
    Robin Dhamankar committed Feb 28, 2013
  2. Change the Journal Rewrite/Rotation logic

    The condition that aggressively rewrites the journal when the queue is empty is expected to be a cheap operation that shrinks the journal to a very small size. When there are a large number of open transactions, the journal size may not shrink even after the rewrite. Since checkRotateJournal is called on each get/set, this effectively means we rewrite the journal on each get and set without ever shrinking it. These repeated rewrites affect the latency of writes to the queue (as journal rewrite is synchronized) and also the I/O to the journal disk, which can affect other queues on the same machine.
    
    We tried to remove this logic once in the past, and the result was that the journal was not compacted when it should have been. So I don't want to remove it completely, just reduce the frequency so we don't have repeated rewrites.
    
    Here the logic is changed as follows:
    1. When the queue is empty, if the journal rewrite resulted in the journal size being larger than half the default size (what triggered the rewrite), this indicates that the number of open transactions is large, so we delay subsequent compaction by the compaction delay interval - this should provide an opportunity for the open transactions to drain and subsequent rewrite to successfully shrink the journal
    2. Before attempting to rotate the journal, if the queue is empty first attempt to rewrite. If the journal still does not shrink, then proceed with the rotate.
    
    RB_ID=116056
    Robin Dhamankar committed Feb 28, 2013
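The delayed-compaction rule in point 1 can be sketched as a small policy object. This is a Python illustration under stated assumptions, not the Kestrel code: `RewritePolicy`, its method names, and the use of plain numbers for sizes and timestamps are all hypothetical. The "half the default size" threshold and the compaction delay come from the commit message.

```python
class RewritePolicy:
    """Hypothetical model: back off when a rewrite fails to shrink the journal."""

    def __init__(self, default_journal_size, compaction_delay):
        self.default_journal_size = default_journal_size
        self.compaction_delay = compaction_delay
        self._next_allowed = 0.0  # earliest time another rewrite may run

    def should_rewrite(self, queue_empty, journal_size, now):
        return (queue_empty
                and journal_size >= self.default_journal_size
                and now >= self._next_allowed)

    def record_rewrite(self, size_after, now):
        # Still larger than half the default size: many open transactions
        # are probably pinning the journal, so delay the next attempt.
        if size_after > self.default_journal_size / 2:
            self._next_allowed = now + self.compaction_delay


policy = RewritePolicy(default_journal_size=100, compaction_delay=30)
policy.record_rewrite(size_after=80, now=0)   # rewrite barely shrank the journal
blocked = policy.should_rewrite(queue_empty=True, journal_size=120, now=10)
allowed = policy.should_rewrite(queue_empty=True, journal_size=120, now=30)
```

A rewrite that leaves the journal above the threshold suppresses further attempts until the delay has elapsed, which avoids rewriting on every get and set.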
Commits on Jan 16, 2013
  1. Merge branch 'master' of https://git.twitter.biz/kestrel into kestrel-pidfile

    Mike Lindsey committed Jan 16, 2013
Commits on Jan 10, 2013
  1. 'release commit for net.lag:kestrel:2.4.3-SNAPSHOT'

    Robin Dhamankar committed Jan 10, 2013
  2. 'release commit for net.lag:kestrel:2.4.2'

    Robin Dhamankar committed Jan 10, 2013
  3. Change Log for the 2.4.2 Release

    Robin Dhamankar committed Jan 10, 2013
Commits on Jan 8, 2013
  1. Tracing operation on individual queues

    This change allows tracing operations on an individual queue to identify misbehaving clients. By limiting the logging to an individual queue, we can track down queue backups or leaking transactions on a single queue without affecting other queues on the instance of Kestrel. Also, the trace output produced per queue will be much more manageable compared to enabling this for all queues on an instance.
    
    The main motivation for this is to track down the open transactions leak.
    
    RB_ID=116677
    Robin Dhamankar committed Jan 8, 2013
  2. Trace client connections and session lifetime

    - Since the queue that a session will operate on is only determined after the session has been established, it's cleaner to enable session trace separately through Kestrel Config
    - This will help us determine all the clients connecting to a machine when the number of connections is high. It also helps track down the cases when the session did not clean up upon termination
    
    There are clusters that are seeing a large connection churn, so I did not want to enable this by default, as it may bloat the log for no reason. Since config deployment doesn't require a Kestrel restart, enabling and disabling this for the duration of the investigation makes more sense.
    
    RB_ID=116866
    Robin Dhamankar committed Jan 8, 2013
Commits on Jan 3, 2013
  1. Merge branch 'master' of https://git.twitter.biz/kestrel into kestrel-pidfile

    Mike Lindsey committed Jan 3, 2013
Commits on Jan 2, 2013
  1. Add Journal Rotation Metric

    - Stat counter for number of times this journal has been rewritten
    - Separate stat counter for the number of times this journal has been rotated and a checkpoint was optionally generated
    
    RB_ID=116055
    Robin Dhamankar committed Jan 2, 2013