Commits on Apr 4, 2011
  1. added version postfix

    committed Apr 4, 2011
Commits on Feb 27, 2011
  1. @tsuna

    Bump up Scanner.DEFAULT_MAX_NUM_ROWS from 16 to 128.

    16 turned out to be too conservative.  Document the fact that this
    setting has a large impact on scanning performance.
    A higher default value will provide better performance out of the
    box.  OpenTSDB sees a 40-50% speedup with this value.  People with
    very large rows will probably be aware of these considerations and
    are more likely to adjust the value than those who're using typical
    HBase rows, which tend to be fairly limited in size.
    Change-Id: I778e28c9211e2c0d462d66f7dd35f0c9ddd5afa6
    tsuna committed Feb 27, 2011
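
    To see why the batch size matters, here is a rough sketch that only
    counts round trips.  It assumes each scanner RPC returns at most
    `maxNumRows' rows; the class and method names are hypothetical and are
    not part of asynchbase.

```java
/** Hypothetical model of scanner round trips; not asynchbase code. */
class ScanRoundTrips {
  /** Number of scanner RPCs needed to fetch totalRows in batches of maxNumRows. */
  static long rpcsNeeded(long totalRows, int maxNumRows) {
    if (totalRows < 0 || maxNumRows <= 0) {
      throw new IllegalArgumentException("totalRows must be >= 0 and maxNumRows > 0");
    }
    return (totalRows + maxNumRows - 1) / maxNumRows;  // Ceiling division.
  }

  public static void main(String[] args) {
    System.out.println(rpcsNeeded(10_000, 16));   // 625 round trips.
    System.out.println(rpcsNeeded(10_000, 128));  // 79 round trips.
  }
}
```

    Fewer round trips is only part of the picture: larger batches also use
    more memory per RPC, which is why very large rows may call for a
    smaller value.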
Commits on Jan 30, 2011
  1. @tsuna

    Add support code for different HBase server versions and 0.90.

    The goal is to transparently support different HBase versions with
    the same client.  Originally, this client didn't care what version
    the server was, because for what it uses of the protocol, nothing
    had changed since it was originally written, despite the RPC protocol
    version being bumped up a few times.  Unfortunately, HBase isn't
    very good at maintaining backwards compatibility, and sadly 0.90
    has a minor change that breaks it.
    Change all HBaseRPC implementations to declare whether or not they're
    sensitive to the version of the remote server.  When HBase folks break
    backwards compatibility by changing the on-wire format of an RPC, then
    this RPC becomes sensitive to protocol version.  This unfortunately
    happened in HBase 0.90 for the Get RPC (for no good reason IMO).
    HBaseRPC implementations are given the server's version when they're
    asked to serialize themselves, so they can adjust their behavior based
    on the version of the server they're talking to.
    Add a mechanism to automatically request the RegionServer's RPC protocol
    version with no extra overhead most of the time.  We used to piggyback
    the magic "hello" header in the first packet sent to the RegionServer,
    along with the first RPC.  We now piggyback a version request as
    well, so that the "hello" header, the 1st RPC and the version request
    all go out in the same TCP packet (most of the time the 1st RPC is small).
    If a version-sensitive RPC attempts to go out before we know the version
    of the server, we delay it until the version is received.
    Change-Id: I130afd4305dfbe8bac4cfe4a50a06c6e239e266e
    tsuna committed Jan 29, 2011
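
    The delay mechanism described above could look roughly like the sketch
    below.  Everything in it is hypothetical (the `Rpc' interface, the
    string "wire format", the version threshold); it only illustrates the
    idea of holding version-sensitive RPCs until the server's version is
    known, and is not the actual asynchbase code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch: hold version-sensitive RPCs until the version is known. */
class VersionAwareSender {
  /** Hypothetical stand-in for an RPC; not the asynchbase HBaseRpc class. */
  interface Rpc {
    boolean versionSensitive();
    String serialize(int serverVersion);
  }

  static final int UNKNOWN = -1;
  /** Made-up threshold at which the made-up "get" format changes. */
  static final int NEW_GET_FORMAT_SINCE = 2;

  private int serverVersion = UNKNOWN;
  private final Queue<Rpc> delayed = new ArrayDeque<>();
  final List<String> wire = new ArrayList<>();  // What was "sent", in order.

  synchronized void send(Rpc rpc) {
    if (rpc.versionSensitive() && serverVersion == UNKNOWN) {
      delayed.add(rpc);  // We can't pick a wire format yet.
      return;
    }
    wire.add(rpc.serialize(serverVersion));
  }

  /** Called when the version response arrives: flush everything we held back. */
  synchronized void onVersionReceived(int version) {
    serverVersion = version;
    Rpc rpc;
    while ((rpc = delayed.poll()) != null) {
      wire.add(rpc.serialize(version));
    }
  }

  /** A version-sensitive RPC whose format depends on the server. */
  static Rpc get(final String key) {
    return new Rpc() {
      public boolean versionSensitive() { return true; }
      public String serialize(int v) {
        return (v >= NEW_GET_FORMAT_SINCE ? "get-new:" : "get-old:") + key;
      }
    };
  }

  /** An RPC whose format is the same for every server version. */
  static Rpc put(final String key) {
    return new Rpc() {
      public boolean versionSensitive() { return false; }
      public String serialize(int v) { return "put:" + key; }
    };
  }
}
```

    Note that RPCs that aren't version-sensitive are never delayed, so in
    this sketch they may overtake a version-sensitive RPC issued earlier.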
  2. @tsuna

    Document an unimportant oddity of the HBase RPC protocol.

    In HBase's defense, this oddity comes from Hadoop's legacy.
    This is only for the sake of completeness.  No code change.
    Change-Id: I40cfdf8fd80005e7df2a30d8793a336e2457259d
    tsuna committed Jan 29, 2011
  3. @tsuna

    Add support for CDH3b3.

    This change adds support for CDH3 b3.  In order to enable the new code,
    the JVM must be given the following system property in argument:
    CDH3b3 includes a temporary patch that changes the format of the "hello"
    header that clients must send when they connect.  The change is not
    backwards compatible and results in clients getting disconnected as
    soon as they send the header, because the HBase RPC protocol provides no
    mechanism to send an error message back to the client during the initial
    "hello" stage.  Thus, older clients will never be able to get past the
    initial -ROOT- lookup.
    For reference, the patch I'm referring to is:
      CLOUDERA-BUILD. HBase running on secure hadoop, temporary patch.
      This is not upstreamed, since it currently is very difficult to do this
      without reflection or a shim layer. This will be upstreamed with the
      larger project of HBase security later this year.
    Change-Id: I421e89447e3b55b3000e4ebc486bc90abcbb7076
    tsuna committed Jan 18, 2011
Commits on Dec 10, 2010
  1. @tsuna

    Don't try to disconnect a dead client.

    Doing so is harmless, it's just useless.
    Change-Id: I75cc9d6acc2282154a1936c72980c34ecfe10657
    tsuna committed Dec 10, 2010
  2. @tsuna

    Bug fix: fetch the -ROOT- region from ZK if our client is dead.

    The code was sometimes incorrectly using a stale reference for the
    client no longer connected to the -ROOT- region.  This would especially
    happen when HBase restarts and the client was reading a stale znode in
    ZK, depending on the order in which RegionServers are restarted.  When
    this bug was triggered, the client was unable to find the new location
    of -ROOT- (and thus .META.), and was essentially stuck forever.
    Change-Id: I27d2836103a04e4886cd2b92a302c30449f77fbd
    tsuna committed Dec 10, 2010
Commits on Nov 22, 2010
  1. @tsuna

    Don't shutdown while RPCs are waiting for a -ROOT- lookup.

    The following scenario led to data loss:
      1. Application starts.
      2. A PutRequest is generated, triggers a -ROOT- lookup.
      3. Application calls shutdown() on the HBaseClient, the client
         terminates and the PutRequest is lost.
    Now shutdown() will wait if there's an ongoing -ROOT- lookup
    to allow the PutRequest to complete.  This bug was only likely
    to affect very short lived programs, not long lived servers.
    Change-Id: I7da4d5e81d59e75ae5acbdeca0cc72d8078b3ce1
    tsuna committed Nov 21, 2010
Commits on Nov 12, 2010
  1. @tsuna

    Remove an unnecessary `if' statement.

    Was probably left over after a few iterations of refactoring.
    Change-Id: I7cd18ca4855a633900eb7e7793b08da496427975
    tsuna committed Nov 12, 2010
  2. @tsuna

    Fix an NPE caused by a race condition when sending RPCs.

    This NPE could occur when a RegionClient was getting disconnected from
    its RegionServer while serializing an RPC.  We now double check that
    we're still connected before sending the RPC out to the wire.  If we're
    not, we let the RPC go through the normal "oops, we're disconnected"
    code path.
    The NPE could escape out of asynchbase and affect code that uses the
    Change-Id: I288a85eedf4ad2fd6922682be67533ede1da52f1
    tsuna committed Nov 11, 2010
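
    A minimal sketch of the "double check before sending" idea, assuming
    the connection is tracked through a reference that becomes null on
    disconnect.  The class, its fields and the list standing in for a
    channel are all hypothetical; the real code deals with Netty channels.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of re-checking the connection right before a write. */
class RegionConn {
  // Stand-in for the socket channel; set to null once we're disconnected.
  private volatile List<String> channel = new ArrayList<>();
  final List<String> retryLater = new ArrayList<>();  // "Oops, disconnected" path.

  void disconnect() { channel = null; }

  /** Returns true if the RPC was written, false if it took the disconnected path. */
  boolean send(String rpc) {
    // Serialization takes time, so a disconnect can happen while it runs.
    final String bytes = "encoded:" + rpc;
    // Read the volatile field once into a local, so that the null check
    // and the write below are guaranteed to use the same reference.
    final List<String> chan = channel;
    if (chan == null) {
      retryLater.add(rpc);
      return false;
    }
    chan.add(bytes);
    return true;
  }
}
```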
Commits on Nov 8, 2010
  1. @tsuna

    Allow a KeyValue to have an empty value.

    The code was erroneously refusing to de-serialize a KeyValue with an
    empty value, even though this is permitted by HBase.
    Optimization: don't allocate new empty byte arrays.
    In the code parsing META entries: properly handle `info:server' entries
    with an empty value, as can happen during an NSRE caused by a split.
    Change-Id: Idfb32815871d1812a075cfcd84b040f868b74a24
    tsuna committed Nov 8, 2010
  2. @tsuna

    Fix the Scanner for scanners that scan until the end of the table.

    There was a bug in Scanner that caused the Scanner to stop scanning
    after the first region when the Scanner was told to scan until the
    end of the table (stop_key = empty key).
    Change-Id: Ic2d9403006a702fe04e0bb2005c1d07a33c3cb92
    tsuna committed Nov 7, 2010
  3. @tsuna

    Behave better when we cannot connect to a RegionServer.

    When the code couldn't connect to a RegionServer, it was immediately
    retrying all RPCs that wanted to go to this RegionServer.  If the
    server in question was down, this led to several retries in a row,
    in a tight loop, until RPCs failed because they had too many attempts.
    The new behavior now treats a ConnectionResetException exactly
    like a NotServingRegionException and goes through the exact same code
    path.  This helps gracefully handle RegionServer failures as the NSRE
    handling code path is good at discovering when a region comes back
    online.  It also helps quickly catch all subsequent attempts to use
    this region and queue RPCs, exactly like in an NSRE situation.
    This change also fixes a bug where if the connection to a RegionServer
    went down while we have edits buffered for this server, we'd lose those
    Change-Id: I190be53b45f7e0762b42aa2c706bfa14a5687273
    tsuna committed Nov 7, 2010
  4. @tsuna

    Comment out some DEBUG logging in the fast-path.

    Those DEBUG logging statements are only rarely useful when debugging
    low-level problems in asynchbase.  In a real application, they flood
    the logs, even if the application is only moderately busy.  I chose
    to keep those statements around, even though I don't like dead code,
    because they can be useful again in the future when debugging certain
    Change-Id: I0b633341c4e39fafe3a8b12b2fbb597c1616093c
    tsuna committed Nov 7, 2010
Commits on Nov 7, 2010
  1. @tsuna

    Remove unused imports.

    Change-Id: I59fc6451720b4d5d178531624f7546ffcd1686da
    tsuna committed Nov 5, 2010
  2. @tsuna

    Schedule periodic flush timers only on demand.

    Any RegionClient that was connected to a RegionServer had a timer to
    periodically flush buffered edits.  The timer was re-scheduled as long
    as the client lived, from the point where it was connected.
    This change schedules the timer only when we create a new batch of
    buffered edits.
    The consequence of this change is that applications that aren't write
    heavy, or only have sporadic write periods, or only write to a small
    subset of all the RegionServers they use, will have far fewer timers.
    Change-Id: Ie793ad6cc79343ec006cccf2e62a07e7d5338407
    tsuna committed Oct 29, 2010
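
    The on-demand scheduling can be sketched as follows.  This assumes a
    one-shot timer abstracted as a `Consumer<Runnable>' that runs the task
    after the flush interval; that abstraction and all the names below are
    hypothetical, chosen so the sketch doesn't depend on real time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: arm a flush timer only when a new batch is created. */
class BufferedEdits {
  private final Consumer<Runnable> timer;  // Runs the task once, after the flush interval.
  private List<String> batch;              // null when nothing is buffered.
  final List<List<String>> flushed = new ArrayList<>();
  int timersScheduled = 0;

  BufferedEdits(Consumer<Runnable> timer) {
    this.timer = timer;
  }

  synchronized void add(String edit) {
    if (batch == null) {
      // First edit of a new batch: this is the only place we arm a timer.
      batch = new ArrayList<>();
      timersScheduled++;
      timer.accept(this::flush);
    }
    batch.add(edit);
  }

  synchronized void flush() {
    if (batch == null) {
      return;  // Nothing buffered, e.g. someone flushed manually already.
    }
    flushed.add(batch);
    batch = null;  // The next add() will create a batch and arm a new timer.
  }
}
```

    An idle instance never schedules anything, which is where the savings
    come from for applications that write rarely or to few servers.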
  3. @tsuna

    Don't flood the logs when -ROOT- is offline.

    When -ROOT- was offline, every single RPC that needed to access it was
    logging an INFO message.  This problem rarely happened because -ROOT-
    is very rarely accessed, but during or after an HBase outage, this
    message could be logged for every single RPC.
    We now log the message only once, the first time we realize we need to
    find where -ROOT- is.
    Also, only log a warning when getting disconnected from ZooKeeper while
    we're trying to locate the -ROOT- region.  If we're not trying to locate
    the -ROOT- region, we don't care about losing our ZooKeeper session.
    Change-Id: Iea0b639b91940c4cb4909b8221c91c07f7815079
    tsuna committed Oct 29, 2010
  4. @tsuna

    Prevent a deadlock during shutdown.

    When shutting down, we properly disconnect all our RegionClients.
    After doing so, we check that there are no clients left, and if
    we manage to find some, we log an error.  This innocuous logging
    statement could lead to a deadlock in rare error cases due to
    inconsistent lock ordering.  The required lock ordering is now
    explicitly documented and we no longer log the error while
    holding the lock, we instead make a copy of the HashMap and then
    log the copy.
    Change-Id: Ic698dbefb19974f58d6ed534498043e0b83aaf20
    tsuna committed Oct 29, 2010
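
    The "copy, then log outside the lock" pattern could look like the
    sketch below.  The registry class and its String-to-String map are
    hypothetical simplifications; the point is only that formatting the
    map (which calls toString() on its entries, and may take other locks)
    happens after our own lock is released.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: snapshot a map under its lock, log the snapshot outside. */
class ClientRegistry {
  private final Map<String, String> clients = new HashMap<>();

  void register(String host, String client) {
    synchronized (clients) {
      clients.put(host, client);
    }
  }

  /** Returns the error message that would be logged, or null if no client is left. */
  String checkNoneLeft() {
    final Map<String, String> copy;
    synchronized (clients) {
      if (clients.isEmpty()) {
        return null;
      }
      copy = new HashMap<>(clients);  // Cheap shallow copy while holding the lock.
    }
    // Lock released: building the message can no longer participate in a
    // lock-ordering cycle involving `clients'.
    return "Some clients are left after shutdown: " + copy;
  }
}
```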
  5. @tsuna

    Behave better when getting an NSRE during META lookups.

    When getting NSRE'd during a META or ROOT access, the code was retrying
    properly but it wasn't invalidating the cached entry for META or ROOT.
    In some cases, this led to unnecessary failures of RPCs or the code was
    not able to trigger a ROOT lookup after a failure of both META and
    ROOT.  The code was also not behaving properly when the end-user code
    was accessing META or ROOT directly (which doesn't happen in typical
    HBase applications).
    This changes the GetClosestRowBefore RPC to be a retryable RPC in the face
    of NSRE failures.  This RPC wasn't going to the normal NSRE handling
    code path because it pretended to not be directed to a specific region.
    Change tested against a modified RegionServer that throws an NSRE with
    a 20% probability (which helped uncover a lot of corner cases).
    Change-Id: I67277a75a8b59d8cc650677a8569ad3326e96445
    tsuna committed Oct 19, 2010
  6. @tsuna

    Don't fail pending RPCs during shutdown, wait for them.

    The old behavior was annoying and unpredictable, especially
    for short-lived programs.
    Also, do a flush() during shutdown to make sure we don't
    drop buffered edits on the floor.
    Change-Id: Ib2c34d244b7c0d7c1c58eb65a2907e1303baf57d
    tsuna committed Oct 13, 2010
  7. @tsuna

    Prevent META storms.

    We now use a semaphore to automatically throttle META lookups (slightly)
    once we reach 100 concurrent lookups.  In my loadtests, this simple
    approach prevents the application from sending over 130k concurrent
    META lookups when starting, without impeding throughput (it actually
    helps it somewhat in certain tests, but overall the difference isn't
    statistically significant).
    Change-Id: I5cc59283440e778eeed1e1fa455614e772aa9ec5
    tsuna committed Oct 13, 2010
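
    One way to throttle "slightly" with a semaphore is sketched below:
    take a permit if one is free, otherwise wait for a short, bounded time
    and then proceed anyway.  Whether the real code proceeds without a
    permit is an assumption on my part, and the class name and the wait
    parameter are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of soft throttling with a semaphore. */
class LookupThrottle {
  private final Semaphore permits;
  private final long maxWaitMs;

  LookupThrottle(int maxConcurrent, long maxWaitMs) {
    this.permits = new Semaphore(maxConcurrent);
    this.maxWaitMs = maxWaitMs;
  }

  /**
   * Call before starting a lookup.  Returns true if a permit was obtained
   * (the caller must then pass true to end()), false if we gave up waiting
   * and the caller proceeds without a permit.
   */
  boolean begin() {
    if (permits.tryAcquire()) {
      return true;  // Fast path: below the concurrency limit.
    }
    try {
      // Slow the caller down a bit instead of blocking it indefinitely.
      return permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Call when the lookup completes; only gives back a permit we really held. */
  void end(boolean hadPermit) {
    if (hadPermit) {
      permits.release();
    }
  }

  int available() {
    return permits.availablePermits();
  }
}
```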
  8. @tsuna

    Stop chaining Deferreds when waiting for the -ROOT- region.

    Sometimes we can have enough RPCs waiting on -ROOT- to hit the limit
    in the maximum callback chain length.  Plus, this is suboptimal since
    we don't actually *need* a Deferred chain.
    Simplify the code around getting the -ROOT- region asynchronously.
    The first callback invoked right after discovering -ROOT- wasn't
    necessary anymore.
    Also continue to convert anonymous classes to inner / local classes,
    to make debugging (especially heap analysis) much easier.
    Don't retry an RPC that failed with a NonRecoverableException.
    Change-Id: I1082a520f0c0487adc7c421dcffd1280d51093cc
    tsuna committed Oct 13, 2010
  9. @tsuna

    Work around an infinite loop in Netty when shutting down.

    Netty gets stuck in an infinite loop if you try to shut it down from
    within a thread of its own thread pool.  They don't want to fix this
    so as a workaround we now always shut Netty's thread pool down from
    another newly created thread.
    Change-Id: If140a1931ad80758da458619ad987431b436a637
    tsuna committed Oct 13, 2010
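
    The workaround pattern can be illustrated with a plain
    java.util.concurrent thread pool instead of Netty (an analogy, not the
    actual Netty calls): a task running inside the pool never waits for
    the pool's termination itself, it hands the shutdown to a freshly
    created thread.  All names below are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: shut a thread pool down from a thread outside of it. */
class OffThreadShutdown {
  /** Starts a new thread that shuts the pool down, and returns that thread. */
  static Thread shutdownFromNewThread(final ExecutorService pool) {
    final Thread t = new Thread(() -> {
      pool.shutdown();
      try {
        // Safe here: this thread isn't one of the pool's own workers, so
        // it isn't waiting for itself to finish.
        pool.awaitTermination(10, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "pool-shutdown");
    t.start();
    return t;
  }

  /** Requests the shutdown from inside the pool; returns true if it terminated. */
  static boolean demo() {
    final ExecutorService pool = Executors.newFixedThreadPool(2);
    final AtomicReference<Thread> stopper = new AtomicReference<>();
    final CountDownLatch requested = new CountDownLatch(1);
    pool.execute(() -> {
      stopper.set(shutdownFromNewThread(pool));
      requested.countDown();
    });
    try {
      requested.await();
      stopper.get().join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return pool.isTerminated();
  }

  public static void main(String[] args) {
    System.out.println("terminated: " + demo());
  }
}
```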
  10. @tsuna

    Completely rewrite the code handling NotServingRegionException.

    The new NSRE handling code is far superior to the previous one.
    It fixes all known problems with NSRE handling in async HBase
    and effectively prevents "META storms" where high throughput
    async HBase clients were pounding the .META. table during NSRE
    situations caused by splits, region re-assignments or machine
    The new NSRE handling logic is documented in great detail in
    HBaseClient.handleNSRE, but here's a brief overview of the new
      - Whenever we get a new NSRE, we flag the region as being
      - All subsequent RPCs for this region discover that the
        region has been NSRE'd and are queued to wait until HBase
        brings the region back online.
      - A "probe" request is created to "poke" HBase and see if
        the region is back online.  The probe is periodically retried
        with an exponential-ish backoff.
      - When the probe succeeds, all the RPCs queued for the region
        are re-tried.
    Tested on a 0.89 release by importing 1 billion KeyValues in HBase
    at the rate of 150k KV/s.  The import was actually 5 times the same
    200M KVs, to force HBase to split many regions.  On this cluster,
    HBase tends to resolve most splits in less than 6s.
    Change-Id: I59074ef3fa84e950797ddc5522638d7e9b8c2917
    tsuna committed Oct 13, 2010
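
    The overview above can be sketched as a small state machine.  This is
    a simplified illustration under my own assumptions, not what
    HBaseClient.handleNSRE does: regions and RPCs are plain strings, and
    the backoff (100ms doubling up to 3200ms) is a made-up stand-in for
    the "exponential-ish" one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of queueing RPCs for a region until a probe succeeds. */
class NsreTracker {
  private final Map<String, List<String>> queued = new HashMap<>();
  private final Map<String, Integer> probeAttempts = new HashMap<>();

  /** An RPC to this region failed with an NSRE: flag the region, queue the RPC. */
  synchronized void onNsre(String region, String rpc) {
    queued.computeIfAbsent(region, r -> new ArrayList<>()).add(rpc);
    probeAttempts.putIfAbsent(region, 0);
  }

  /** Returns true if the RPC may go out now, false if it was queued. */
  synchronized boolean trySend(String region, String rpc) {
    final List<String> q = queued.get(region);
    if (q == null) {
      return true;  // Region isn't flagged.
    }
    q.add(rpc);
    return false;
  }

  /** Made-up backoff: 100ms, doubling with each attempt, capped at 3200ms. */
  static long probeDelayMs(int attempt) {
    return 100L << Math.min(Math.max(attempt, 0), 5);
  }

  /** The probe failed: returns how long to wait before probing again. */
  synchronized long onProbeFailed(String region) {
    final int attempt = probeAttempts.merge(region, 1, Integer::sum);
    return probeDelayMs(attempt);
  }

  /** The probe succeeded: unflag the region, return the RPCs to retry in order. */
  synchronized List<String> onProbeSucceeded(String region) {
    probeAttempts.remove(region);
    final List<String> q = queued.remove(region);
    return q == null ? new ArrayList<String>() : q;
  }
}
```

    Because only the probe talks to HBase while the region is flagged, the
    number of requests caused by an NSRE no longer grows with the number
    of RPCs waiting on that region.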
  11. @tsuna

    Avoid unnecessary warnings when invalidating the regions cache.

    Compare the region names to avoid logging a confusing `Oops' warning
    when invalidating a RegionInfo reference that isn't the one we expected
    as long as it's for the right region name.
    Since the library logs at the INFO level when regions are added to
    its local cache, also log cache invalidations at the INFO level to
    be consistent.  This can be noisy in large clusters with many regions,
    especially when the application starts or when HBase is actively
    moving regions around (e.g. due to machine failures).  Maybe we'll
    want to tune this down to the DEBUG level (for both messages).
    Change-Id: Id56849972740a1b36d26889f9f0f7007a5297c23
    tsuna committed Oct 13, 2010
  12. @tsuna

    Replace HBaseRpc.popDeferred() with a more reliable API.

    This will be particularly important for the new NSRE handling code
    as well as for per-RPC timeouts.  It also fixes cases where the
    `attempt' count of an HBaseRpc wasn't properly reset to 0 once the
    RPC had completed with an error.
    No user-visible API change.
    Change-Id: I5cfa1479617da4f35f74c901639eabb08f7ea53c
    tsuna committed Oct 13, 2010
  13. @tsuna

    Completely re-implement the write path (PutRequest and such).

    Warning: This change contains incompatible API changes (at the end).
    The existing write path was too complicated and didn't fit well in the
    rest of the code.  It had its own very special (and very long) code path
    to send RPCs to the wire, which made handling those RPCs internally a
    PITA.  In addition, MultiPutRequest was very memory hungry due to the
    deeply nested map-of-map-of-map-of-list-of-edits.  This crazy map was
    retained during the whole lifetime of the RPC and then fed to the GC.
    In the buffering code, edits were added to the map while holding a lock.
    Since all ByteMap operations are O(log n) and there were several of
    them involved, the lock was held for long enough to be easily contended
    in a write-heavy application.  Just to add even more complexity, the
    code was handling durable and non-durable edits separately for no good
    reason, which made certain atomic operations on buffered edits more
    complicated than is necessary.
    The new MultiPutRequest simply aggregates PutRequests together in an
    array.  There are far fewer references involved and adding an edit
    happens in amortized O(1).  Also we now need to sort the array before
    serializing it, and we need an extra O(2N) byte array comparisons, but
    this is still far cheaper than the old code which was iterating on the
    crazy map.  Plus, overhead at serialization time is a one-time cost
    whereas overhead when adding an edit to a MultiPutRequest induces lock
    contention, which is way worse as it impedes scalability.
    We now also support single-edit "put" requests.  When we attempt to
    send out a single-edit multi-put, it gets automatically transformed
    into a single-edit put.
    We now also support "unbufferable" edits, which will always be sent
    out to HBase directly, regardless of the flush interval.  Previously
    this was only possible when an edit was given an explicit lock.  This
    new feature is useful for applications that are mostly throughput
    driven but occasionally need low-latency write accesses.
    When a MultiPutRequest fails, instead of sending another one we now
    split the edits back to single edits and send them individually to
    HBase.  Because of HBASE-2898, this is actually more efficient
    especially when dealing with a NotServingRegionException, as in this
    case it allows us to know precisely which regions are unavailable and
    let the edits going to other regions succeed.  It also allows the
    edits that aren't affected by the NSRE to proceed immediately.
    Incompatible API changes: the following public methods no longer exist:
      HBaseClient.put(PutRequest, boolean)
      HBaseClient.put(List<PutRequest>, boolean)
    In order to specify whether an edit is to be stored in a durable
    fashion, you must now use PutRequest.setDurable(boolean).
    Since async HBase is still beta software, I'm taking the liberty to
    remove this from API without starting to carry the @Deprecated burden.
    Change-Id: I9654e3e39700faaaac19ee9234ffaa60b376bad3
    tsuna committed Oct 12, 2010
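
    The "append now, sort once at serialization time" approach can be
    sketched as follows.  The `Edit' class, the string fields and the
    bracketed output format are hypothetical; the real MultiPutRequest
    works on byte arrays and the HBase wire format.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: buffer edits in a flat list, group them when serializing. */
class EditBatch {
  /** Simplified stand-in for a PutRequest. */
  static final class Edit {
    final String region;
    final String key;
    final String value;

    Edit(String region, String key, String value) {
      this.region = region;
      this.key = key;
      this.value = value;
    }
  }

  private final List<Edit> edits = new ArrayList<>();

  /** Amortized O(1): this is the only work done on the hot path. */
  void add(Edit e) {
    edits.add(e);
  }

  /** Sorts once, then groups by region by comparing each edit to the previous one. */
  String serialize() {
    final List<Edit> sorted = new ArrayList<>(edits);
    sorted.sort(Comparator.comparing((Edit e) -> e.region)
                          .thenComparing(e -> e.key));
    final StringBuilder out = new StringBuilder();
    Edit prev = null;
    for (final Edit e : sorted) {
      if (prev == null || !prev.region.equals(e.region)) {
        if (prev != null) {
          out.append(']');  // Close the previous region's group.
        }
        out.append('[').append(e.region).append(':');
      } else {
        out.append(',');
      }
      out.append(e.key).append('=').append(e.value);
      prev = e;
    }
    if (prev != null) {
      out.append(']');
    }
    return out.toString();
  }
}
```

    For example, adding edits for (r2, b), (r1, z) and (r1, a) in that
    order serializes them as two groups, r1 first, with the keys of each
    group sorted.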
Commits on Oct 29, 2010
  1. @tsuna

    Show the number of buffered edits in RegionClient.toString().

    Also, hold the lock on the RegionClient for fewer instructions.
    Change-Id: I2a2a6386cc34e91158cb7d235064085147d5f9fa
    tsuna committed Oct 20, 2010
  2. @tsuna

    Add assertions as a safety net.

    Refuse to encode an outgoing RPC if it doesn't have a Deferred already.
    This happens when the caller didn't call getDeferred() on the RPC before
    sending it out to the wire, which indicates a programming bug in the
    HBase async library.  I tripped on this bug a few times already so this
    assertion will help immediately catch it during development.
    Refuse to allow an RPC to be given a RegionInfo if this RPC isn't going
    to a specific table / key.  This should never happen and it will be even
    more of a problem with the new NSRE handling code, so the assertion will
    help catch this class of bugs early during development.
    This requires changing the ReleaseRequest for RowLocks slightly because
    it was specifying a region without a table / key pair, as those are not
    needed / used for this RPC.
    Change-Id: Id3264f9d1c735f59e10d9c0c3793375814dfdc0c
    tsuna committed Oct 12, 2010
  3. @tsuna

    Add internal support for "exists" RPC.

    This will be used by the new NSRE handling code.
    Change-Id: Ibbc56b2a45045b6bde2eaf30c96dbe9987d4aa29
    tsuna committed Oct 12, 2010
  4. @tsuna

    Improve error reporting when remote exceptions occur.

    Embed the HBaseRpc instance that caused the remote exception inside the
    exception itself, and embed it in the message carried by the exception.
    No longer pretend that NotServingRegionException can tell you which
    region caused the exception, because we don't have this information
    unless we manually try to parse the string of the exception returned
    by the server, which we don't do because it's brittle at best.
    Incompatible API changes: the following public method no longer exists:
    Since async HBase is still beta software, I'm taking the liberty to
    remove this from API without starting to carry the @Deprecated burden.
    Change-Id: I8794d648723db609a6dbcf47a031db20270c4632
    tsuna committed Oct 8, 2010