Commits on Oct 5, 2016
  1. Added some extra logging when OOM occurs in BnP.

    The AvroStoreBuilderMapper can OOM when manipulating certain bad Avro records.
    This change does not actually prevent the OOM, but merely prints some useful
    info before dying.
    FelixGV committed Oct 4, 2016
Commits on Sep 26, 2016
  1. Replaced the following instances with

    …since urls don't resolve from some places
    $ grep -r 'http://project-voldemort' .
    ./clients/python/      url='',
    ./NOTES:For the most up-to-date information see
    ./contrib/collections/src/java/voldemort/collections/ *        voldemort JSON formats:
    mattwisein committed Sep 26, 2016
Commits on Sep 20, 2016
  1. Releasing Voldemort 1.10.22

    mattwisein committed Sep 20, 2016
Commits on Sep 13, 2016
  1. The BnP job should be resilient to colo failures, but this regressed.

    This commit adds a safeguard to bring back resilience to full colo failures.
    Now, if a colo is unreachable, the BnP job will still push to the
    other (healthy) colos, but it will fail the job afterwards with a
    message saying which colo failed.
    FelixGV committed Sep 12, 2016
Commits on Sep 6, 2016
  1. Python client has an issue with inconsistent indentation (#446)

    The indentation in the code is mostly spaces while the offending
    line is tab indented. Hence, importing and initializing the client
    fails with an Indentation error.
    esawtooth committed with mattwisein Sep 6, 2016
Commits on Aug 30, 2016
  1. Provide chunk size suggestion for BnP jobs with chunk overflow except…

    …ions and fix num chunks algorithm to round up
    mattwisein committed Aug 29, 2016
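    The rounding fix named in the title can be illustrated with a minimal sketch.
    The class, method, and parameter names below are hypothetical and are not taken
    from the BnP code; the sketch only shows integer division that rounds up, so a
    partially filled last chunk is still counted.

```java
// Hypothetical helper, not the actual BnP implementation.
final class ChunkMath {
    private ChunkMath() {}

    // Returns how many chunks are needed to hold totalBytes when each chunk
    // holds at most maxChunkBytes. Plain integer division would round down
    // and drop the last, partially filled chunk.
    static int numChunks(long totalBytes, long maxChunkBytes) {
        if (totalBytes < 0 || maxChunkBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative and chunk size positive");
        }
        long full = totalBytes / maxChunkBytes;
        long remainder = totalBytes % maxChunkBytes;
        // Add one extra chunk only when there is a remainder.
        return Math.toIntExact(remainder == 0 ? full : full + 1);
    }
}
```

    For example, 10 bytes with a 3-byte limit need 4 chunks, where truncating
    division would report 3.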
Commits on Aug 29, 2016
  1. Introduced new boolean "readonly.omit.port" server configuration.

    When set to true, the port will be removed from the fetch URI. In
    this case, the already-existing "readonly.modify.port" setting is
    ignored.
    When set to false (which is the default), then the port will be
    left as part of the fetch URI (according to the already-existing
    "readonly.modify.port" setting).
    FelixGV committed Aug 25, 2016
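    As a rough illustration of what removing the port from a fetch URI looks like,
    here is a sketch using only java.net.URI. The class and method names are
    hypothetical, the example URI is made up, and the interaction with
    "readonly.modify.port" is not modeled.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not the server's actual code path.
final class FetchUriPorts {
    private FetchUriPorts() {}

    // When omitPort is true, rebuilds the URI with an undefined port (-1).
    // When omitPort is false, the URI is returned unchanged.
    static URI applyOmitPort(URI fetchUri, boolean omitPort) {
        if (!omitPort) {
            return fetchUri;
        }
        try {
            return new URI(fetchUri.getScheme(),
                           fetchUri.getUserInfo(),
                           fetchUri.getHost(),
                           -1, // -1 means "no port" for java.net.URI
                           fetchUri.getPath(),
                           fetchUri.getQuery(),
                           fetchUri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot rebuild URI without port: " + fetchUri, e);
        }
    }
}
```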
Commits on Aug 17, 2016
  1. stream support for system stores

    The commands
    bin/ stream fetch-entries
    bin/ stream fetch-keys
    do not work on system stores like voldsys$_client_registry.
    There is a client-side check for valid stores, which validates
    only the user stores. Added a check to include the system
    stores as well.
    arunthirupathi committed Aug 17, 2016
  2. Data Cleanup job does not run on system stores

    1) The client registry system store is an in-memory store and is supposed to be cleaned up after 7 days.
    The last change to the DataCleanupJob made the system stores fail with the missing store exception.
    Clients re-use the same client id, so unless lots of clients die
    and are removed, this will not leak server resources. The effect is negligible.
    Now the DataCleanupJob checks both system stores and normal stores for a store definition.
    2) If the store retention days is modified to zero, then the store will
    delete all the records. But if the store is started with 0 retention days,
    it means that data retention is not enabled. Fixed the discrepancy.
    arunthirupathi committed with arunthirupathi Aug 17, 2016
Commits on Aug 11, 2016
  1. Revert "Provide chunk size suggestion for BnP jobs with chunk overflo…

    …w exceptions and fix num chunks algorithm to round up"
    This reverts commit fdd2ca9.
    squarY committed Aug 11, 2016
Commits on Aug 10, 2016
  1. Releasing Voldemort 1.10.21

    Fix release notes.
    squarY committed Aug 10, 2016
  2. Merge pull request #436 from squarY/timeoutfix

    Fix: Extend the timeout of admin request
    squarY committed on GitHub Aug 10, 2016
  3. Fix: extend admin request timeout from 1min to 5min.

    Add more logs when handling failed fetch requests.
    Fix issues based on RB.
    squarY committed Aug 10, 2016
Commits on Aug 5, 2016
  1. Provide chunk size suggestion for BnP jobs with chunk overflow except…

    …ions and fix num chunks algorithm to round up
    FelixGV committed with mattwisein Jun 10, 2016
Commits on Jul 28, 2016
  1. RO Store Create floods the Log with error messages

    Creating a RO store queries for an existing store, which
    fails with StoreNotFoundException. This exception is logged with its call
    stack, which floods the logs on every store creation.
    This may trick the alerting system into treating this as an error.
    Now such exceptions are logged at info level, without a call stack.
    arunthirupathi committed Jul 28, 2016
Commits on Jul 25, 2016
Commits on Jul 21, 2016
  1. Log verifyOrAddStore time in the logs.

    Currently the time spent in verifyOrAddStore is not tracked.
    This change introduces a log line to track this time.
    The following log message will be added for these calls:
    [18:36:23,113 voldemort.client.protocol.admin.AdminClient] INFO
    verifyOrAddStore() BootStrapUrls: [tcp://localhost:48150] Store:
    abc-xyz-read-only Verification Time: 10 ms, Creation Time: 39 ms [main]
    arunthirupathi committed Jul 21, 2016
Commits on Jul 20, 2016
  1. Rest Server Port is not serialized correctly

    Problems :
    1) While running ./gradlew clean build, the process exits and the
    build fails in the middle.
    2) When Voldemort server rest validation fails, it exits the process.
    3) Cluster does not serialize the rest port correctly, which caused the
    rest port validation to fail.
    4) Before auto node detection, the tests were using the in-memory
    cluster instead of the cluster in the metadata. Now both tests and
    product code use the same code path, which caused the tests to fail.
    Solution :
    1) Cluster serializes the rest port if it is greater than zero.
    2) When rest server validation fails, it throws an exception instead
    of exiting the process. (Searched the code for System.exit and Coordinator
    Server does the same, but saving that for a different day).
    3) Node state string contains the rest port, if it is present.
    4) Let the RestServiceR2StoreTest fail with an actual error message
    instead of a boilerplate error message, which made debugging harder.
    arunthirupathi committed Jul 20, 2016
  2. InsufficientOperationalNodes concurrent exception

    From time to time, Insufficient operational nodes can throw a
    concurrent modification exception, as failures is not a thread-safe list.
    Changed the list to a CopyOnWriteArrayList. The code path is only used
    when nodes fail, so there should not be any noticeable impact to the
    arunthirupathi committed Jul 14, 2016
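    The difference between the two list types can be reproduced in a few lines.
    This is a stand-alone sketch with hypothetical names and string entries, not
    the exception class itself: it appends to a list while iterating over it,
    which an ArrayList rejects and a CopyOnWriteArrayList tolerates because its
    iterators work on a snapshot.

```java
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of Voldemort.
final class FailureListDemo {
    private FailureListDemo() {}

    // Returns true if the list can be appended to while it is being iterated.
    static boolean iterationSurvivesAppend(List<String> failures) {
        failures.add("node-1 timed out");
        failures.add("node-2 unreachable");
        try {
            for (String failure : failures) {
                if (failure.startsWith("node-1")) {
                    // Simulates another failure being recorded mid-iteration.
                    failures.add("node-3 refused connection");
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }
}
```

    The trade-off is that every write to a CopyOnWriteArrayList copies the
    backing array, which is acceptable for a list that is only written on
    failure paths.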
  3. Fix the HintedHandOff flaky tests

    1) I made some changes to the metadata store for auto-detecting the node id,
    and noticed these tests were failing. On investigation, the test
    failures are caused by static variables, some of which are
    modified, so the tests may or may not fail depending on their ordering.
    2) Removed most of the static usage and made most of them as
    I still don't completely understand the test as it is quite complicated,
    but sprinkled in some sleep to make sure that slops are registered.
    Tests passed successfully on 50 consecutive runs.
    arunthirupathi committed Jul 14, 2016
  4. Clean Up State after tests

    Problems :
    1) Voldemort Servers are stopped using the server.stop that does
    not delete the home directory.
    2) Store files are created in the /tmp directory which are left
    behind after the tests.
    Solution :
    1) Use ServerTestUtils.stopServer which deletes the home directory
    2) Use ServerTestUtils.createTempFile which sets deleteFileOnExit
    which deletes the file during the JVM exit stage.
    All changes are only in test files. No changes to the Product code so
    there is no risk for the Product code.
    arunthirupathi committed Jul 14, 2016
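    The delete-on-exit behaviour mentioned in the solution is available directly
    from the JDK. The sketch below uses a hypothetical helper class and is not the
    ServerTestUtils code; it only shows File.createTempFile combined with
    File.deleteOnExit.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical test helper, not the actual ServerTestUtils implementation.
final class TempFiles {
    private TempFiles() {}

    // Creates a temp file in the default temp directory and registers it
    // for deletion when the JVM exits normally.
    static File createTempFile(String prefix, String suffix) {
        try {
            // The prefix must be at least three characters long.
            File file = File.createTempFile(prefix, suffix);
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Note that deleteOnExit only runs on normal JVM termination, so files can
    still be left behind if the test JVM is killed.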
Commits on Jul 19, 2016
  1. VoldemortServerTest fails from time to time

    BouncyCastleProvider changes the state of the JVM. When it was
    in 2 different tests, the order was not predictable and if it
    was executed in different order it failed.
    Made one new test with right ordering to fix the test.
    arunthirupathi committed Jul 12, 2016
  2. Few more debug fixes

    Added some debug logging, to trace the socket destination for
    disconnected sockets.
    Logged the error on the clientTrafficVerifier, instead of the
    printStackTrace which gets lost.
    arunthirupathi committed with arunthirupathi Jul 11, 2016
  3. Update Node Id and Cluster for Node Detection

    1) When Node Id is updated, both Metadata and VoldemortServer
    are updated. Previously, a node id update only updated metadata. There
    could still be edge cases with updating node id, as node Id is used
    in far too many places and cached in a few of them. But client read and
    write is expected to work. (Will update the replace node test to
    verify the same).
    2) When node id detection is enabled, updating the cluster.xml
    will update the node Id. Voldemort, as before, will accept a new
    cluster update, even if it changed the client and admin port.
    After updating it will error out, though the cluster will be
    in an inconsistent state at this point. The behavior is the same as before,
    except that after completing, it errors out, thereby notifying
    the admin. Pre-check and validation is quite difficult because
    of cyclic dependencies.
    Added integration tests for the above 2.
    arunthirupathi committed Jul 11, 2016
  4. Validation and Generate Script Utility added for testing purpose

    The GenerateScript is not secure and it depends on the user
    to give sanitized input and a reasonable script. For the same reason
    it just generates the output file, which must be manually reviewed
    before executing the script. Use GenerateScript at your own
    risk.
    ValidateNodeIdCLI helps to validate the auto-detected node Id before
    removing the node Id from all the configs.
    ValidateNodeIdCLI, when combined with the GenerateScript, can be used
    to safely validate the result for the entire cluster before removing
    all the node ids.
    arunthirupathi committed Jul 9, 2016
  5. Detect and Validate node Ids

    Voldemort server takes node id as a configuration parameter.
    It relies on the node id to identify its role in the cluster.
    But most production deployments have only one voldemort server
    per host in a cluster. Under these conditions the deduction
    of node ids can be automated. In most typical production
    deployments, only the node id changes across the voldemort server
    configurations. This causes configuration duplication and makes
    the configuration difficult to manage.
    Fix :
    Node auto detection can be enabled ( disabled by default) by
    the property
    When enabled host names in cluster.xml will be matched against one of the
    FQDN ( InetAddress.getLocalHost().getCanonicalHostName() ).
    Validation can also be enabled by the property . Note that when auto detection is enabled,
    validation is always run. So enable the validation config only
    if you want to run the validation but not auto detection.
    The implementation of the match is also customizable, but only
    for the purpose of writing tests. Windows and other operating
    systems are not considered, but it should work.
    Not much work will be required to support those use cases though.
    Auto detection and validation both will fail when more than one
    node is hosted on the same machine. In such cases, both the parameters
    should be left in the default disabled state.
    Tests will follow in the next commit.
    arunthirupathi committed Jul 8, 2016
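    The matching idea described in this commit can be sketched as follows. Only
    InetAddress.getLocalHost().getCanonicalHostName() comes from the commit
    message; the class name, the map of node ids to host names, and the error
    handling are assumptions made for illustration and do not reflect the actual
    Voldemort implementation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of host-name based node id detection.
final class NodeIdDetector {
    private NodeIdDetector() {}

    // Returns the id of the single node whose host name matches localFqdn.
    // Fails when no node or more than one node matches, which mirrors the
    // restriction that several nodes on one machine cannot be auto-detected.
    static int detectNodeId(Map<Integer, String> hostByNodeId, String localFqdn) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : hostByNodeId.entrySet()) {
            if (entry.getValue().equalsIgnoreCase(localFqdn)) {
                matches.add(entry.getKey());
            }
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one node for host " + localFqdn + " but found " + matches);
        }
        return matches.get(0);
    }

    // Looks up the FQDN of the local machine, as named in the commit message.
    static String localFqdn() {
        try {
            return InetAddress.getLocalHost().getCanonicalHostName();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("Cannot resolve local host name", e);
        }
    }
}
```

    A caller would typically pass localFqdn() as the second argument; keeping it
    as a parameter makes the matching logic testable without touching DNS.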
Commits on Jul 15, 2016
Commits on Jul 12, 2016
  1. Fix the log message when Fetch is disabled

    When Fetch is disabled the log message is confusing to the user.
    arunthirupathi committed Jul 1, 2016
  2. GetAll support for Quotas

    1) GetAll support for Quotas is added.
    2) All admin clients use Node based store, so that they get
    consistent results.
    3) Tests for the existing and new functionality.
    arunthirupathi committed Jul 1, 2016
Commits on Jun 28, 2016
  1. Utility method for retrieving a storeDefinition

    1) Two new methods for retrieving a store from a random node
    or from a particular node.
    2) Enhanced the unit tests to test these new methods and
    chose the failure node at random to increase the effectiveness
    of the test case.
    3) Fixed the executorService shutdown in teardown.
    arunthirupathi committed Jun 28, 2016
  2. Fetch Single Store only for BnP Store creation

    BnP supports an option for querying only the current store.
    By default it queries the full stores.xml, as this option requires
    some server-side changes that will go in 1.10.18.
    But once the server-side changes are deployed, using this on
    a cluster with a large number of stores will speed up the
    arunthirupathi committed Jun 28, 2016
  3. Parallel Operation support for AddStore and SetQuota

    1) The network operation on AddStore and setQuota can be parallelized
    by providing the ExecutorService.
    2) By default they are done on the caller thread, if no executorService
    is provided.
    3) Refactored verifyOrAddStore into smaller methods and made it more
    4) Added tests for the parallel/executorService support.
    5) Added Utility method in QuotaUtils for converting to byte array.
    arunthirupathi committed with arunthirupathi Jun 28, 2016
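    The "optional ExecutorService" pattern from points 1 and 2 can be sketched
    like this. The class and method names are hypothetical and the per-node work
    is reduced to plain Callables; this is not the AdminClient API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical sketch, not the actual AdminClient code.
final class PerNodeRunner {
    private PerNodeRunner() {}

    // Runs one operation per node. With a null executor the operations run
    // sequentially on the caller thread; otherwise they are submitted to the
    // executor and the results are collected in submission order.
    static <T> List<T> runOnAllNodes(List<Callable<T>> perNodeOps, ExecutorService executor) {
        List<T> results = new ArrayList<>();
        try {
            if (executor == null) {
                for (Callable<T> op : perNodeOps) {
                    results.add(op.call());
                }
            } else {
                for (Future<T> future : executor.invokeAll(perNodeOps)) {
                    results.add(future.get());
                }
            }
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            throw new RuntimeException("Per-node operation failed", e);
        }
        return results;
    }
}
```

    Leaving the executor optional keeps existing callers unchanged while letting
    callers with many nodes pay for a thread pool only when they want one.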
  4. AdminClientPool, to easily pool AdminClient

    AdminClientPool is added for managing pools of AdminClient.
    AdminClient, unlike StoreClient, can't be used across cluster
    modifications. So previously an AdminClient needed to be created
    every time. This was costly as the connections needed to be re-established
    every time.
    AdminClientPool solves this problem by discarding an AdminClient if the cluster
    is modified.
    AdminClientPool still does not solve the problem of operations failing
    during cluster modification. But it will work correctly after the
    cluster is modified.
    arunthirupathi committed Jun 27, 2016
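    The discard-when-stale idea can be sketched generically. Everything below is
    hypothetical: the class name, the Supplier used to create clients, and the
    Predicate used to decide staleness are illustration only and say nothing
    about how AdminClientPool is actually written.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical generic pool that drops clients made stale by a cluster change.
final class StaleAwarePool<C> {
    private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<C> factory;
    private final Predicate<C> isStale;

    StaleAwarePool(Supplier<C> factory, Predicate<C> isStale) {
        this.factory = factory;
        this.isStale = isStale;
    }

    // Returns an idle client that is still valid, discarding stale ones on
    // the way, and creates a new client only when none is left.
    C checkout() {
        C client;
        while ((client = idle.poll()) != null) {
            if (!isStale.test(client)) {
                return client;
            }
            // Stale client: drop it and keep looking.
        }
        return factory.get();
    }

    // Returns a client to the pool unless it has become stale in the meantime.
    void checkin(C client) {
        if (!isStale.test(client)) {
            idle.add(client);
        }
    }
}
```

    In this sketch a client that becomes stale while checked out still fails its
    in-flight operation; only later checkouts get a fresh client.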