Commits on Aug 11, 2016
  1. Revert "Provide chunk size suggestion for BnP jobs with chunk overflo…

    squarY committed Aug 11, 2016
    …w exceptions and fix num chunks algorithm to round up"
    
    This reverts commit fdd2ca9.
Commits on Aug 10, 2016
  1. Releasing Voldemort 1.10.21

    squarY committed Aug 10, 2016
    Fix release notes.
  2. Merge pull request #436 from squarY/timeoutfix

    squarY committed Aug 10, 2016
    Fix: Extend the timeout of admin request
  3. Fix: extend admin request timeout from 1 min to 5 min.

    squarY committed Aug 10, 2016
    Add more logs when handling a failed fetch request.
    
    Fix issues based on RB.
Commits on Aug 5, 2016
  1. Provide chunk size suggestion for BnP jobs with chunk overflow except…

    FelixGV committed with mattwisein Jun 10, 2016
    …ions and fix num chunks algorithm to round up
Commits on Jul 28, 2016
  1. RO Store Create floods the Log with error messages

    arunthirupathi committed Jul 28, 2016
    Creating an RO store first queries for an existing store, which
    fails with StoreNotFoundException. This exception is logged with its
    call stack, which floods the logs on every store creation.
    
    This may trick the alerting system into treating it as an error.
    Such exceptions are now logged at info level, without a call stack.
Commits on Jul 25, 2016
Commits on Jul 21, 2016
  1. Log verifyOrAddStore time in the logs.

    arunthirupathi committed Jul 21, 2016
    Currently the time spent in verifyOrAddStore is not tracked.
    This change introduces a log line to track this time.
    
    The following log message will be added for these calls.
    
    [18:36:23,113 voldemort.client.protocol.admin.AdminClient] INFO
    verifyOrAddStore() BootStrapUrls: [tcp://localhost:48150] Store:
    abc-xyz-read-only Verification Time: 10 ms, Creation Time: 39 ms [main]
Commits on Jul 20, 2016
  1. Rest Server Port is not serialized correctly

    arunthirupathi committed Jul 20, 2016
    Problems
    1) While running ./gradlew clean build, the process exits and the
    build fails in the middle.
    2) When Voldemort server rest validation fails it exits the process.
    3) Cluster does not serialize the rest port correctly, which caused the
    rest port validation to fail.
    4) Before auto node detection the tests were using in memory
    cluster instead of the cluster in the metadata. Now both tests and
    product code use the same code path, which caused the tests to fail.
    
    Fix:
    1) Cluster serializes the rest port, if it is greater than zero.
    2) When rest server validation fails, it throws an exception, instead
    of exiting the process. (Searched code for System.exit and Coordinator
    Server does the same, but saving that for a different day).
    3) Node state string contains the rest port, if it is present.
    4) Let the RestServiceR2StoreTest fail with an actual error message,
    instead of a boilerplate error message, which made debugging harder.
  2. InsufficientOperationalNodes concurrent exception

    arunthirupathi committed Jul 14, 2016
    From time to time, InsufficientOperationalNodes can throw a
    ConcurrentModificationException, as failures is not a thread-safe list.
    
    Changed the list to a CopyOnWriteArrayList. The code path is only used
    when nodes fail, so there should not be any noticeable impact on
    performance.
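The effect of that swap can be sketched in isolation. The class and method names below are hypothetical and not taken from the Voldemort code base; the sketch only shows that iterating a plain ArrayList while it is modified fails, whereas a CopyOnWriteArrayList iterates over a snapshot.

```java
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class, not part of Voldemort.
final class FailureListDemo {

    // Adds two initial failures to the given list and returns it.
    static List<Exception> seed(List<Exception> failures) {
        failures.add(new RuntimeException("node 0 down"));
        failures.add(new RuntimeException("node 1 down"));
        return failures;
    }

    // Iterates over the list and appends one more failure mid-iteration,
    // mimicking a failure being recorded while the list is being read.
    // Returns true if the iteration finished without a
    // ConcurrentModificationException.
    static boolean iterateWhileAdding(List<Exception> failures) {
        try {
            for (Exception ignored : failures) {
                if (failures.size() < 3) {
                    failures.add(new RuntimeException("late failure"));
                }
            }
            return true;
        } catch (ConcurrentModificationException cme) {
            return false;
        }
    }
}
```

Every write to a CopyOnWriteArrayList copies the backing array, which is why the commit notes that the list is only touched when nodes fail.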
  3. Fix the HintedHandOff flaky tests

    arunthirupathi committed Jul 14, 2016
    1) I made some changes to metadata store in auto detect node id,
    and noticed these tests were failing. On investigation the test
    failures are caused by using static variables some of which are
    modified and based on the ordering they may or may not fail.
    
    2) Removed most of the static usage and made most of them as
    parameters.
    
    I still don't completely understand the test as it is quite complicated,
    but sprinkled in some sleep to make sure that slops are registered.
    
    Tests passed successfully on 50 consecutive runs.
  4. Clean Up State after tests

    arunthirupathi committed Jul 14, 2016
    Problems :
    1) Voldemort Servers are stopped using the server.stop that does
    not delete the home directory.
    2) Store files are created in the /tmp directory which are left
    behind after the tests.
    
    Solution :
    1) Use ServerTestUtils.stopServer which deletes the home directory
    2) Use ServerTestUtils.createTempFile which sets deleteFileOnExit
    which deletes the file during the JVM exit stage.
    
    All changes are only in test files. No changes to the Product code so
    there is no risk for the Product code.
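A minimal sketch of the temp-file part of this cleanup, assuming nothing about how ServerTestUtils.createTempFile is actually written: the helper below is a hypothetical stand-in that relies only on the standard File.createTempFile and File.deleteOnExit calls.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a test utility; not the Voldemort implementation.
final class TestTempFiles {

    // Creates a file in the default temp directory and registers it for
    // deletion when the JVM shuts down normally.
    static File createTempFile(String prefix, String suffix) {
        try {
            File file = File.createTempFile(prefix, suffix);
            file.deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

deleteOnExit only runs on a normal JVM shutdown, so a test process that is killed can still leave files behind.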
Commits on Jul 19, 2016
  1. VoldemortServerTest fails from time to time

    arunthirupathi committed Jul 12, 2016
    BouncyCastleProvider changes the state of the JVM. When it was
    in 2 different tests, the order was not predictable and if it
    was executed in different order it failed.
    
    Made one new test with right ordering to fix the test.
  2. Few more debug fixes

    arunthirupathi committed Jul 11, 2016
    Added some debug logging, to trace the socket destination for
    disconnected sockets.
    
    Logged the error in the ClientTrafficVerifier, instead of using
    printStackTrace, whose output gets lost.
  3. Update Node Id and Cluster for Node Detection

    arunthirupathi committed Jul 11, 2016
    1) When the node id is updated, both the metadata and VoldemortServer
    are updated. Previously, a node id update only updated the metadata.
    There could still be edge cases with updating the node id, as it is used
    in far too many places and cached in a few of them. But client reads and
    writes are expected to work. (Will update the replace node test to
    verify the same).
    2) When node id detection is enabled, updating cluster.xml
    will update the node id. As before, Voldemort will accept a new
    cluster update even if it changes the client and admin ports.
    After updating it will error out, though the cluster will be
    in an inconsistent state at this point. The behavior is the same as
    before, except that it errors out after completing, thereby notifying
    the admin. Pre-check and validation are quite difficult because
    of cyclic dependencies.
    
    Added integration tests for the above 2.
  4. Validation and Generate Script Utility added for testing purpose

    arunthirupathi committed Jul 9, 2016
    GenerateScript is not secure: it depends on the user
    to give sanitized input and a reasonable script. For the same reason
    it just generates the output file, which must be manually reviewed
    before executing the script. GenerateScript is more of a use-at-
    your-own-risk tool.
    
    ValidateNodeIdCLI helps to validate the auto-detected node id before
    removing the node id from all the configs.
    
    ValidateNodeIdCLI, when combined with GenerateScript, can be used
    to safely validate the result for the entire cluster before removing
    all the node ids.
  5. Detect and Validate node Ids

    arunthirupathi committed Jul 8, 2016
    Problem:
    Voldemort server takes the node id as a configuration parameter.
    It relies on the node id to identify its role in the cluster.
    But most production deployments have only one Voldemort server
    per host in a cluster. Under these conditions the deduction
    of node ids can be automated. In most typical production
    deployments, only the node id changes across the Voldemort server
    configurations. This causes configuration duplication and makes the
    configuration difficult to manage.
    
    Fix :
    Node auto detection can be enabled (disabled by default) with
    the property enable.node.id.detection
    
    When enabled, host names in cluster.xml will be matched against the
    FQDN ( InetAddress.getLocalHost().getCanonicalHostName() ).
    
    Validation can also be enabled with the property
    validate.node.id . Note that when auto detection is enabled,
    validation is always run. So enable the validation config only
    if you want to run the validation, but not auto detection.
    
    The matching implementation is also customizable, though only for
    the purpose of writing tests. Windows and other operating systems
    were not considered, but it should work on them.
    
    Not much work will be required to support those use cases though.
    
    Note:
       Auto detection and validation both will fail when more than one
    node is hosted on the same machine. In such cases, both the parameters
    should be left in the default disabled state.
    
    Tests will follow in the next commit.
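The matching rule described in this commit message can be sketched as follows. The class, the map shape (node id to host name) and the case-insensitive comparison are assumptions made for illustration; only InetAddress.getLocalHost().getCanonicalHostName() comes from the message itself.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real detection logic in Voldemort may differ.
final class NodeIdDetector {

    // The FQDN lookup named in the commit message.
    static String localFqdn() throws UnknownHostException {
        return InetAddress.getLocalHost().getCanonicalHostName();
    }

    // nodeHosts maps node id -> host name, as it might be read from cluster.xml.
    // Returns the single node id whose host matches localFqdn, and fails when
    // zero or several nodes match (for example, several nodes on one machine).
    static int detectNodeId(Map<Integer, String> nodeHosts, String localFqdn) {
        List<Integer> matches = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : nodeHosts.entrySet()) {
            if (entry.getValue().equalsIgnoreCase(localFqdn)) {
                matches.add(entry.getKey());
            }
        }
        if (matches.size() != 1) {
            throw new IllegalStateException(
                "Expected exactly one node for host " + localFqdn + " but found " + matches);
        }
        return matches.get(0);
    }
}
```

Passing the host name in as a parameter keeps the matching testable without depending on the machine's DNS setup, which is in the spirit of the "customizable only for the purpose of writing tests" remark.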
Commits on Jul 15, 2016
Commits on Jul 12, 2016
  1. Fix the log message when Fetch is disabled

    arunthirupathi committed Jul 1, 2016
    When Fetch is disabled the log message is confusing to the user.
  2. GetAll support for Quotas

    arunthirupathi committed Jul 1, 2016
    1) GetAll support for Quotas is added.
    2) All admin clients use Node based store, so that they get
    consistent results.
    3) Tests for the existing and new functionality.
Commits on Jun 28, 2016
  1. Utility method for retrieving a storeDefinition

    arunthirupathi committed Jun 28, 2016
    1) Two new methods for retrieving a store from random node
    or from a particular node.
    
    2) Enhanced the unit tests to test this new method and
    made the failure node at random to increase the effectiveness
    of the test case.
    
    3) Fixed the executorService shutdown in teardown.
  2. Fetch Single Store only for BnP Store creation

    arunthirupathi committed Jun 28, 2016
    BnP supports an option for querying only the current store.
    By default it queries the full stores.xml, as this option requires
    some server side changes that will go in 1.10.18
    
    But once the server side changes are deployed, using this on
    a cluster with a large number of stores will speed up the
    pre-processing.
  3. Parallel Operation support for AddStore and SetQuota

    arunthirupathi committed Jun 28, 2016
    1) The network operation on AddStore and setQuota can be parallelized
    by providing the ExecutorService.
    
    2) By default they are done on the caller thread, if no executorService
    is provided.
    
    3) refactored verifyOrAddStore into smaller methods and made it more
    manageable.
    
    4) Added tests for the parallel/executorService support.
    
    5) Added Utility method in QuotaUtils for converting to byte array.
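Points 1 and 2 describe a common pattern: run per-node work on a supplied ExecutorService, or inline on the caller thread when none is given. A minimal sketch, with hypothetical names that are not the AdminClient API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical helper; not taken from the Voldemort code base.
final class PerNodeRunner {

    // Runs one task per node. With a null executor the tasks run sequentially
    // on the caller thread; otherwise they are submitted to the executor and
    // the results are collected in submission order.
    static <T> List<T> runOnAllNodes(List<Supplier<T>> perNodeTasks, ExecutorService executor) {
        List<T> results = new ArrayList<>();
        if (executor == null) {
            for (Supplier<T> task : perNodeTasks) {
                results.add(task.get());
            }
            return results;
        }
        List<Future<T>> futures = new ArrayList<>();
        for (Supplier<T> task : perNodeTasks) {
            Callable<T> callable = task::get;
            futures.add(executor.submit(callable));
        }
        for (Future<T> future : futures) {
            try {
                results.add(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
        }
        return results;
    }
}
```

Keeping the executor optional means existing callers see no behavior change, while callers that talk to many nodes can opt in to parallelism.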
  4. AdminClientPool, to easily pool AdminClient

    arunthirupathi committed Jun 27, 2016
    AdminClientPool is added for managing the pools of AdminClient.
    
    AdminClient, unlike StoreClient, can't be used across cluster
    modifications. So previously an AdminClient needed to be created
    every time. This was costly, as the connections had to be
    re-established every time.
    
    AdminClientPool solves this problem by discarding AdminClient if cluster
    is modified.
    
    AdminClientPool still does not solve the problem of failing operation
    during cluster modification. But it will work correctly after the
    cluster is modified.
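The discard-when-stale idea can be sketched with a small generic pool. Everything here is hypothetical (class name, constructor, the validity predicate); it is not the AdminClientPool API, only an illustration of checking validity on checkout and checkin.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical sketch of a pool that drops clients invalidated by a
// cluster change; not the Voldemort implementation.
final class ValidatingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;
    private final Predicate<C> isStillValid;

    ValidatingPool(Supplier<C> factory, Predicate<C> isStillValid) {
        this.factory = factory;
        this.isStillValid = isStillValid;
    }

    // Returns a pooled client that is still valid, discarding stale ones,
    // or creates a new client when none is available.
    synchronized C checkout() {
        C candidate;
        while ((candidate = idle.poll()) != null) {
            if (isStillValid.test(candidate)) {
                return candidate;
            }
            // Stale client: drop it and keep looking.
        }
        return factory.get();
    }

    // Returns a client to the pool unless it has become stale in the meantime.
    synchronized void checkin(C client) {
        if (isStillValid.test(client)) {
            idle.push(client);
        }
    }
}
```

As the commit message notes, a pool like this does not help an operation that is in flight while the cluster changes; it only ensures later checkouts get a usable client.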
Commits on Jun 27, 2016
  1. Add a check in the AdminClient's updateRemoteStoreDefList to verify t…

    mattwisein committed with arunthirupathi Jun 23, 2016
    …hat breaking changes to stores do not get pushed
    
    This way when a store is live, we cannot change (for example) the keySerializer or valueSerializer
  2. Improve the verify or Add store for RO stores

    arunthirupathi committed Jun 24, 2016
    Problem :
    Adding an RO store fetches all stores from each node. For Voldemort
    clusters with lots of stores, retrieving all the stores takes a long
    time, especially when creating stores across data centers.
    
    Fix :
    Rely on the ClientConfig fetch all stores xml property to see if the
    server supports retrieving a single store XML. Prior Voldemort servers
    do not throw a unique exception when a store is missing. So this fix
    will require the server side change as well to work correctly.
Commits on Jun 24, 2016
  1. Add method to make re-use of the AdminClient easier

    arunthirupathi committed Jun 24, 2016
    When AdminClient is created newly for each operation, AdminClient needs to
    re-establish every connection to the Voldemort Server. This makes AdminClient
    operations take a longer period of time.
    
    But if AdminClient is reused across cluster modification, then AdminClient
    will cause inconsistent operations. This new method will help the caller
    to identify if the cached AdminClient is still valid and can be re-used.
Commits on Jun 23, 2016
  1. Increase the timeout of vadmin tool

    arunthirupathi committed Jun 23, 2016
    Increased the timeout of Vadmin tool to 5 seconds from 500 ms default.
Commits on Jun 14, 2016
  1. Logging Changes for BnP

    arunthirupathi committed Jun 10, 2016
    1) Currently there are 2 logging statements for each directory
    processed. Since build primary replicas only was introduced, this
    creates a large number of logs.
    2) With this change, logging becomes time or count based.
    A log line will be generated either after 30 seconds or after
    processing 100 directories.
    3) The total time for directory processing and the empty directories
    are output at the end of the processing.
    
    Sample Output from the run:
    
     Processed 0 out of 540 directories.
     Processed 100 out of 540 directories.
     Processed 200 out of 540 directories.
     Processed 300 out of 540 directories.
     Processed 400 out of 540 directories.
     Processed 500 out of 540 directories.
     Total Processed directories: 540. Elapsed Time (Seconds):43
    
     Empty directories: [5, 11, 15, 17, 39, 50, 55, 58, 82, 88, 113, 117,
    119, 120, 125, 126, 127, 183, 184, 199, 203, 212, 213, 223, 232, 250,
    266, 269, 270, 288, 293, 302, 317, 318, 323, 324, 332, 337, 339, 362,
    363, 375, 381, 382, 392, 394, 403, 407, 412, 415, 420, 425, 430, 440,
    441, 448, 458, 462, 469, 472, 481, 483, 496, 500, 503, 508, 510, 512,
    517, 522, 526, 529]
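The time-or-count throttling described above can be sketched as below. The class is hypothetical and collects lines in a list instead of calling a logger; the clock is passed in explicitly so the behavior is deterministic. The message text mirrors the sample output, but the BnP code itself may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of count- or time-based progress reporting.
final class ProgressReporter {
    private final int total;
    private final int countInterval;
    private final long timeIntervalMs;
    private final List<String> lines = new ArrayList<>();
    private int processed;
    private int lastReportedCount;
    private long lastReportedAtMs;

    ProgressReporter(int total, int countInterval, long timeIntervalMs, long startMs) {
        this.total = total;
        this.countInterval = countInterval;
        this.timeIntervalMs = timeIntervalMs;
        report(startMs); // initial "Processed 0 out of N" line
    }

    // Call once per processed directory; emits a line only when enough
    // directories or enough time have passed since the last report.
    void directoryProcessed(long nowMs) {
        processed++;
        boolean countDue = processed - lastReportedCount >= countInterval;
        boolean timeDue = nowMs - lastReportedAtMs >= timeIntervalMs;
        if (countDue || timeDue) {
            report(nowMs);
        }
    }

    private void report(long nowMs) {
        lines.add("Processed " + processed + " out of " + total + " directories.");
        lastReportedCount = processed;
        lastReportedAtMs = nowMs;
    }

    List<String> lines() {
        return lines;
    }
}
```

With 540 directories, an interval of 100 and no time elapsed, this produces the six "Processed ..." lines shown in the sample run.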
  2. When store is missing, error message is not clear

    arunthirupathi committed Jun 3, 2016
    When a store is missing on the Server, Voldemort error message on
    the client used to say
    
    Failed to read metadata key:"XXX" delete config/.temp config/.version directories and restart.
    
    Now it says store XXX does not exist on node YY
    
    This will be easier to reason about from the client's perspective.
Commits on Jun 9, 2016
  1. Null Pointer Exception in diff message

    arunthirupathi committed Jun 9, 2016
    When comparing stores for equality, if one store has a null value and
    the other does not, appending to the diff message throws an NPE.
    
    Used String.valueOf which handles null.
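For illustration, a minimal sketch of the null-safe formatting; the class and method names are hypothetical and the message format is invented for the example.

```java
// Hypothetical sketch; not the Voldemort diff code.
final class DiffMessageDemo {

    // Builds a one-line difference message for a single field.
    // String.valueOf(Object) returns the text "null" for a null reference,
    // whereas calling left.toString() directly would throw a
    // NullPointerException when left is null.
    static String describeDiff(String field, Object left, Object right) {
        StringBuilder message = new StringBuilder();
        message.append(field).append(": ");
        message.append(String.valueOf(left));
        message.append(" != ");
        message.append(String.valueOf(right));
        return message.toString();
    }
}
```

One caveat: passing the literal null, as in String.valueOf(null), resolves to the char[] overload and throws, so the argument should be a typed variable as it is here.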
  2. Standardize on the QuotaTypes

    arunthirupathi committed Jun 9, 2016
    1) setQuota to try all nodes, remember the exception and throw the last
    Exception.
    
    2) get and unset quota now takes in QuotaType enum instead of the
    string.
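Point 1 (try every node, remember failures, rethrow the last one) can be sketched as below. The helper and its signature are hypothetical and stand in for the real setQuota code path.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch; not the AdminClient implementation.
final class AllNodesAttempt {

    // Applies the operation to every node id, even if some nodes fail,
    // then rethrows the most recent failure, if there was one.
    static void applyToAllNodes(List<Integer> nodeIds, Consumer<Integer> operation) {
        RuntimeException lastFailure = null;
        for (Integer nodeId : nodeIds) {
            try {
                operation.accept(nodeId);
            } catch (RuntimeException e) {
                lastFailure = e; // remember it, but keep trying the remaining nodes
            }
        }
        if (lastFailure != null) {
            throw lastFailure;
        }
    }
}
```

The trade-off is that earlier failures are dropped; a variant could attach them with Throwable.addSuppressed if the caller needs the full picture.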
Commits on Jun 2, 2016