Commits on Jul 20, 2016
  1. InsufficientOperationalNodes concurrent exception

    From time to time, InsufficientOperationalNodesException can throw a
    ConcurrentModificationException, as the failures list is not a
    thread-safe list.
    
    Changed the list to a CopyOnWriteArrayList. The code path is only used
    when nodes fail, so there should not be any noticeable impact on
    performance.
    arunthirupathi committed Jul 14, 2016
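The fix described above can be sketched in isolation: iterating a plain ArrayList while it is mutated throws ConcurrentModificationException, while CopyOnWriteArrayList iterates over a snapshot. The variable names below are illustrative, not the actual Voldemort fields.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// CopyOnWriteArrayList copies the backing array on every write, so an
// iterator obtained before a write keeps seeing the original snapshot.
List<String> failures = new CopyOnWriteArrayList<>();
failures.add("node0");
for (String f : failures) {
    // Safe: with ArrayList this add would make the loop's iterator
    // throw ConcurrentModificationException on its next step.
    failures.add(f + "-retry");
}
```

The trade-off matches the commit's reasoning: writes are expensive (each one copies the whole array), but this list is only touched when nodes fail, so the cost is negligible.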
  2. Fix the HintedHandOff flaky tests

    1) I made some changes to the metadata store for auto-detecting the
    node id and noticed these tests were failing. On investigation, the
    failures are caused by static variables, some of which are modified,
    so depending on the ordering the tests may or may not fail.
    
    2) Removed most of the static usage and turned most of them into
    parameters.
    
    I still don't completely understand the test as it is quite complicated,
    but sprinkled in some sleeps to make sure that slops are registered.
    
    Tests passed successfully on 50 continuous runs.
    arunthirupathi committed Jul 14, 2016
  3. Clean Up State after tests

    Problems :
    1) Voldemort servers are stopped using server.stop, which does not
    delete the home directory.
    2) Store files are created in the /tmp directory and are left
    behind after the tests.
    
    Solution :
    1) Use ServerTestUtils.stopServer, which deletes the home directory.
    2) Use ServerTestUtils.createTempFile, which sets deleteFileOnExit
    so the file is deleted during the JVM exit stage.
    
    All changes are only in test files. No changes to the product code, so
    there is no risk to the product code.
    arunthirupathi committed Jul 14, 2016
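The cleanup pattern referred to above can be sketched as follows; this is an assumption about what ServerTestUtils.createTempFile does internally, not its actual code.

```java
import java.io.File;
import java.io.IOException;

// Create the store file under the JVM's temp directory and register it
// for deletion at JVM exit, so /tmp is not littered after the tests.
File tmp = File.createTempFile("voldemort-test", ".data");
tmp.deleteOnExit(); // deleted during the JVM exit stage
```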
Commits on Jul 19, 2016
  1. VoldemortServerTest fails from time to time

    BouncyCastleProvider changes the state of the JVM. When it was used
    in 2 different tests, the execution order was not predictable, and in
    one order the tests failed.
    
    Made one new test with the right ordering to fix the flakiness.
    arunthirupathi committed Jul 12, 2016
  2. Few more debug fixes

    Added some debug logging to trace the socket destination for
    disconnected sockets.
    
    Logged the error on the clientTrafficVerifier instead of using
    printStackTrace, whose output gets lost.
    arunthirupathi committed with arunthirupathi Jul 11, 2016
  3. Update Node Id and Cluster for Node Detection

    1) When the node id is updated, both the metadata and the VoldemortServer
    are updated. Previously, a node id update only updated the metadata.
    There could still be edge cases with updating the node id, as the node
    id is used in far too many places and cached in a few of them, but
    client reads and writes are expected to work. (Will update the
    replace-node test to verify the same.)
    2) When node id detection is enabled, updating cluster.xml
    will update the node id. As before, Voldemort will accept a new
    cluster update even if it changes the client and admin ports.
    After updating it will error out, though the cluster will be
    in an inconsistent state at this point. The behavior is the same as
    before, except that after completing it errors out, thereby notifying
    the admin. Pre-checks and validation are quite difficult because
    of cyclic dependencies.
    
    Added integration tests for the above 2.
    arunthirupathi committed Jul 11, 2016
  4. Validation and Generate Script Utility added for testing purpose

    GenerateScript is not secure and depends on the user
    to give sanitized input and a reasonable script. For the same reason
    it just generates the output file, which must be manually reviewed
    before executing the script. GenerateScript is use-at-your-own-risk.
    
    ValidateNodeIdCLI helps to validate the auto-detected node id before
    removing the node id from all the configs.
    
    ValidateNodeIdCLI, when combined with GenerateScript, can be used
    to safely validate the result for an entire cluster before removing
    all the node ids.
    arunthirupathi committed Jul 9, 2016
  5. Detect and Validate node Ids

    Problem:
       A Voldemort server takes its node id as a configuration parameter.
    It relies on the node id to identify its role in the cluster.
    But most production deployments have only one Voldemort server
    per host in a cluster. Under these conditions the deduction
    of node ids can be automated. In most typical production
    deployments, only the node id changes across the Voldemort server
    configurations. This causes configuration duplication that is
    difficult to manage.
    
    Fix :
    Node id auto-detection can be enabled (disabled by default) by
    the property enable.node.id.detection.
    
    When enabled, host names in cluster.xml will be matched against the
    FQDN ( InetAddress.getLocalHost().getCanonicalHostName() ).
    
    Validation can also be enabled by the property
    validate.node.id. Note that when auto-detection is enabled,
    validation is always run. So enable the validation config only
    if you want to run the validation but not auto-detection.
    
    The matching implementation is also customizable, mainly for the
    purpose of writing tests. Windows and other operating systems were
    not considered, though matching is expected to work there; not much
    work would be required to support those use cases.
    
    Note:
       Auto-detection and validation will both fail when more than one
    node is hosted on the same machine. In such cases, both parameters
    should be left in the default disabled state.
    
    Tests will follow in the next commit.
    arunthirupathi committed Jul 8, 2016
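The matching step described in the fix can be sketched as below. detectNodeId and hostToNodeId are hypothetical names; the real lookup compares cluster.xml host names against InetAddress.getLocalHost().getCanonicalHostName().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Given the host -> node id mapping parsed from cluster.xml, return the
// node id whose host name equals this machine's FQDN, or fail loudly so
// a misconfigured host cannot silently join with the wrong identity.
static int detectNodeId(Map<String, Integer> hostToNodeId, String localFqdn) {
    Integer id = hostToNodeId.get(localFqdn);
    if (id == null) {
        throw new IllegalStateException("No node in cluster.xml matches " + localFqdn);
    }
    return id;
}
```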
Commits on Jul 12, 2016
  1. Fix the log message when Fetch is disabled

    When Fetch is disabled the log message is confusing to the user.
    arunthirupathi committed Jul 1, 2016
  2. GetAll support for Quotas

    1) GetAll support for quotas is added.
    2) All admin clients use the node-based store, so that they get
    consistent results.
    3) Tests for the existing and new functionality.
    arunthirupathi committed Jul 1, 2016
Commits on Jun 28, 2016
  1. Utility method for retrieving a storeDefinition

    1) Two new methods for retrieving a store definition from a random
    node or from a particular node.
    
    2) Enhanced the unit tests to test the new methods and made the
    failing node random to increase the effectiveness of the test case.
    
    3) Fixed the executorService shutdown in teardown.
    arunthirupathi committed Jun 28, 2016
  2. Fetch Single Store only for BnP Store creation

    BnP supports an option for querying only the current store.
    By default it queries the full stores.xml, as this option requires
    some server-side changes that will go in 1.10.18.
    
    But once the server-side changes are deployed, using this on
    a cluster with a large number of stores will speed up the
    pre-processing.
    arunthirupathi committed Jun 28, 2016
  3. Parallel Operation support for AddStore and SetQuota

    1) The network operations in AddStore and setQuota can be parallelized
    by providing an ExecutorService.
    
    2) By default they are done on the caller thread if no executorService
    is provided.
    
    3) Refactored verifyOrAddStore into smaller methods and made it more
    manageable.
    
    4) Added tests for the parallel/executorService support.
    
    5) Added a utility method in QuotaUtils for converting to a byte array.
    arunthirupathi committed with arunthirupathi Jun 28, 2016
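Points 1 and 2 describe a common pattern: run on the caller thread unless an ExecutorService is supplied. A minimal sketch, assuming a hypothetical helper name runAll (not the actual AdminClient API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Execute tasks either serially on the caller thread (executor == null)
// or in parallel on the supplied ExecutorService, waiting for completion
// so failures are surfaced either way.
static void runAll(List<Runnable> tasks, ExecutorService executor) throws Exception {
    if (executor == null) {
        for (Runnable t : tasks) t.run(); // default: caller thread
        return;
    }
    List<Future<?>> futures = new ArrayList<>();
    for (Runnable t : tasks) futures.add(executor.submit(t));
    for (Future<?> f : futures) f.get(); // block and propagate exceptions
}
```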
  4. AdminClientPool, to easily pool AdminClient

    AdminClientPool is added for managing a pool of AdminClients.
    
    AdminClient, unlike StoreClient, can't be used across cluster
    modifications. So previously an AdminClient needed to be created
    every time. This was costly, as the connections had to be
    re-established every time.
    
    AdminClientPool solves this problem by discarding an AdminClient if
    the cluster is modified.
    
    AdminClientPool still does not solve the problem of operations failing
    during a cluster modification, but it will work correctly after the
    cluster is modified.
    arunthirupathi committed Jun 27, 2016
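The discard-on-modification idea can be sketched as a tiny pool keyed by a cluster version; ClientPool, ClusterClient, and the version field are hypothetical stand-ins for the AdminClientPool internals.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// A client built against an older cluster version is dropped at check-in
// instead of being handed out again, so callers never reuse a client
// across a cluster modification.
class ClusterClient {
    final long version;
    ClusterClient(long version) { this.version = version; }
}

class ClientPool {
    private final ConcurrentLinkedQueue<ClusterClient> pool = new ConcurrentLinkedQueue<>();
    private volatile long clusterVersion;

    ClientPool(long initialVersion) { this.clusterVersion = initialVersion; }

    void clusterModified(long newVersion) {
        clusterVersion = newVersion;
        pool.clear(); // stale clients must not be reused
    }

    void checkin(ClusterClient c) {
        if (c.version == clusterVersion) pool.offer(c); // else silently drop
    }

    ClusterClient checkout() {
        ClusterClient c = pool.poll();
        return (c != null) ? c : new ClusterClient(clusterVersion);
    }
}
```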
Commits on Jun 27, 2016
  1. Add a check in the AdminClient's updateRemoteStoreDefList to verify
     that breaking changes to stores do not get pushed

    This way, when a store is live, we cannot change (for example) the
    keySerializer or valueSerializer.
    mattwisein committed with arunthirupathi Jun 23, 2016
  2. Improve the verify or Add store for RO stores

    Problem :
         Adding an RO store fetches all stores from each node. For
    Voldemort clusters with lots of stores, this takes a long time,
    especially when creating stores across data centers.
    
    Fix :
        Rely on the ClientConfig fetch-all-stores-xml property to see if
    the server supports retrieving a single store's XML. Prior Voldemort
    servers do not throw a unique exception when a store is missing, so
    this fix requires the server-side change as well to work correctly.
    arunthirupathi committed with arunthirupathi Jun 24, 2016
Commits on Jun 24, 2016
  1. Add method to make re-use of the AdminClient easier

    When an AdminClient is created anew for each operation, it needs to
    re-establish every connection to the Voldemort server. This makes
    AdminClient operations take longer.
    
    But if an AdminClient is reused across a cluster modification, it will
    cause inconsistent operations. This new method helps the caller
    identify whether a cached AdminClient is still valid and can be re-used.
    arunthirupathi committed Jun 24, 2016
Commits on Jun 23, 2016
  1. Increase the timeout of vadmin tool

    Increased the timeout of the vadmin tool to 5 seconds from the
    500 ms default.
    arunthirupathi committed Jun 23, 2016
Commits on Jun 14, 2016
  1. Logging Changes for BnP

    1) Currently there are 2 logging statements for each directory
    processed. Once build-primary-replicas-only was introduced, this
    creates a large number of logs.
    2) With this change the logging is converted to time- or count-based:
    a log line is generated after either 30 seconds or 100 processed
    directories.
    3) The total time for directory processing and the empty directories
    are output at the end of the processing.
    
    Sample Output from the run:
    
     Processed 0 out of 540 directories.
     Processed 100 out of 540 directories.
     Processed 200 out of 540 directories.
     Processed 300 out of 540 directories.
     Processed 400 out of 540 directories.
     Processed 500 out of 540 directories.
     Total Processed directories: 540. Elapsed Time (Seconds):43
    
     Empty directories: [5, 11, 15, 17, 39, 50, 55, 58, 82, 88, 113, 117,
    119, 120, 125, 126, 127, 183, 184, 199, 203, 212, 213, 223, 232, 250,
    266, 269, 270, 288, 293, 302, 317, 318, 323, 324, 332, 337, 339, 362,
    363, 375, 381, 382, 392, 394, 403, 407, 412, 415, 420, 425, 430, 440,
    441, 448, 458, 462, 469, 472, 481, 483, 496, 500, 503, 508, 510, 512,
    517, 522, 526, 529]
    arunthirupathi committed with arunthirupathi Jun 10, 2016
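The time-or-count trigger described in point 2 can be sketched as below; ProgressLogger is a hypothetical name, not the actual BnP class.

```java
// Emit a progress line when either 100 more items have been processed or
// 30 seconds have elapsed since the last line, whichever comes first.
class ProgressLogger {
    private final int total, countInterval;
    private final long timeIntervalMs;
    private long lastLogTime = System.currentTimeMillis();
    private int processed = 0;

    ProgressLogger(int total, int countInterval, long timeIntervalMs) {
        this.total = total;
        this.countInterval = countInterval;
        this.timeIntervalMs = timeIntervalMs;
    }

    /** Returns true when a progress line was emitted for this item. */
    boolean increment() {
        processed++;
        long now = System.currentTimeMillis();
        if (processed % countInterval == 0 || now - lastLogTime >= timeIntervalMs) {
            lastLogTime = now;
            System.out.println("Processed " + processed + " out of " + total + " directories.");
            return true;
        }
        return false;
    }
}
```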
  2. When store is missing, error message is not clear

    When a store is missing on the server, the Voldemort error message on
    the client used to say:
    
    Failed to read metadata key:"XXX" delete config/.temp config/.version directories and restart.
    
    Now it says: store XXX does not exist on node YY
    
    This is easier to reason about from the client perspective.
    arunthirupathi committed with arunthirupathi Jun 3, 2016
Commits on Jun 9, 2016
  1. Null Pointer Exception in diff message

    When comparing stores for equality, if one store has a null field and
    the other does not, appending to the diff message throws an NPE.
    
    Used String.valueOf, which handles null.
    arunthirupathi committed Jun 9, 2016
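The behavior String.valueOf relies on can be shown in two lines:

```java
// String.valueOf(Object) maps a null reference to the literal "null",
// whereas calling toString() on it throws NullPointerException.
Object maybeNull = null;
String safe = String.valueOf(maybeNull); // "null", no exception thrown
```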
  2. Standardize on the QuotaTypes

    1) setQuota now tries all nodes, remembers any exception, and throws
    the last exception.
    
    2) Get and unset quota now take a QuotaType enum instead of a
    string.
    arunthirupathi committed Jun 9, 2016
Commits on Jun 2, 2016
  1. Modify debug info on the Client bootstrap

    When the client bootstraps, it dumps the clientConfig parameters to the
    log. Removed a deprecated parameter and added 3 other parameters that
    are useful for debugging.
    arunthirupathi committed Jun 1, 2016
  2. Reduce the Admin Timeout from Build And Push

    Reduced the Voldemort admin timeout in the Build And Push job to
    60 seconds. Anything taking longer than 60 seconds should fail.
    
    Occasionally Voldemort build and push jobs hang when a node crashes in
    a bad state. This should help recover those cases.
    arunthirupathi committed with arunthirupathi Jun 1, 2016
  3. Add idle connection timeout for Client connections

    If a Voldemort client lives behind a firewall, the connections
    can be dropped by the firewall silently. The firewall also drops any
    future packets sent on the connection, which causes lots of timeouts
    for low-throughput Voldemort clients.
    
    The fix adds an idle connection timeout to the client config; by
    default it is disabled.
    arunthirupathi committed Jun 1, 2016
Commits on Jun 1, 2016
  1. Don't use Properties(properties) constructor

    The Properties(properties) constructor has different behavior than the
    one intended.
    
    http://stackoverflow.com/questions/2004833/how-to-merge-two-java-util-properties-objects
    
    >>> copy/pasted text <<<
    
    However, if you treat it like a Map, you need to be very careful with
    this:
    
    new Properties(defaultProperties);
    This often catches people out, because it looks like a copy constructor,
    but it isn't. If you use that constructor, and then call something like
    keySet() (inherited from its Hashtable superclass), you'll get an empty
    set, because the Map methods of Properties do not take account of the
    default Properties object that you passed into the constructor. The
    defaults are only recognised if you use the methods defined in
    Properties itself, such as getProperty and propertyNames, among others.
    arunthirupathi committed Jun 1, 2016
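The pitfall quoted above is easy to demonstrate: the defaults are invisible to the Map view but visible to getProperty.

```java
import java.util.Properties;

// Properties(defaults) is NOT a copy constructor: the argument becomes a
// fallback consulted only by Properties' own methods (getProperty,
// propertyNames), never by the inherited Hashtable/Map methods.
Properties defaults = new Properties();
defaults.setProperty("host", "localhost");

Properties props = new Properties(defaults);
boolean emptyKeys = props.keySet().isEmpty(); // true: Map view ignores defaults
String host = props.getProperty("host");      // "localhost": defaults honored here
```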
Commits on May 23, 2016
  1. Lower the node unavailable error to debug

    When an async connect fails, the selector reports the error and it is
    cached in memory. The next connect call will get the cached error if it
    happens within twice the timeout period.
    
    These errors were logged at the info level, which spams the logs when a
    server is unavailable for extended periods of time. They are now
    dropped down to the debug level.
    arunthirupathi committed May 23, 2016
  2. Voldemort server cleans up HA state.

    Added code so that the Voldemort server automatically cleans
    the shared High Availability state from HDFS when appropriate.
    
    Currently, this new code runs when:
    1. An old Read-Only store-version is deleted, which usually
    happens asynchronously after a new store-version is activated.
    2. When a server transitions from OFFLINE mode to ONLINE mode.
    
    This new behavior is disabled by default, but can be enabled via:
    
    push.ha.state.auto.cleanup=true
    
    Also added a bit of extra logging in the impacted code paths.
    FelixGV committed May 17, 2016
Commits on May 20, 2016
  1. Option for fetching single or all stores during bootstrap

    Voldemort servers older than 1.8.1 supported only fetching the whole
    stores.xml; 1.8.1 added fetching individual stores. During bootstrap,
    this property controls whether to fetch the full stores.xml or only the
    particular store.
    
    Exposed the bootstrap retry time in seconds as a config option as well.
    
    Added tests for the new code.
    arunthirupathi committed with arunthirupathi May 12, 2016
  2. Client shell throws error on closing

    The admin and client factories use the same suffix, which causes an
    attempt to delete an already de-registered JMX bean. Set a different
    identifier for admin.
    
    This is just a minor annoyance: when you close the shell, it throws
    an exception. It is restricted to the Voldemort client shell. The
    metrics are not used by anyone, and overwriting the metrics is a
    non-issue.
    arunthirupathi committed with arunthirupathi May 12, 2016
  3. Print more info When cluster metadata check fails

    Print the metadata store version on each node when the cluster
    metadata check fails.
    arunthirupathi committed May 9, 2016