Commits on May 23, 2016
  1. Voldemort server cleans up HA state.

    Added code so that the Voldemort server automatically cleans
    the shared High Availability state from HDFS when appropriate.
    
    Currently, this new code runs when:
    1. An old Read-Only store-version is deleted, which usually
    happens asynchronously after a new store-version is activated.
    2. When a server transitions from OFFLINE mode to ONLINE mode.
    
    This new behavior is disabled by default, but can be enabled via:
    
    push.ha.state.auto.cleanup=true
    
    Also added a bit of extra logging in the impacted code paths.
    FelixGV committed May 17, 2016
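    As an illustration of the opt-in flag above, here is a minimal sketch of
    reading such a property with a default of false. Only the property name
    comes from the commit message; the HaCleanupConfig class and its method
    are hypothetical and are not Voldemort's actual configuration code.

```java
import java.util.Properties;

// Hypothetical helper, not part of Voldemort: reads the opt-in cleanup flag.
class HaCleanupConfig {
    // Property name taken from the commit message above.
    static final String KEY = "push.ha.state.auto.cleanup";

    // Returns false when the key is absent, mirroring "disabled by default".
    static boolean isAutoCleanupEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "false"));
    }
}
```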
Commits on May 20, 2016
  1. Option for fetching single or all stores during bootstrap

    Voldemort servers older than 1.8.1 only supported fetching the entire
    stores.xml; 1.8.1 added support for fetching individual stores. During
    bootstrap, this property controls whether to fetch the entire stores.xml
    or only the particular store.
    
    Exposed the bootstrap retry time in seconds as a config option as well.
    
    Added tests for the new code.
    arunthirupathi committed with arunthirupathi May 12, 2016
  2. Client shell throws error on closing

    The Admin and Client factories use the same suffix, which causes an
    attempt to delete an already de-registered JMX bean. Set a different
    identifier for admin.
    
    This is just a minor annoyance: when you close the shell, it throws
    an exception. It is restricted to the Voldemort client shell. The
    metrics are not used by anyone, so overwriting them is a
    non-issue.
    arunthirupathi committed with arunthirupathi May 12, 2016
  3. Print more info when cluster metadata check fails

    Print the metadata store version on each node, when the
    cluster metadata check fails.
    arunthirupathi committed May 9, 2016
  4. Route System Store to Same Zone

    System Store queries were not being sent to the same zone.
    Hacked the PipelineRoutedStore to force system stores with
    all routing to prefer same-zone routing first.
    arunthirupathi committed May 9, 2016
  5. Set Metadata version node by node

    Previously the Metadata version was set using a put operation for
    the entire cluster. As soon as one node succeeded, any failure
    was silently ignored. This caused the cluster to fall out of sync
    with clients.
    
    Now the Metadata version is set node by node and errors are reported
    so the operator knows of any issues.
    
    Fixed the Admin Command line tools to use the New APIs.
    arunthirupathi committed May 9, 2016
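    The node-by-node approach described above can be sketched as follows.
    This is a minimal illustration, not the AdminClient API: PerNodeUpdater
    and NodeOperation are hypothetical names, and the operation stands in for
    whatever call sets the metadata version on a single node.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: apply an operation to each node and report every failure.
class PerNodeUpdater {
    // Stand-in for one admin call against a single node (an assumption).
    interface NodeOperation {
        void apply(int nodeId);
    }

    // Runs the operation on every node; failures are collected per node
    // instead of being swallowed after the first success.
    static Map<Integer, RuntimeException> applyToAllNodes(List<Integer> nodeIds,
                                                          NodeOperation op) {
        Map<Integer, RuntimeException> failures = new LinkedHashMap<>();
        for (int nodeId : nodeIds) {
            try {
                op.apply(nodeId);
            } catch (RuntimeException e) {
                failures.put(nodeId, e);
            }
        }
        return failures;
    }
}
```

    A caller can then print the returned map so that the operator sees exactly
    which nodes are out of sync.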
  6. Last commit regressed the Hadoop Fetcher

    There were tests failing, and on investigation it turned out that
    negation was missed.
    
    My preference was to write == false instead of negation, but that is C++
    style.
    arunthirupathi committed May 20, 2016
Commits on May 17, 2016
  1. Add retry when calling fs.isFile.

    Change retryIsFile to isFile and add attempt information in error log.
    squarY committed May 16, 2016
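    A retry wrapper that records the attempt number in its error log might
    look like the sketch below. It is generic and uses only the JDK; the
    RetryingCall class is hypothetical, it does not reproduce the fetcher's
    actual code, and it omits any backoff between attempts.

```java
import java.util.function.Supplier;

// Hypothetical sketch: retry a call and log which attempt failed.
class RetryingCall {
    static <T> T withRetry(Supplier<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e;
                // Attempt information makes the error log easier to interpret.
                System.err.println("attempt " + attempt + " of " + maxAttempts
                                   + " failed: " + e);
            }
        }
        throw last; // every attempt failed; surface the last error
    }
}
```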
Commits on May 8, 2016
  1. Upgrade to BDB JE 5.0.104

    This is rebased code for the Pull request
    #247
    
    BDB JE 5.0.104 is available as a Maven artifact, and this removes the
    last checked-in jar from Voldemort. Once this is done, new Voldemort
    builds can be published to Maven automatically.
    arunthirupathi committed May 8, 2016
Commits on May 6, 2016
Commits on May 5, 2016
  1. Removed 'num.chunks' config parameter from BnP.

    Setting this parameter introduced a subtle failure mode when there were
    too few records in the store. It's not worth fixing the other bug since
    'num.chunks' is automatically calculated anyway, based on input data size.
    
    This change removes a bit of rope for users to hang themselves with (:
    FelixGV committed May 5, 2016
  2. Renamed AdminStoreSwapper#swapStoreData() to #fetchAndSwapStoreData()

    Also clarified one of the logs so that BnP announces that it's about to fetch
    (previously, it mentioned "swap" instead).
    FelixGV committed May 5, 2016
Commits on Apr 28, 2016
  1. Use empty stores for Replace Node CLI test

    This should have caught a bug, which is already fixed
    in the last commit.
    
    There are still a couple of bugs in the Offline Mode.
    
    AdminClient storeOps uses socketPort, which will be down in the
    offline mode. AdminPort supports full client operations and hence
    it should have used the AdminPort.
    
    AdminClient on the voldemort server uses cluster for bootstrapping
    which uses the client port again. This is problematic when the node
    is node 0. It should use the Admin Port for bootstrapping.
    
    Those bugs are in the backlog and will be fixed later.
    arunthirupathi committed Apr 28, 2016
Commits on Apr 25, 2016
  1. Metadata check to include quota check

    1) Reliably set Quota on all nodes.
    2) Meta check will check for quota on all nodes.
    3) Meta check will ignore 0 length RO files
    4) Meta check will skip nodes with 0 partitions when verifying
    store can be fetched.
    5) ReplaceNodeCLI validates nodes and ignores all errors from the
    failing node.
    6) Set Quota to report the correct node it is going to run against.
    7) When a store is created, set the default quota directly instead
    of using an admin client to write to the same node.
    8) Tests for the issue fixed above.
    
    These are all operability improvements for working with quota.
    arunthirupathi committed Apr 23, 2016
Commits on Apr 18, 2016
  1. ContribJar is missing in the Tar/Zip

    Some of the shell commands fail because contribJar is missing
    from the tarball.
    
    Include contribJar and protobufJar from the dist directory.
    arunthirupathi committed with arunthirupathi Apr 18, 2016
  2. DataCleanup Job dynamic retention days and deletion

    Problems Fixed:
    1) If a store with data retention is deleted, the job incorrectly holds
    the lock, thereby preventing other jobs from running.
    2) If a store's data retention is modified, it requires the cluster
    to be bounced as the retention day is read at the start and never
    altered.
    
    Problem not fixed:
     If the data cleanup job retention frequency is set for the store
    and if the frequency is modified, it would still require a cluster
    bounce.
    
    Fix:
       1) Instead of taking the store retention time, data cleanup job
    takes in the store name and metadata store and computes the retention
    time dynamically for each run.
       2) When a store can't be retrieved from the metadata store the
    cleanup job is skipped.
       3) All the code after the lock acquisition is moved inside the
    try/finally block. Previously beginBatchModification was outside
    of the try/finally block, so an exception thrown there caused the
    lock to never be freed.
       4) Added unit tests for both the problems fixed.
    arunthirupathi committed with arunthirupathi Apr 18, 2016
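    The locking fix in item 3 follows a common pattern: once the lock or
    permit is held, everything that can throw runs inside try, and the release
    sits in finally. The sketch below uses a JDK Semaphore as a stand-in; the
    GuardedJob class is hypothetical and is not the cleanup job's actual code.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch: release the permit even when the guarded work throws.
class GuardedJob {
    // Returns false if the permit is unavailable (another job is running).
    static boolean runWithPermit(Semaphore permits, Runnable work) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            // All work after acquisition lives inside try, so an early
            // exception cannot skip the release below.
            work.run();
        } finally {
            permits.release();
        }
        return true;
    }
}
```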
  3. Releasing Voldemort 1.10.15

    SidW committed Apr 18, 2016
Commits on Apr 16, 2016
  1. The AdminClient's verifyOrAddStore is vulnerable to connectivity issues.

    This commit makes the following changes:
    - verifyOrAddStore() is now more resilient to various kinds of exceptions.
    - VoldemortBuildAndPushJob now logs verifyOrAddStore's exceptions.
    - ExceptionUtils.recursiveClassEquals() can now look for many exception types.
    - Added ExceptionUtilsTest.
    FelixGV committed Apr 15, 2016
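    Searching an exception's cause chain for any of several types could be
    sketched as below. This is an assumption-laden illustration, not the real
    ExceptionUtils.recursiveClassEquals(): the CauseChain class is
    hypothetical, and it matches subclasses via isInstance rather than testing
    exact class equality.

```java
// Hypothetical sketch: does the cause chain contain any of the given types?
class CauseChain {
    @SafeVarargs
    static boolean containsAny(Throwable t, Class<? extends Throwable>... types) {
        // Bound the walk to guard against cyclic cause chains.
        for (int depth = 0; t != null && depth < 100; depth++, t = t.getCause()) {
            for (Class<? extends Throwable> type : types) {
                if (type.isInstance(t)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```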
Commits on Apr 9, 2016
  1. Multi module gradle build for voldemort

    Each contrib directory now becomes a separate project (although
    no new build.gradle files have been created). This makes the import
    into Intellij better as it can set up the contrib source roots
    correctly.
    
    It also opens the door to refactoring the project to be more
    conventional in its layout and artifact publishing.
    
    gradle 2.9 + fixing the eclipse project generation
    
    Had to rework the eclipse project generation configuration
    as it seems to disagree with the multi module project
    structure. The upgrade to gradle 2.9 fixes an issue with
    the eclipse generator where the JDK would be inserted
    twice into the .classpath file.
    
    Note that you can use the eclipse import gradle project
    feature, the only gotcha is that it defaults the output
    directory to /bin, which means the script directory
    gets nuke'd by eclipse on rebuild. As a work around
    this config can be changed manually and then the bin dir
    restored from git if anybody prefers to use the native
    gradle support in eclipse.
    
    Have also added defaults for the test resource directory, this
    prevents IDEA from creating src/test/resources which is a little
    confusing. Note this will only happen if you use 'Create empty
    content roots' option in IDEA.
    
    Javadoc generation is disabled by default.
    
    The javadoc can be reenabled using -Pjavadoc.enabled=true
    
    The fix for zip & tar is picked from arunthirupathi@2e0e9cf
    tempredirect committed Nov 23, 2015
Commits on Mar 30, 2016
  1. Set-Metadata null Version error

    Set Metadata might result in the following error, as listFiles
    can return null:
    
    java.lang.NullPointerException
    	at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:146)
    	at voldemort.store.configuration.ConfigurationStorageEngine.put(ConfigurationStorageEngine.java:50)
    	at voldemort.store.metadata.MetadataStore.put(MetadataStore.java:355)
    arunthirupathi committed Mar 25, 2016
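    java.io.File#listFiles() returns null when the path is not a directory or
    an I/O error occurs, so callers need a null guard. A minimal sketch of
    such a guard follows; the SafeListing helper is hypothetical and is not
    the actual fix made in ConfigurationStorageEngine.

```java
import java.io.File;

// Hypothetical sketch: never hand a null array back to the caller.
class SafeListing {
    static File[] listFilesOrEmpty(File dir) {
        // listFiles() returns null for non-directories and on I/O errors.
        File[] files = dir.listFiles();
        return files != null ? files : new File[0];
    }
}
```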
  2. Releasing Voldemort 1.10.14

    SidW committed Mar 30, 2016
  3. Merge pull request #396 from gaojieliu/FileFetcher_Stats

    Expose data points in FileFetcher and create autometrics sensors for them
    gaojieliu committed Mar 30, 2016
Commits on Mar 29, 2016
  1. This change is mostly to expose more aggregated metrics for HDFS data

    pushes:
    1. totalBytesFetched: total bytes transferred from HDFS so far;
    2. totalFetchRetries: total number of fetch retries so far;
    3. totalCheckSumFailures: total number of data file checksum
    failures so far;
    4. totalAuthenticationFailures: total number of authentication
    failures so far;
    5. totalFileNotFoundFailures: total number of file-not-found
    failures so far;
    6. totalFileReadFailures: total number of HDFS file read failures
    so far;
    7. totalQuotaExceedFailures: total number of quota-exceeded failures
    so far;
    8. totalUnauthorizedStoreFailures: total number of unauthorized store
    push failures so far;
    9. parallelFetches: number of fetches active right now;
    10. totalFetches: total number of HDFS fetches so far;
    11. totalIncompleteFetches: total number of incomplete fetches so far;
    12. totalDataFetchRate: aggregate data fetch rate right now.
    gaojieliu committed Mar 29, 2016
Commits on Mar 25, 2016
  1. Update version after updating the data

    Currently the version is updated before the data, which
    causes clients to read the wrong values.
    
    Update the version after updating the data, so that clients
    read the correct value.
    arunthirupathi committed Mar 19, 2016
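    The ordering described above (data first, version second) can be
    illustrated with a small in-memory sketch. The VersionedConfig class is
    hypothetical and greatly simplified compared with Voldemort's metadata
    store; it only shows why a reader that observes a new version then reads
    data at least that new.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: publish data before bumping the version.
class VersionedConfig {
    private volatile String data = "";
    private final AtomicLong version = new AtomicLong();

    void update(String newData) {
        data = newData;            // 1. write the data first
        version.incrementAndGet(); // 2. only then advertise the new version
    }

    long version() {
        return version.get();
    }

    String data() {
        return data;
    }
}
```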
Commits on Mar 24, 2016
  1. Hdfs FileSystem handles are leaked in Fetch

    Issue : After the HadoopFileSystem object is created, the validity of
    the fileSystem is verified by doing a sample operation. If the operation
    fails the Hadoop FileSystem object is leaked. This object should be
    cleaned up by the Garbage collection, but all the FileSystem objects
    are cached, so this is leaked. When voldemort server is used with
    secure webhdfs (swebhdfs) file system it leaks enough memory to kill
    the servers eventually.
    
    WebHdfs file system handles were already being leaked in Voldemort,
    but those handles are apparently very cheap. SWebHdfs handles,
    however, have the security certificate embedded in them, which makes
    them very big.
    
    Heap Dump analysis
    WebHdfsFileSystem - 3768 Objects - 80 MB
    SWebHdfsFileSystem - 1748 Objects - 3 GB
    
    Solution :
    1)  Disable the caching for the following reason
     Hadoop FileSystem class caches the FileSystem objects based on
     the scheme , authority and UserGroupInformation.
    
     The default config was to generate a new UserGroupInformation for
     each call, so the cache is never hit. In the case where the
     FileSystem is not closed correctly, it leaks handles.
    
     But if the UserGroupInformation is re-used, it will cause the
     FileSystem object to be shared between HdfsFetcher /
     HdfsFailedFetchLock. Each Voldemort HdfsFetcher/HAFailedFetchLock
     lock closes the fileSystem object at the end, though others might
     still be using it. This causes random failures.
    
     Since caching works in neither case, it is disabled. Caching
     should only be enabled if the UserGroupInformation is to be
     re-used and the close bug is fixed.
    
    2) Clean up the file handles on the error cases. Traced down all the
    handles and cleaned them up on the error path.
    arunthirupathi committed Mar 22, 2016
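    The error-path cleanup in item 2 can be sketched generically: open the
    handle, run the validation probe, and close the handle if the probe fails.
    The sketch uses AutoCloseable in place of Hadoop's FileSystem so that it
    needs only the JDK; ValidatedOpen and its parameters are hypothetical.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: do not leak a handle whose validation probe fails.
class ValidatedOpen {
    static <R extends AutoCloseable> R openAndValidate(Supplier<R> opener,
                                                       Consumer<R> validator) {
        R resource = opener.get();
        try {
            validator.accept(resource); // e.g. a sample operation on the handle
            return resource;            // caller owns and later closes it
        } catch (RuntimeException e) {
            try {
                resource.close();       // clean up on the error path
            } catch (Exception closeError) {
                e.addSuppressed(closeError);
            }
            throw e;
        }
    }
}
```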
Commits on Mar 23, 2016
  1. Check server state when bringing nodes back

    Fixed an issue where starting Voldemort in offline state actually
    brought it online, and it kept listening for client requests.
    SidW committed with arunthirupathi Mar 22, 2016
Commits on Mar 18, 2016
  1. fixing a NPE thrown by the StorageEngineService on startup with views

    Result of the getCapability() implementation returning a null instead
    of throwing the required NoSuchCapabilityException.
    
    Example of the exception:
    
        Exception in thread "main" java.lang.NullPointerException
            at voldemort.server.storage.StorageService.startInner(StorageService.java:424)
            at voldemort.common.service.AbstractService.start(AbstractService.java:62)
            at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:374)
            at voldemort.common.service.AbstractService.start(AbstractService.java:62)
            at voldemort.server.VoldemortServer.main(VoldemortServer.java:437)
    tempredirect committed with arunthirupathi Dec 1, 2015
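    The contract at issue, throwing rather than returning null for an unknown
    capability, can be sketched as below. CapabilityRegistry and the nested
    exception are local, hypothetical stand-ins; they are not Voldemort's
    Store interface or its NoSuchCapabilityException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fail fast instead of returning null.
class CapabilityRegistry {
    // Local stand-in for the exception type named in the commit message.
    static class NoSuchCapabilityException extends RuntimeException {
        NoSuchCapabilityException(String name) {
            super("No such capability: " + name);
        }
    }

    private final Map<String, Object> capabilities = new HashMap<>();

    void register(String name, Object value) {
        capabilities.put(name, value);
    }

    Object getCapability(String name) {
        Object value = capabilities.get(name);
        if (value == null) {
            // A null return would only surface later as an NPE in the caller.
            throw new NoSuchCapabilityException(name);
        }
        return value;
    }
}
```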
Commits on Mar 12, 2016
  1. This update includes two parts:

    1. The build.gradle change fixes the 'Wrong package statement' error when
    importing voldemort into IntelliJ;
    2. The VoldemortBuildAndPushJob.java change makes the log message clearer
    when deleting the temp files generated on the grid by the b&p job.
    gaojieliu committed Mar 11, 2016
Commits on Mar 11, 2016
  1. Disable store creation on the BnP side

    Refactored Admin.storeMgmtOps#verifyOrAddStore. It now takes a new
    boolean argument "creationStore" that decides whether or not to add new
    stores to the cluster if they are not found.
    
    Added a boolean value eableStoreCreation in VoldemortBuildAndPushJob.
    SidW committed Mar 11, 2016
Commits on Mar 10, 2016
  1. Resolved concurrency push conflict and add unit test, refactor quirky AdminClient used in HdfsFetcher, and add hyperlink in README.md
    
    1. Content conflicts were very likely when multiple jobs tried to push to
    the same store simultaneously. We added a hashmap in AdminRequestHandler
    to check whether a store is currently being fetched. If so, we
    block further fetch requests and throw an exception.
    
    2. Added a unit test case under StoreSwapperTest.
    
    3. The HdfsFetcher used to create an AdminClient instance and ask
    itself for "diskQuotaSizeInKB". This was a quirky use of AdminClient. We
    removed it and now pass the quota size directly from AdminRequestHandler.
    
    4. Added a hyperlink in README.md to "A quick git guide for people who
    want to make contributions to Voldemort".
    SidW committed Mar 2, 2016
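    The per-store guard described in item 1 could look roughly like the sketch
    below. It is an illustration only: FetchGuard is a hypothetical class, it
    uses a concurrent set where the commit mentions a hashmap, and the
    exception type is chosen arbitrarily.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: allow at most one in-flight fetch per store.
class FetchGuard {
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();

    // Marks the store as fetching, or throws if a fetch is already running.
    void begin(String storeName) {
        if (!inProgress.add(storeName)) {
            throw new IllegalStateException(
                "A fetch is already in progress for store " + storeName);
        }
    }

    // Must be called from a finally block once the fetch ends.
    void end(String storeName) {
        inProgress.remove(storeName);
    }
}
```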
Commits on Mar 2, 2016
  1. Better error message when pushing to a store with storage quota 0

    Modified AdminStoreSwapper#invokeFetch() to throw the original sub-thread
    exception and changed InvalidBostrapURLException to make it
    clearer.
    SidW committed with arunthirupathi Feb 28, 2016
Commits on Mar 1, 2016
  1. Report metrics for Scheduler and Async Service

    Scheduler Service reports tasks currently running
    and number of tasks in the queue
    
    Async Service reports cumulative Wait time and
    number of tasks waiting in the queue.
    arunthirupathi committed Mar 1, 2016
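    For a JDK thread pool, the two scheduler numbers mentioned above map onto
    existing ThreadPoolExecutor accessors. The sketch below shows only that
    mapping; PoolMetrics is a hypothetical helper, and the cumulative wait
    time reported by the Async Service is not covered here.

```java
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch: read basic gauges from a JDK thread pool.
class PoolMetrics {
    // Approximate number of threads actively executing tasks.
    static int runningTasks(ThreadPoolExecutor pool) {
        return pool.getActiveCount();
    }

    // Number of tasks waiting in the work queue.
    static int queuedTasks(ThreadPoolExecutor pool) {
        return pool.getQueue().size();
    }
}
```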
Commits on Feb 9, 2016