Commits on Jan 8, 2016
  1. @FelixGV

    Bug fixes and improvements for Build and Push.

    FelixGV committed
    - HadoopStoreWriter did not handle multiple chunks correctly which
      caused reduce tasks to step on each other's toes. Fixed.
    - Changed default 'reducer.per.bucket' config to true.
    - AbstractStoreBuilderConfigurable.getPartition() was broken when
      'build.primary.replicas.only' was enabled. Fixed it and added a
      lot more comments.
    - Added defensive code in HadoopStoreWriter to catch future
      regressions in the partitioning (shuffling) code.
    - Improved logging in Props and HadoopStoreBuilder.
  2. @FelixGV

    Fixed a bug where the Read-Only data directory config is not honored properly.

    FelixGV committed
    
    This fixes a regression introduced in a706147.
Commits on Jan 7, 2016
  1. @arunthirupathi

    AdminClient does not obey the Client Timeout

    arunthirupathi committed
    1) For BuildAndPush, if the Voldemort cluster is far away from the
    Hadoop cluster, the connection timeout causes the job to fail.
    Increased the connection timeout in Azkaban.* files for this issue.
    
    2) The ClientConfig timeout is ignored by the AdminClient bootstrap
    methods, which construct an arbitrary ClientConfig. Fixed that.
    
    3) AdminClient passes in empty AdminClientConfig and ClientConfig
    at multiple places. Removed that.
Commits on Jan 6, 2016
  1. @FelixGV

    Fix for a regression introduced in a706147.

    FelixGV committed
    Under certain partition assignment configurations, servers would skip the
    initialization of some partitions, which would then result in NPEs getting
    thrown during get requests. This issue is now fixed.
    
    In addition to that, the following changes are also included in this commit:
    - More comprehensive tests in ReadOnlyStorageEngineTest to catch the edge case.
    - Skipped the canGetGoodCompressedKeys() test in ReadOnlyStorageEngineTest
      since key compression is not supported in Read-Only / Build and Push.
    - Better error reporting in ChunkedFileSet.getChunkForKey(byte[] key).
Commits on Jan 5, 2016
  1. @FelixGV
  2. @arunthirupathi

    BnP job corrupts store when compression mismatches

    arunthirupathi committed
    BnP job does not compare compression when making the schema check.
    One store was created with compression enabled on the first run
    and the next run removed it. The store silently corrupted the data.
    
    Now compression is compared and if compression does not match,
    it errors out instead of corrupting the data.
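
    The check described above can be sketched as follows. This is a minimal illustration, not the BnP code: SerializerSettings and CompressionCheck are hypothetical stand-ins, and compression is modeled as a plain string where null means "no compression". The point is only that compression is compared alongside the schema and a mismatch fails fast.

```java
import java.util.Objects;

/** Hypothetical stand-in for a store's serializer settings; not a Voldemort class. */
final class SerializerSettings {
    final String schema;
    final String compression; // null means "no compression" in this sketch

    SerializerSettings(String schema, String compression) {
        this.schema = schema;
        this.compression = compression;
    }
}

final class CompressionCheck {
    /** Throws if the pushed settings disagree with the existing store on schema or compression. */
    static void verifyCompatible(SerializerSettings existing, SerializerSettings pushed) {
        if (!Objects.equals(existing.schema, pushed.schema)) {
            throw new IllegalStateException("Schema mismatch: existing=" + existing.schema
                    + ", pushed=" + pushed.schema);
        }
        // Comparing compression as well is what prevents the silent corruption.
        if (!Objects.equals(existing.compression, pushed.compression)) {
            throw new IllegalStateException("Compression mismatch: existing=" + existing.compression
                    + ", pushed=" + pushed.compression);
        }
    }
}
```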
  3. @arunthirupathi

    Disabling Fetch Fails BnP HA

    arunthirupathi committed
    When a node is in offline mode, it responds with a fetch disabled
    error. Currently the error is not treated as a soft error, and this
    fails the HA BnP.
    
    With this code change, FetchDisabled is considered as a soft error
    and the fetch will continue as normal.
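
    A rough sketch of the soft-error idea, assuming a hypothetical FetchDisabledException and a hypothetical per-node fetch callback (neither is taken from the Voldemort code base): the soft error is recorded and the loop moves on to the next node, while any other exception still propagates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception standing in for the "fetch disabled" error. */
final class FetchDisabledException extends RuntimeException {
    FetchDisabledException(String message) {
        super(message);
    }
}

final class HaFetch {
    /** Hypothetical callback that triggers the fetch on one node. */
    interface NodeFetch {
        void fetch(int nodeId);
    }

    /** Fetches on every node; returns the ids of nodes that failed softly. */
    static List<Integer> fetchAll(List<Integer> nodeIds, NodeFetch fetcher) {
        List<Integer> softFailures = new ArrayList<>();
        for (int nodeId : nodeIds) {
            try {
                fetcher.fetch(nodeId);
            } catch (FetchDisabledException e) {
                // Soft error: remember the node and keep going with the others.
                softFailures.add(nodeId);
            }
            // Any other exception is not caught here and fails the whole operation.
        }
        return softFailures;
    }
}
```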
  4. @arunthirupathi

    Add File List and Length based check for RO

    arunthirupathi committed
    1) Currently stores are validated only for the definition and by
    doing a random get against any of the partitions.
    
    Now for RO stores, the files are validated by their replication
    factor.
    
    If RF=1, a line is logged stating that this store's consistency
    can't be validated:
    
    Store xyz has replication factor of 1, skipping consistency check across nodes
    
    If RF > 1, each file should have exactly as many copies as the RF.
    If the file lengths do not match, a warning is printed:
    
    Store abc File ReadOnlyFile [name=97_0_0.data, size=0] is expected in 2 nodes, but present only in [Node localhost:6666 [id 0]]
    Store abc File ReadOnlyFile [name=97_1_0.data, size=408351] is expected in 2 nodes, but present only in [Node localhost:6669 [id 1]]
    
    If the file is missing the following warning will appear but only once.
    
    The error reporting is very noisy; its purpose is to alert to the presence
    of errors, and it was put together quickly without considering ease of use.
    
    Verified backward and forward compatibility of the change.
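
    The validation described in this commit could look roughly like the sketch below. It assumes each node's file listing is available as a map from file name to length; the class name, method signature, and exact message wording are illustrative assumptions, not the actual implementation. A file counts as replicated only if the same name with the same length appears on exactly RF nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class ReadOnlyFileCheck {
    /**
     * filesByNode maps a node id to that node's listing (file name -> length in bytes).
     * Returns one message per inconsistency found.
     */
    static List<String> check(String store, int replicationFactor,
                              Map<Integer, Map<String, Long>> filesByNode) {
        List<String> warnings = new ArrayList<>();
        if (replicationFactor <= 1) {
            warnings.add("Store " + store
                    + " has replication factor of 1, skipping consistency check across nodes");
            return warnings;
        }
        // Key on name AND size, so a length mismatch shows up as a separate, under-replicated file.
        Map<String, TreeSet<Integer>> holders = new TreeMap<>();
        for (Map.Entry<Integer, Map<String, Long>> node : filesByNode.entrySet()) {
            for (Map.Entry<String, Long> file : node.getValue().entrySet()) {
                String key = "name=" + file.getKey() + ", size=" + file.getValue();
                holders.computeIfAbsent(key, k -> new TreeSet<>()).add(node.getKey());
            }
        }
        for (Map.Entry<String, TreeSet<Integer>> entry : holders.entrySet()) {
            if (entry.getValue().size() != replicationFactor) {
                warnings.add("Store " + store + " File [" + entry.getKey() + "] is expected in "
                        + replicationFactor + " nodes, but present only in nodes " + entry.getValue());
            }
        }
        return warnings;
    }
}
```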
  5. @arunthirupathi

    Adding file Length to GetROFileListResponse

    arunthirupathi committed
    This makes it possible to build validators for the RO files being fetched.
Commits on Dec 23, 2015
  1. @arunthirupathi

    Fix the testNodeDownReplacement intermittent failure

    arunthirupathi committed
    SchedulerService does not wait for the scheduled jobs to shut down
    before it proceeds with killing the BDB, which causes a cursor
    exception to be thrown. This fixes the test.
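
    The ordering problem can be illustrated with the standard java.util.concurrent API. This is a sketch under the assumption that the scheduled jobs run on a ScheduledExecutorService; OrderedShutdown is a hypothetical helper, not the SchedulerService code. The caller closes storage only after this method returns true.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class OrderedShutdown {
    /**
     * Stops accepting new jobs and waits for running ones to finish.
     * Returns true once the scheduler has fully terminated.
     */
    static boolean stopScheduler(ScheduledExecutorService scheduler, long timeoutMs)
            throws InterruptedException {
        scheduler.shutdown(); // no new jobs; already running jobs continue
        if (scheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
            return true;
        }
        scheduler.shutdownNow(); // interrupt jobs that did not finish in time
        return scheduler.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```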
  2. @arunthirupathi

    Fix RO JMX register/Unregister

    arunthirupathi committed
    1) Stores JMX are registered by JmxService class.
    2) Previously ReadOnlyStorageConfiguration registered the same metrics
    by prepending the NodeId. As part of the commit
    
    2aec46a#diff-005b79e324515c9e1045a61d0aee6d07
    
    I fixed it and removed the NodeId, but I did not realize that this
    caused a name collision and overwriting:
    
    [18:50:45,445 voldemort.server.jmx.JmxService] WARN Overwriting mbean
    voldemort.store.readonly:type=test2 [main]
    [18:50:45,447 voldemort.server.jmx.JmxService] WARN Overwriting mbean
    voldemort.store.readonly:type=test1 [main]
    
    Now ReadOnlyStorageConfiguration does not register any JMX Metric at
    all. Used the JConsole to verify that it is the same object registered
    under two different names. Now the duplicate one is gone and the
    warnings on the shutdown of Read Only server will be gone as well.
Commits on Dec 16, 2015
  1. @FelixGV

    Read-Only fetches now abort immediately when killed.

    FelixGV committed
    Also improved vadmin.sh's output format and error handling.
Commits on Dec 15, 2015
  1. @arunthirupathi

    Enhanced Admin Meta Check

    arunthirupathi committed
    1) Previously, when no parameter was passed in, all the checks were
    silently skipped. Now the default is changed to check all.
    
    2) A random key is now generated to probe the store. Previously it
    always passed in byte 0, which failed with InvalidMetadataException.
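
    Generating a random probe key takes only a few lines with java.util.Random. The sketch below is illustrative; the ProbeKey class and the choice of key length are assumptions, not details from the commit.

```java
import java.util.Random;

final class ProbeKey {
    /** Returns a key of the given length filled with random bytes. */
    static byte[] randomKey(Random random, int length) {
        byte[] key = new byte[length];
        random.nextBytes(key);
        return key;
    }
}
```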
  2. @arunthirupathi @FelixGV

    BouncyCastle jar is required even when not enabled

    arunthirupathi committed with FelixGV
    BouncyCastle jar is referenced from VoldemortServer and when
    VoldemortServer class is loaded, all its references are resolved
    which causes it to fail with ClassLoader error.
    
    Added one more level of indirection to avoid the direct reference
    and hence BouncyCastle is not required to be available in the class
    path when not enabled.
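
    One way to get this kind of indirection is to load the provider by name only when the feature is enabled. The sketch below uses reflection for that; it is an illustration of the idea and not necessarily the mechanism used in this commit. OptionalJceProvider is a hypothetical class name.

```java
import java.security.Provider;
import java.security.Security;

final class OptionalJceProvider {
    /**
     * Registers the named JCE provider if enabled. When disabled, the provider
     * class is never loaded, so its jar does not need to be on the class path.
     */
    static boolean registerIfEnabled(boolean enabled, String providerClassName) {
        if (!enabled) {
            return false;
        }
        try {
            Class<?> clazz = Class.forName(providerClassName);
            Provider provider = (Provider) clazz.getDeclaredConstructor().newInstance();
            Security.addProvider(provider);
            return true;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "JCE provider is enabled but could not be loaded: " + providerClassName, e);
        }
    }
}
```

    For BouncyCastle, the class name passed in would be org.bouncycastle.jce.provider.BouncyCastleProvider.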
  3. @FelixGV

    Introduced 'build.primary.replicas.only' mode in BnP.

    FelixGV committed
    Summary: This new mode provides the capability of pushing to
    multiple clusters with different numbers of nodes and different
    partition assignments.
    
    Compatibility: Although this new mode only works if both the BnP
    job and the Voldemort servers are upgraded, the change can be rolled
    out gradually without breaking anything. There is a negotiation
    phase at the beginning of the BnP job which determines if all
    servers of all clusters are capable of using the new mode and
    configured to do so. If not all servers are upgraded and enabled,
    then the BnP job falls back to its old behavior. Likewise, if a
    server gets a fetch request from a non-upgraded BnP job, it will
    work just like before. By default, servers answer the negotiation
    by saying they support the new mode. The old behavior can be forced
    with the following server-side configuration:
    
    readonly.build.primary.replicas.only=false
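
    The negotiation rule can be summarized in a short sketch. It assumes each server's answer is derived from its configuration, with the key defaulting to true when unset, as described above. The PrimaryReplicasNegotiation class is hypothetical, and a non-upgraded server (which cannot answer at all) would have to be modeled by the caller as a configuration with the key set to false.

```java
import java.util.List;
import java.util.Properties;

final class PrimaryReplicasNegotiation {
    static final String KEY = "readonly.build.primary.replicas.only";

    /** A server supports the new mode unless its config explicitly disables it. */
    static boolean serverSupports(Properties serverConfig) {
        return Boolean.parseBoolean(serverConfig.getProperty(KEY, "true"));
    }

    /** The job uses the new mode only if every server of every cluster agrees. */
    static boolean useNewMode(List<Properties> allServers) {
        for (Properties serverConfig : allServers) {
            if (!serverSupports(serverConfig)) {
                return false; // fall back to the old behavior
            }
        }
        return true;
    }
}
```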
    
    Running in this new mode has several implications:
    
    1. When running in the new mode, store files are stored in the
       BnP output directory under nested partition directories, rather
       than in nested node directories.
    2. The MR job uses half as many reducers and half as much shuffle
       bandwidth compared to before.
    3. The meta checksum is now done per partition, rather than per node.
    4. Instead of having one .metadata file per partition, there is now
       only a single full-store.metadata file at the root of the output
       directory.
    5. The server-side HdfsFetcher code inspects the metadata file and
       determines if it should operate in 'build.primary.replicas.only'
       mode or not. If yes, then the server determines which partitions
       it needs to fetch on its own, rather than relying on what the BnP
       job placed in a node-specific directory.
    6. The replica type number contained in Read-Only V2 file names is
       now useless, but we are keeping it in there just to avoid
       unnecessary changes.
    7. When initializing a Read-Only V2 store directory, the server now
       looks for files named with the incorrect replica type, and if it
       finds any, it renames them to the replica type expected by this
       server.
    
    Other changes:
    
    1. Added socket port to Node's toString functions. Also made the
       output of the Node's toString(), briefToString() and getStateString()
       functions more consistent.
    2. Introduced new Protobuf message for the GetConfig admin request.
       This new message is intended to be a generic way to retrieve any
       server config.
    3. Refactored VoldemortConfig to provide access to any config by its
       string key. Also cleaned up a lot of hard-coded strings, which are
       constants now.
    4. Various minor refactorings in BnP code.
Commits on Dec 12, 2015
  1. @FelixGV

    Better error message for unknown admin operations.

    FelixGV committed
    Previously, the AdminServiceRequestHandler returned an error message which said "Metadata Key passed '' is not handled yet" whenever attempting to parse VProtoAdmin messages serialized by an AdminClient with more/newer capabilities than the server's.
Commits on Dec 8, 2015
  1. @arunthirupathi

    Swap Only on IOException,UnreachableStoreException

    arunthirupathi committed
    Currently Swap is attempted for any failure.
    Now the swap will be attempted only if the failure is
    an IO or UnreachableStoreException. Other exceptions
    will cause the push to fail.
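
    The decision rule can be expressed as a small predicate. In this sketch, UnreachableStoreException is a local stand-in declared only to keep the example self-contained, and walking the cause chain is an assumption about how wrapped failures might be handled; the commit does not say whether causes are inspected.

```java
import java.io.IOException;

/** Local stand-in for the real exception type, declared here for self-containment. */
final class UnreachableStoreException extends RuntimeException {
    UnreachableStoreException(String message) {
        super(message);
    }
}

final class SwapPolicy {
    /** Returns true if the failure (or one of its causes) justifies attempting the swap. */
    static boolean shouldAttemptSwap(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            if (t instanceof IOException || t instanceof UnreachableStoreException) {
                return true;
            }
        }
        return false; // any other failure makes the push fail
    }
}
```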
  2. @arunthirupathi

    Create JMX mbeans for tracking multiple server states

    arunthirupathi committed
    The following states are tracked:
        Server 0->normal, 1->offline, 2->rebalancing
        SlopStreaming 0->enabled, 1->disabled
        PartitionStreaming 0->enabled, 1->disabled
        ReadOnlyFetching 0->enabled, 1->disabled
        QuotaEnforcing 0->enabled, 1->disabled
  3. @squarY

    Merge pull request #352 from squarY/bouncycastle

    squarY committed
    Let read only server use bouncy castle as JCE provider.
Commits on Dec 5, 2015
  1. Let read only server use bouncy castle as JCE provider.

    Yan Yan(Data Infrastructure) committed
    Initialize BouncyCastleProvider in VoldemortServer constructor if it's enabled.
    
    Fix minor format issue.
    
    Fix unnecessary format change.
    
    Fix unnecessary format issue.
    
    Remove useless parameter.
Commits on Dec 4, 2015
  1. @FelixGV

    BnP logging improvements:

    FelixGV committed
    - BnP job now emits config properties in one entry per line.
    - Bumped up some useful logs to INFO level in the HttpHook.
Commits on Nov 21, 2015
  1. @squarY

    Add 3 properties in Voldemort configuration. Let a Voldemort node turn

    squarY committed with Yan Yan(Data Infrastructure)
    SSL on or off when fetching files from HDFS.
    
    Fix the issue where parsing a URL that does not contain a protocol (e.g. a local file path) causes a String index out of range exception.
    
    Roll back the format change to the original code.
    
    Make the URL modification feature more generic and use java.net.URL instead of parsing the URL manually.
    
    Move the URL modification feature to the Utils class, and let VoldemortSwapJob invoke this method to replace the URL.
    
    Let each Voldemort node modify the URL separately before fetching files.
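
    A minimal sketch of protocol replacement with java.net.URL follows. UrlProtocolSwap and its parameters are hypothetical and do not reflect the actual Utils method. Note that in a plain JVM java.net.URL rejects protocols it has no handler for, so this sketch returns such inputs unchanged, which is also how it treats local file paths.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class UrlProtocolSwap {
    /**
     * Returns the URL with its protocol replaced, and its port too when newPort > 0.
     * Inputs that java.net.URL cannot parse, such as local file paths, are returned as-is.
     */
    static String withProtocol(String url, String newProtocol, int newPort) {
        try {
            URL parsed = new URL(url);
            int port = newPort > 0 ? newPort : parsed.getPort();
            // getFile() returns the path plus the query string, if any.
            return new URL(newProtocol, parsed.getHost(), port, parsed.getFile()).toString();
        } catch (MalformedURLException e) {
            return url;
        }
    }
}
```

    The host name used to try this out can be any placeholder, for example nn.example.com.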
Commits on Nov 18, 2015
  1. @arunthirupathi

    Update metadata version cluster, stores

    arunthirupathi committed
    At times the metadata versions of cluster.xml and stores.xml drift.
    The next metadata update, instead of consolidating these versions,
    updates them in some places and ignores them in others.
    
    This causes the client to not re-bootstrap correctly when the
    cluster.xml or stores.xml is changed.
    
    Now when a cluster.xml is changed, the version is synchronized
    across the cluster to let the clients auto rebootstrap.
    
    This fix merges the VectorClock on all the nodes to be updated
    so that the Stores version will be updated correctly.
    
    The old methods which do not take nodes as parameters are removed,
    and the public method now exposes the nodes as parameters.
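
    Merging vector clocks generally means taking the per-node maximum of the counters. The sketch below models a clock as a plain map from node id to counter; it is a generic illustration of that merge and does not use Voldemort's own VectorClock class.

```java
import java.util.Map;
import java.util.TreeMap;

final class ClockMerge {
    /** Returns a clock holding, for every node id, the larger of the two counters. */
    static Map<Integer, Long> merge(Map<Integer, Long> a, Map<Integer, Long> b) {
        Map<Integer, Long> merged = new TreeMap<>(a);
        for (Map.Entry<Integer, Long> entry : b.entrySet()) {
            merged.merge(entry.getKey(), entry.getValue(), Long::max);
        }
        return merged;
    }
}
```

    Because the merged clock is at least as large as every input on every entry, a version written with it is not older than any version previously seen on the individual nodes.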
Commits on Nov 17, 2015
  1. @gnb

    Merge pull request #345 from gnb/VOLDENG-2171bis

    gnb committed
    Extend shell "preflist" command to show partitions, v2
  2. @gnb

    Extend shell "preflist" command to show partitions, v2

    gnb committed
    After valuable feedback from athirupthi
Commits on Nov 16, 2015
  1. @gnb

    Merge pull request #343 from gnb/VOLDENG-2170

    gnb committed
    Fix shell "preflist" command key parsing
  2. @gnb
Commits on Nov 13, 2015
  1. @gnb

    Merge pull request #342 from gnb/rorep2

    gnb committed
    ReadOnlyReplicationHelperCLI: do not rely on down node
  2. @gnb

    ReadOnlyReplicationHelperCLI: do not rely on down node

    gnb committed
    When building a list of partitions to be copied between nodes
    to restore a down node, don't expect to be able to fetch any
    useful metadata from the down node.
    
    v2: better method naming per feedback from athirupathi
Commits on Nov 11, 2015
  1. @FelixGV

    Bug fix for BnP HA:

    FelixGV committed
    - StoreVersionManager.getDisabledMarkerFile() did not look for the right file name.
    - Also added unit tests and better log messages for this.
    
    Debuggability improvements in AdminServiceRequestHandler:
    - More exceptions now logged with their full stacktrace on the server-side.
    - The request info is now printed in a more readable format.
    
    BnP job will now fail properly when clusters are inconsistent, rather than just exiting.
Commits on Nov 6, 2015
  1. @stotch

    adminClient now uses the same ClientConfig as the SocketStoreClient so that it too can get the overrides passed in by a user defined config file

    stotch committed
Commits on Nov 4, 2015
  1. @FelixGV

    Implement the truncate function in the RocksDB storage engine.

    James Lent committed with FelixGV
    This functionality is required by the delete store command. It is implemented using a non-default Column Family named after the store.
    This approach could be extended to allow multiple stores to be supported by a single RocksDB
    database. It is, however, not backwards compatible with the existing code. A data migration
    would be required to support existing stores.
Commits on Nov 3, 2015
  1. @FelixGV

    Moved BnP's verifyOrAddStore() to the AdminClient.

    FelixGV committed
    This makes it easier to leverage this more resilient/idempotent Add Store operation from other processes.
    
    Also included minor refactorings.
  2. @singhsiddharth

    Merge pull request #333 from stotch/voldemort-shell-properties-squashed

    singhsiddharth committed
    voldemort-shell support for properties file overrides
  3. @stotch

    Removing the redundant argument check so that the VoldemortClientShell.java option parser can be used to query help

    stotch committed
    
    Augmented help and added an option for passing in a properties file
    
    The shell now passes in an option that tells VoldemortClientShell that it is being called from voldemort-shell.sh so that it can properly format the help output
    
    Made help for --voldemort-shell option more 'helpful'
    
    Mistakenly bumped up a greater-than check on the positional arguments ... Rolling that back
    
    Rewording --help helper text to be more helpful