
Native backup capabilities for BDB #51

Closed
wants to merge 18 commits into from

10 participants

@jroper

Ok, so I'm not sure what people will think of this. According to what I found on Google, the recommended method for backing up a BDB Voldemort store is simply to copy the files while Voldemort is running. Because BDB is an append-only database, this should work, right? Wrong: BDB's cleaner thread can get in the way and corrupt the backup. The Java edition of BDB provides a utility to assist with backups: it finishes the current log file and then prevents the cleaner from deleting log files while the backup is running, so that a safe backup can be taken. This utility must be run within the same Java process as the BDB environment for the correct locking to occur. Detailed documentation is here:

http://download.oracle.com/docs/cd/E17277_02/html/java/com/sleepycat/je/util/DbBackup.html
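A minimal sketch of the backup protocol described above, to show why the ordering matters. To keep it self-contained and runnable without the JE jar, it codes against a hypothetical `BackupHelper` interface whose method names mirror `com.sleepycat.je.util.DbBackup` (`startBackup`, `getLogFilesInBackupSet`, `getLastFileInBackupSet`, `endBackup`); real code would use `DbBackup` itself, created from the open `Environment` in the same JVM. The `FileCopier` callback is also an assumption, not part of this PR.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class BackupProtocolSketch {

    /** Hypothetical stand-in whose method names mirror JE's DbBackup. */
    interface BackupHelper {
        void startBackup();                // ends the current log file; cleaner stops deleting files
        String[] getLogFilesInBackupSet(); // log file names that form a consistent backup
        long getLastFileInBackupSet();     // last file number, the base for a later incremental backup
        void endBackup();                  // lets the cleaner delete obsolete log files again
    }

    /** Hypothetical callback that copies one log file into the backup directory. */
    interface FileCopier {
        void copy(String fileName) throws IOException;
    }

    /**
     * Copies every file in the backup set. endBackup() is called in a finally
     * block so the cleaner is re-enabled even if a copy fails.
     * Returns the last file number so the caller can record it.
     */
    static long runBackup(BackupHelper helper, FileCopier copier) throws IOException {
        helper.startBackup();
        try {
            for (String file : helper.getLogFilesInBackupSet()) {
                copier.copy(file);
            }
            return helper.getLastFileInBackupSet();
        } finally {
            helper.endBackup();
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> events = new ArrayList<>();
        // Fake helper that just records the calls made against it.
        BackupHelper fake = new BackupHelper() {
            public void startBackup() { events.add("start"); }
            public String[] getLogFilesInBackupSet() { return new String[] {"00000000.jdb", "00000001.jdb"}; }
            public long getLastFileInBackupSet() { return 1L; }
            public void endBackup() { events.add("end"); }
        };
        long last = runBackup(fake, name -> events.add("copy " + name));
        System.out.println(events + " last=" + last);
    }
}
```

The key design point is the `try`/`finally`: if the copy throws partway through, the backup window still closes and the cleaner can resume reclaiming disk space.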

So I've added an option to the Voldemort admin client to do this, called native backup. Storage engines can declare that they support it by implementing NativeBackupable. I've implemented support in BdbStorageEngine, which copies files using fast NIO transfers and also supports incremental backups. Backups are guaranteed to be consistent.
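The BdbStorageEngine code isn't shown in this thread, so the following is only a sketch, under stated assumptions, of the two techniques mentioned above: a fast NIO copy using `FileChannel.transferTo`, and incremental selection that copies only JE log files (named with 8 hex digits plus `.jdb`, e.g. `0000000a.jdb`) newer than the last file recorded by the previous backup. The class and method names (`NioBackupSketch`, `incrementalSet`, `nioCopy`) are hypothetical, not the PR's API.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NioBackupSketch {

    // JE log file names: 8 hex digits followed by ".jdb", e.g. 0000000a.jdb
    private static final Pattern LOG_FILE = Pattern.compile("([0-9a-f]{8})\\.jdb");

    /** Returns the log file number, or -1 if the name is not a JE log file. */
    static long logFileNumber(String name) {
        Matcher m = LOG_FILE.matcher(name);
        return m.matches() ? Long.parseLong(m.group(1), 16) : -1L;
    }

    /** Keeps only log files newer than the previous backup's last file (pass -1 for a full backup). */
    static List<String> incrementalSet(String[] backupSet, long lastFileInPrevBackup) {
        List<String> result = new ArrayList<>();
        for (String name : backupSet) {
            if (logFileNumber(name) > lastFileInPrevBackup) {
                result.add(name);
            }
        }
        return result;
    }

    /** Copies one file with FileChannel.transferTo, looping because one call may copy fewer bytes. */
    static void nioCopy(Path source, Path target) throws IOException {
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(target, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                long n = in.transferTo(position, size - position, out);
                if (n <= 0) {
                    throw new IOException("source shrank during copy: " + source);
                }
                position += n;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String[] set = {"00000008.jdb", "00000009.jdb", "0000000a.jdb"};
        System.out.println(incrementalSet(set, 8L)); // [00000009.jdb, 0000000a.jdb]

        Path dir = Files.createTempDirectory("backup-sketch");
        Path src = Files.write(dir.resolve("00000009.jdb"), new byte[] {1, 2, 3});
        Path dst = dir.resolve("copy.jdb");
        nioCopy(src, dst);
        System.out.println(Files.size(dst)); // 3
    }
}
```

Comparing file numbers strictly (`>`) relies on the backup helper having closed the last log file at the start of the previous backup, so that file is already complete in the earlier backup.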

Unfortunately, because I modified the protobuf files, there's a lot of generated code in my commit. Also, my IDE made minor whitespace changes to every Javadoc comment in AdminClient; I didn't notice until I opened this pull request. I hope that's OK; if you'd like me to fix it, let me know.

I've written unit tests for the backup itself. There aren't any existing tests for the admin client/protocol, so I didn't write tests for the changes I made to the client.

@otisg

+1 for getting this into V

@chaner

+1 for getting this in as well. Looking for a good back up solution.

@eoconnor

We totally need this. WE CAN DO THIS PEOPLE!

@gweinger

+1 We need this badly. Bring it on!

@JeanPerrot

Yup. That's what I want.

@timchristensen

Bring on the backup system. +1

@pcorkett

+1 as well, much needed.

@jroper

I'm not sure what to make of so many people commenting on this in such a short time; some of the accounts look dubious, but I assure you this is not an attempt by me to push anything through. I would appreciate a response to my pull request, though. If you don't like it, please let me know why, so that I can at least try to address it. That way, if my company starts using this in production, we can be sure we can still upgrade to the latest version of Voldemort at any time.

@chaner

We just really want a good back up solution :)

(Not affiliated with jroper, but we appreciate your work!)

@chaner

@jroper: I just created a build from your fork and populated my store with some data, but I'm getting a NullPointerException when I try to run a backup.

The command I'm running is:
./bin/voldemort-admin-tool.sh --url tcp://localhost:6666 --node 0 --native-backup EventsStore --backup-dir ./backup

The stack trace:
voldemort.VoldemortException: Failed while waiting for async task (null) at node 0 to finish
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1224)
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1249)
at voldemort.client.protocol.admin.AdminClient.nativeBackup(AdminClient.java:2266)
at voldemort.VoldemortAdminTool.main(VoldemortAdminTool.java:443)
Caused by: voldemort.VoldemortException: java.lang.NullPointerException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:116)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:103)
at voldemort.store.ErrorCodeMapper.getError(ErrorCodeMapper.java:70)
at voldemort.client.protocol.admin.AdminClient.throwException(AdminClient.java:1153)
at voldemort.client.protocol.admin.AdminClient.getAsyncRequestStatus(AdminClient.java:1007)
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1201)
... 3 more
Failed while waiting for async task (null) at node 0 to finish

Am I missing something?

Thanks,
Eric

@jroper

Hi Eric,

There may be an error logged on the server side that shows exactly where the NPE came from. Could you check the server logs for any stack traces?

Cheers,

James

@eoconnor

James,

I'm a different Eric (O'Connor), but I work with the first Eric (Chan), and we're working on the same project, so apologies for any confusion that ensues. :)

Anyway, I tried this as well, and got a very similar but slightly different exception from the one Eric Chan got:

$ ./bin/voldemort-admin-tool.sh --url tcp://localhost:7667 --node 0 --native-backup EventsStore --backup-dir ./backup
[2011-10-12 15:47:50,939 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,939 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,939 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,940 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,940 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,940 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,941 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,941 voldemort.store.socket.clientrequest.ClientRequestExecutorFactory$ClientRequestSelectorManager] INFO Closed, exiting
[2011-10-12 15:47:50,942 voldemort.store.socket.clientrequest.ClientRequestExecutor] WARN No client associated with Socket[unconnected]
[2011-10-12 15:47:50,942 voldemort.store.socket.clientrequest.ClientRequestExecutor] INFO Closing remote connection from Socket[unconnected]
voldemort.VoldemortException: Failed while waiting for async task (null) at node 0 to finish
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1224)
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1249)
at voldemort.client.protocol.admin.AdminClient.nativeBackup(AdminClient.java:2266)
at voldemort.VoldemortAdminTool.main(VoldemortAdminTool.java:443)
Caused by: voldemort.VoldemortException: No operation with id 0 found
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:116)
at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:103)
at voldemort.store.ErrorCodeMapper.getError(ErrorCodeMapper.java:70)
at voldemort.client.protocol.admin.AdminClient.throwException(AdminClient.java:1153)
at voldemort.client.protocol.admin.AdminClient.getAsyncRequestStatus(AdminClient.java:1007)
at voldemort.client.protocol.admin.AdminClient.waitForCompletion(AdminClient.java:1201)
... 3 more
Failed while waiting for async task (null) at node 0 to finish

The only difference I can see between the two exceptions is that I got a "No operation with id 0 found" instead of a NullPointerException.

When I checked the Voldemort server log, this is what I saw:

600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Started streaming slop pusher job at Wed Oct 12 15:49:55 PDT 2011
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Started streaming slop pusher job at Wed Oct 12 15:49:55 PDT 2011
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquiring lock to perform streaming slop pusher job
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquiring lock to perform streaming slop pusher job
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquired lock to perform streaming slop pusher job
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquired lock to perform streaming slop pusher job
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Slops to node 0 - Succeeded - 0 - Attempted - 0
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Slops to node 0 - Succeeded - 0 - Attempted - 0
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Completed streaming slop pusher job which started at Wed Oct 12 15:49:55 PDT 2011
600746 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Completed streaming slop pusher job which started at Wed Oct 12 15:49:55 PDT 2011
601605 [voldemort-niosocket-server3] INFO voldemort.server.niosocket.AsyncRequestHandler - Protocol negotiated for Socket[addr=/127.0.0.1,port=64314,localport=7667]: voldemort-native-v1
601605 [voldemort-niosocket-server3] INFO voldemort.server.niosocket.AsyncRequestHandler - Protocol negotiated for Socket[addr=/127.0.0.1,port=64314,localport=7667]: voldemort-native-v1
601609 [voldemort-niosocket-server3] INFO voldemort.server.niosocket.AsyncRequestHandler - Closing remote connection from Socket[addr=/127.0.0.1,port=64314,localport=7667]
601609 [voldemort-niosocket-server3] INFO voldemort.server.niosocket.AsyncRequestHandler - Closing remote connection from Socket[addr=/127.0.0.1,port=64314,localport=7667]
[GC 41673K->34615K(83008K), 0.0068875 secs]
[GC 44381K->44135K(83008K), 0.0403037 secs]
[GC 54044K(83008K), 0.0026255 secs]
602538 [voldemort-niosocket-server2] INFO voldemort.server.niosocket.AsyncRequestHandler - Protocol negotiated for Socket[addr=/127.0.0.1,port=64315,localport=7668]: admin-v1
602538 [voldemort-niosocket-server2] INFO voldemort.server.niosocket.AsyncRequestHandler - Protocol negotiated for Socket[addr=/127.0.0.1,port=64315,localport=7668]: admin-v1
602546 [voldemort-niosocket-server2] ERROR voldemort.server.protocol.admin.AdminServiceRequestHandler - handleGetMetadata failed for request()
voldemort.VoldemortException: Metadata Key passed '' is not handled yet
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleGetMetadata(AdminServiceRequestHandler.java:1033)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:165)
at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:120)
at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:98)
at voldemort.utils.SelectorManager.run(SelectorManager.java:193)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
602546 [voldemort-niosocket-server2] ERROR voldemort.server.protocol.admin.AdminServiceRequestHandler - handleGetMetadata failed for request()
voldemort.VoldemortException: Metadata Key passed '' is not handled yet
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleGetMetadata(AdminServiceRequestHandler.java:1033)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:165)
at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:120)
at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:98)
at voldemort.utils.SelectorManager.run(SelectorManager.java:193)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
602548 [voldemort-niosocket-server2] ERROR voldemort.server.protocol.admin.AdminServiceRequestHandler - handleAsyncStatus failed for request(request_id: 0)
voldemort.VoldemortException: No operation with id 0 found
at voldemort.server.protocol.admin.AsyncOperationService.getOperationStatus(AsyncOperationService.java:135)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleAsyncStatus(AdminServiceRequestHandler.java:911)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:186)
at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:120)
at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:98)
at voldemort.utils.SelectorManager.run(SelectorManager.java:193)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
602548 [voldemort-niosocket-server2] ERROR voldemort.server.protocol.admin.AdminServiceRequestHandler - handleAsyncStatus failed for request(request_id: 0)
voldemort.VoldemortException: No operation with id 0 found
at voldemort.server.protocol.admin.AsyncOperationService.getOperationStatus(AsyncOperationService.java:135)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleAsyncStatus(AdminServiceRequestHandler.java:911)
at voldemort.server.protocol.admin.AdminServiceRequestHandler.handleRequest(AdminServiceRequestHandler.java:186)
at voldemort.server.niosocket.AsyncRequestHandler.read(AsyncRequestHandler.java:120)
at voldemort.utils.SelectorManagerWorker.run(SelectorManagerWorker.java:98)
at voldemort.utils.SelectorManager.run(SelectorManager.java:193)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Started streaming slop pusher job at Wed Oct 12 15:49:57 PDT 2011
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Started streaming slop pusher job at Wed Oct 12 15:49:57 PDT 2011
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquiring lock to perform streaming slop pusher job
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquiring lock to perform streaming slop pusher job
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquired lock to perform streaming slop pusher job
602672 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Acquired lock to perform streaming slop pusher job
602673 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Slops to node 0 - Succeeded - 0 - Attempted - 0
602673 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Slops to node 0 - Succeeded - 0 - Attempted - 0
602673 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Completed streaming slop pusher job which started at Wed Oct 12 15:49:57 PDT 2011
602673 [java.util.concurrent.ThreadPoolExecutor$Worker] INFO voldemort.server.scheduler.slop.StreamingSlopPusherJob - Completed streaming slop pusher job which started at Wed Oct 12 15:49:57 PDT 2011

Any ideas?

Thanks,
Eric

@afeinberg
Collaborator

Hi,

We're still in the process of reviewing this. Could you rebase this against the latest master for easier review?

Thanks,

Alex
@jroper

I think I stuffed something up in this pull request, so I'll close this one and open another.

@jroper jroper closed this
@jroper

New pull request with the IDE autoformatting issues fixed (much cleaner patch, apart from the generated protobuf stuff): #54

Commits on Aug 26, 2011
Commits on Sep 3, 2011
  1. @afeinberg

    Fix a typo

    afeinberg authored
Commits on Sep 15, 2011
  1. @jroper
  2. @jroper
  3. @jroper

    Added unit tests

    jroper authored
  4. @jroper
  5. @jroper
  6. @jroper

    Adding myself to contributors

    jroper authored
Commits on Oct 3, 2011
  1. add a new feature to the performance tool that allows for sampling and playing back values in the store

    Lei Gao authored
Commits on Oct 4, 2011
  1. @afeinberg
Commits on Oct 7, 2011
  1. allow StreamingSlopPusherJob to pick up cluster.xml change dynamically and a unit test

    Lei Gao authored
Commits on Oct 14, 2011
  1. @jroper
  2. @jroper
  3. @jroper

    Added unit tests

    jroper authored
  4. @jroper
  5. @jroper
  6. @jroper

    Adding myself to contributors

    jroper authored
  7. @jroper

    Merge branch 'master' of github.com:jroper/voldemort

    jroper authored
    Conflicts:
    	src/java/voldemort/store/bdb/BdbStorageEngine.java