This repository has been archived by the owner on Jul 15, 2018. It is now read-only.

STALLED: db: add BadgerDB #58

Closed
wants to merge 5 commits into from

Conversation

odeke-em
Contributor

Updates #30

Adds a backend for BadgerDB, as requested at
#30 (comment),
along with tests for it.

BadgerDB has a few problems:
a) It doesn't have a BatchDelete, so our fallback is a plain Delete:
if we used DeleteAsync, then by the time batch.Write is invoked
we couldn't be sure that the deletions have run.
b) Retrieving values from iterators requires a secondary allocation
equal to the length of the current key.
See https://godoc.org/github.com/dgraph-io/badger#KV.NewIterator
c) It seems to have arbitrary options for the size of the
log file, varying between 10MB and 2GB. Currently, with our
DB API, we can't pass in the required file size unless we
invoke:

db.NewBadgerDBWithOptions(&db.Options{
  ValueLogFileSize: desiredFileSize,
})

Anyways, I've arbitrarily set the default value log file size to 1GB,
given that Tendermint does a whole lot of heavy lifting with
key-value stores <-- cite my sources
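
As a rough illustration of point (c), the defaulting described above could look like the sketch below. The `Options` type, the constants, and `resolveValueLogFileSize` are hypothetical stand-ins that only reuse the numbers quoted in this description (10MB, 2GB, and the 1GB default); they are not Badger's API.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical bounds, taken from the figures quoted in the PR description.
const (
	minValueLogFileSize     = 10 << 20 // 10MB
	maxValueLogFileSize     = 2 << 30  // 2GB
	defaultValueLogFileSize = 1 << 30  // 1GB, the default chosen in this PR
)

// Options is a hypothetical stand-in for the options passed to the backend.
type Options struct {
	ValueLogFileSize int64
}

// resolveValueLogFileSize returns the default when no size is requested and
// rejects sizes outside the assumed range.
func resolveValueLogFileSize(opts *Options) (int64, error) {
	if opts == nil || opts.ValueLogFileSize == 0 {
		return defaultValueLogFileSize, nil
	}
	if opts.ValueLogFileSize < minValueLogFileSize || opts.ValueLogFileSize > maxValueLogFileSize {
		return 0, errors.New("value log file size out of range")
	}
	return opts.ValueLogFileSize, nil
}

func main() {
	size, err := resolveValueLogFileSize(&Options{ValueLogFileSize: 64 << 20})
	fmt.Println(size, err)
}
```

Whether out-of-range sizes should be rejected or clamped is a design choice this sketch does not settle.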

// db.Set([]byte("foo"), []byte("bar"))
// db.SetSync([]byte("true"), []byte("tendermint"))
// db.Delete([]byte("true"))
// db.Deletesync([]byte("true"))
Contributor

DeleteSync

Contributor Author

Thanks, I'll update that.

func NewBadgerDB(dbName, dir string) (*BadgerDB, error) {
	// BadgerDB doesn't expose a way for us to
	// create a DB with the user's supplied name.
	if err := os.MkdirAll(dir, 0755); err != nil {
Contributor

Don't you think MkdirAll belongs to NewBadgerDBWithOptions function?

Contributor Author

I don't think so: db.NewDB calls NewBadgerDB and expects directory creation to be handled for it. Moving it out of NewBadgerDB would mean we'd have to document separately that BadgerDB needs its directory created beforehand, which would be awkward.
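
A minimal sketch of the split being discussed, assuming a convenience constructor that creates the directory and an options-based constructor that expects it to exist. `Store`, `Options`, `NewStore`, `NewStoreWithOptions`, and `demo` are hypothetical names for illustration, not the PR's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Options and Store are hypothetical stand-ins for the PR's types.
type Options struct{ Dir string }
type Store struct{ dir string }

// NewStoreWithOptions assumes the directory already exists.
func NewStoreWithOptions(opts *Options) (*Store, error) {
	info, err := os.Stat(opts.Dir)
	if err != nil {
		return nil, err
	}
	if !info.IsDir() {
		return nil, fmt.Errorf("%s is not a directory", opts.Dir)
	}
	return &Store{dir: opts.Dir}, nil
}

// NewStore creates the directory first, so generic callers need no setup.
func NewStore(name, dir string) (*Store, error) {
	if err := os.MkdirAll(dir, 0755); err != nil {
		return nil, err
	}
	return NewStoreWithOptions(&Options{Dir: dir})
}

// demo exercises both constructors against a throwaway temp directory.
func demo() error {
	root, err := os.MkdirTemp("", "badger-sketch")
	if err != nil {
		return err
	}
	defer os.RemoveAll(root)

	dir := filepath.Join(root, "nested", "db")
	if _, err := NewStoreWithOptions(&Options{Dir: dir}); err == nil {
		return fmt.Errorf("expected an error for a missing directory")
	}
	_, err = NewStore("test", dir)
	return err
}

func main() {
	if err := demo(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ok")
}
```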

	if err := os.MkdirAll(dir, 0755); err != nil {
		return nil, err
	}
	opts := new(badger.Options)
Contributor

what if we just do opts := badger.DefaultOptions?

Contributor Author

For some reason I initially thought badger.DefaultOptions was a pointer; at some point I noticed it wasn't but left that in, lol. I'll update it, thank you!

db/badger_db.go Outdated
func (b *BadgerDB) Get(key []byte) []byte {
	valueItem := new(badger.KVItem)
	if err := b.kv.Get(key, valueItem); err != nil {
		// Unfortunate that Get can't return errors
Contributor

💯

db/badger_db.go Outdated
	}
	var valueSave []byte
	err := valueItem.Value(func(origValue []byte) error {
		// TODO: Decide if we should just assign valueSave to origValue
Contributor

so why can't we just assign?
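
One possible reason, sketched under an assumption: if the slice handed to the callback is only valid while the callback runs (this should be checked against Badger's documentation for the version in use), then plain assignment keeps a reference to memory the store may later overwrite. The `reusingStore` type below is hypothetical and exists only to make that hazard visible.

```go
package main

import "fmt"

// reusingStore is a hypothetical store that hands callbacks a view into one
// internal buffer and overwrites that buffer on the next read.
type reusingStore struct {
	data map[string]string
	buf  []byte
}

// View copies the stored value into the shared buffer and passes it to fn.
func (s *reusingStore) View(key string, fn func(val []byte) error) error {
	s.buf = append(s.buf[:0], s.data[key]...)
	return fn(s.buf)
}

// getAlias assigns the callback's slice directly, so the result aliases the
// store's internal buffer.
func getAlias(s *reusingStore, key string) []byte {
	var out []byte
	s.View(key, func(val []byte) error {
		out = val
		return nil
	})
	return out
}

// getCopy allocates a fresh slice, so the result survives later reads.
func getCopy(s *reusingStore, key string) []byte {
	var out []byte
	s.View(key, func(val []byte) error {
		out = append([]byte(nil), val...)
		return nil
	})
	return out
}

func main() {
	s := &reusingStore{data: map[string]string{"a": "foo", "b": "bar"}}
	alias := getAlias(s, "a")
	copied := getCopy(s, "a")
	getCopy(s, "b") // a later read overwrites the shared buffer
	fmt.Printf("alias=%s copy=%s\n", alias, copied) // prints "alias=bar copy=foo"
}
```

If the underlying store guarantees that the slice stays valid and unmodified, direct assignment would be fine and would save the allocation.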

db/badger_db.go Outdated
// Hesitant to do DeleteAsync because that changes the
// expected ordering
func (bb *badgerDBBatch) Delete(key []byte) {
	bb.db.Delete(key)
Contributor

no error handling?

Contributor Author

With our current structure, Delete can't return an error, and I am apprehensive about adding panics in Delete.
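
One alternative that avoids both the panic and the ordering concern is to queue deletes alongside sets and replay them in order at Write, remembering the first failure. This is only a sketch: `orderedBatch`, `kvStore`, `memStore`, and the `Err` accessor are hypothetical and are not part of the tmlibs Batch interface.

```go
package main

import (
	"errors"
	"fmt"
)

// op is one queued operation; del distinguishes deletes from sets.
type op struct {
	del   bool
	key   string
	value string
}

// kvStore is a hypothetical minimal store whose operations can fail.
type kvStore interface {
	Set(key, value string) error
	Delete(key string) error
}

// orderedBatch queues sets and deletes together so Write replays them in
// the order they were issued.
type orderedBatch struct {
	db  kvStore
	ops []op
	err error // first error seen during Write
}

func (b *orderedBatch) Set(key, value string) {
	b.ops = append(b.ops, op{key: key, value: value})
}

func (b *orderedBatch) Delete(key string) {
	b.ops = append(b.ops, op{del: true, key: key})
}

// Write applies the queued operations and records the first failure instead
// of panicking or dropping it.
func (b *orderedBatch) Write() {
	ops := b.ops
	b.ops = nil
	for _, o := range ops {
		var err error
		if o.del {
			err = b.db.Delete(o.key)
		} else {
			err = b.db.Set(o.key, o.value)
		}
		if err != nil && b.err == nil {
			b.err = err
		}
	}
}

// Err reports the first error recorded by Write, if any.
func (b *orderedBatch) Err() error { return b.err }

// memStore is an in-memory kvStore whose Delete fails on missing keys.
type memStore map[string]string

func (m memStore) Set(key, value string) error {
	m[key] = value
	return nil
}

func (m memStore) Delete(key string) error {
	if _, ok := m[key]; !ok {
		return errors.New("delete: key not found: " + key)
	}
	delete(m, key)
	return nil
}

func main() {
	m := memStore{}
	b := &orderedBatch{db: m}
	b.Set("a", "1")
	b.Delete("a")
	b.Set("a", "2")
	b.Delete("missing")
	b.Write()
	fmt.Println(m["a"], b.Err())
}
```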

db/badger_db.go Outdated
func (bb *badgerDBBatch) Write() {
	bb.entriesMu.Lock()
	entries := bb.entries
	bb.entries = nil
Contributor

What if somebody calls Set on this batch after this? The code will panic.

Contributor Author

Nope, after this, it starts a new batch. Take a look at the code in Set https://github.com/tendermint/tmlibs/pull/58/files#diff-9a61ac3e18917559e9062e23d4e9435dR237
or at the attached screenshot (screen shot 2017-11-05 at 2 01 10 pm).
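
The underlying Go behavior can be shown with a small sketch: `append` works on a nil slice, so a Set that appends to the entries after Write has reset them to nil simply starts a new batch. The `batch` type and its `flush` callback below are hypothetical simplifications, not the PR's badgerDBBatch.

```go
package main

import (
	"fmt"
	"sync"
)

type entry struct{ key, value []byte }

// batch is a hypothetical simplification of a write batch.
type batch struct {
	mu      sync.Mutex
	entries []entry
	flush   func([]entry) // hypothetical sink that receives each written batch
}

// Set queues an entry; appending to a nil slice allocates a new one.
func (b *batch) Set(key, value []byte) {
	b.mu.Lock()
	b.entries = append(b.entries, entry{key, value})
	b.mu.Unlock()
}

// Write hands the queued entries to flush and resets the batch.
func (b *batch) Write() {
	b.mu.Lock()
	entries := b.entries
	b.entries = nil
	b.mu.Unlock()
	if len(entries) > 0 {
		b.flush(entries)
	}
}

func main() {
	b := &batch{flush: func(es []entry) { fmt.Println("flushed", len(es)) }}
	b.Set([]byte("foo"), []byte("bar"))
	b.Set([]byte("true"), []byte("tendermint"))
	b.Write()
	b.Set([]byte("again"), []byte("ok")) // no panic: a new batch begins
	b.Write()
}
```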

Contributor

ebuchman commented Oct 3, 2017

Cool! Can we base this on Alexis' branch (#57) ? I'd love to see some benchmarks of performance vs our Go LevelDB

Contributor

melekes commented Oct 11, 2017

also, tests are failing with a missing dependency error.

@melekes melekes changed the base branch from master to develop October 13, 2017 14:31
Contributor

melekes commented Oct 13, 2017

> Cool! Can we base this on Alexis' branch (#57) ? I'd love to see some benchmarks of performance vs our Go LevelDB

Alexis' branch was merged into develop, so this branch needs to be rebased against develop.

odeke-em and others added 2 commits November 4, 2017 00:13
TODO:
- IteratorPrefix does not count in prefix
- badgerDBIterator#Error should return err from Key() or Value()
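
For the first TODO item, honoring the prefix amounts to filtering keys with `bytes.HasPrefix`. The sketch below uses a plain map and a hypothetical `keysWithPrefix` helper; it is not the PR's iterator.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// keysWithPrefix returns, in sorted order, the keys that start with prefix.
// An empty prefix matches every key.
func keysWithPrefix(data map[string][]byte, prefix []byte) []string {
	var keys []string
	for k := range data {
		if bytes.HasPrefix([]byte(k), prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	data := map[string][]byte{
		"block/1": []byte("a"),
		"block/2": []byte("b"),
		"state/1": []byte("c"),
	}
	fmt.Println(keysWithPrefix(data, []byte("block/"))) // prints "[block/1 block/2]"
}
```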
Contributor

melekes commented Nov 4, 2017

ok. I've rebased the PR.

@odeke-em could you address some of my comments above and the TODO in 2a54ceb?

I would also love to see benchmarks: #58 (comment)

Contributor

melekes commented Nov 4, 2017

I also have a concern that we're adding another dependency (github.com/dgraph-io/badger) to our list of deps, which is already long. Is there a way we could only require it if it's being used?

Contributor

melekes commented Nov 4, 2017

db/badger_db.go:277:22:warning: should use buf.String() instead of string(buf.Bytes()) (S1030) (megacheck)
db/badger_db.go:282:2:warning: field mu is unused (U1000) (megacheck)
db/badger_db.go:282:2:warning: unused struct field github.com/tendermint/tmlibs/db.badgerDBIterator.mu (structcheck)
db/badger_db_test.go:88::error: wrong number of args for format in Errorf call: 2 needed but 3 args (vet)
db/badger_db_test.go:186::error: wrong number of args for format in Errorf call: 2 needed but 3 args (vet)
db/example_test.go:24::error: Example_BadgerDB has malformed example suffix: BadgerDB (vet)
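
For reference, the shapes these warnings ask for look roughly like the sketch below. `render` and `describe` are hypothetical helpers, not code from this PR. The last warning concerns naming: in an example function such as `Example_badgerDB`, the suffix after the underscore must begin with a lowercase letter.

```go
package main

import (
	"bytes"
	"fmt"
)

// render concatenates parts through a buffer and returns buf.String(),
// which is what S1030 suggests in place of string(buf.Bytes()).
func render(parts ...string) string {
	var buf bytes.Buffer
	for _, p := range parts {
		buf.WriteString(p)
	}
	return buf.String()
}

// describe uses three verbs for three arguments, the match that vet checks
// for in Errorf-style calls.
func describe(i int, got, want string) string {
	return fmt.Sprintf("#%d: got %q, want %q", i, got, want)
}

func main() {
	fmt.Println(render("foo", "bar"))
	fmt.Println(describe(1, "x", "y"))
}
```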

@cloudhead
Contributor

If we're going to support a bunch of backends and don't want to pull in all the deps, we can do something like:

import "tmlibs/db/base"
import "tmlibs/db/drivers/badger"
import "tmlibs/db/drivers/leveldb"

func init() {
  base.Register("badger", badger.NewDriver())
  base.Register("leveldb", leveldb.NewDriver())
}

Contributor Author

odeke-em commented Nov 5, 2017

@melekes I mean do we want the DB though? If we do, I don't see how we can avoid adding the dependency.

@cloudhead what do you mean by "not pull in all the deps"? By including all those imports, the respective object code has to be compiled and linked. Also I believe we already have that style of registering backends

tmlibs/db/db.go

Lines 50 to 64 in d9525c0

func registerDBCreator(backend string, creator dbCreator, force bool) {
	_, ok := backends[backend]
	if !force && ok {
		return
	}
	backends[backend] = creator
}

func NewDB(name string, backend string, dir string) DB {
	db, err := backends[backend](name, dir)
	if err != nil {
		PanicSanity(Fmt("Error initializing DB: %v", err))
	}
	return db
}
But perhaps I don't understand what the concern here is; please help me understand it.

Contributor Author

odeke-em commented Nov 5, 2017

@melekes thank you for the vets, I'll update them!

Fixed the code + tests to conform to *vet warnings
and also add a getter and setter for badgerDBIterator's
lastErr.

Thanks to @melekes for the suggestions.
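
A sketch of what an iterator with a retained error might look like, assuming accessors that cannot return errors themselves. The `iter` and `item` types and the `setLastError` helper are hypothetical; only the idea of storing a lastErr and exposing it through Error comes from the commit message above.

```go
package main

import (
	"errors"
	"fmt"
)

// item is a hypothetical entry; err simulates a failed read of its value.
type item struct {
	key   string
	value string
	err   error
}

// iter remembers the first error hit by its accessors.
type iter struct {
	items   []item
	pos     int
	lastErr error
}

func (it *iter) Valid() bool { return it.pos < len(it.items) }
func (it *iter) Next()       { it.pos++ }
func (it *iter) Key() string { return it.items[it.pos].key }

// Value returns the current value, recording any read failure.
func (it *iter) Value() string {
	cur := it.items[it.pos]
	if cur.err != nil {
		it.setLastError(cur.err)
		return ""
	}
	return cur.value
}

// setLastError keeps only the first error so later ones don't mask it.
func (it *iter) setLastError(err error) {
	if it.lastErr == nil {
		it.lastErr = err
	}
}

// Error reports the first error recorded during iteration, if any.
func (it *iter) Error() error { return it.lastErr }

func main() {
	it := &iter{items: []item{
		{key: "a", value: "1"},
		{key: "b", err: errors.New("read failed for b")},
		{key: "c", value: "3"},
	}}
	for ; it.Valid(); it.Next() {
		fmt.Printf("%s=%q\n", it.Key(), it.Value())
	}
	fmt.Println("iteration error:", it.Error())
}
```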
Contributor Author

odeke-em commented Nov 5, 2017

@melekes fist bump for the vets and suggestions to make badgerDBIterator return an error, I've updated those points in commit 534cd7b

For the benchmarks, this cafe is as cold as Siberia, let me walk home, grab dinner and work on them there.

Uncovered dgraph-io/badger#308,
which will perhaps require us to use the latest BadgerDB
since they no longer use v0.8 (which we started with).
Until that issue is resolved, our benchmarks are invalid
or spuriously error out.
@cloudhead
Contributor

@odeke-em sorry, I probably wasn't clear: the code snippet I wrote would live on the user's side, not in tmlibs. That way the user can import additional backends and register them with tmlibs/db. By default, tmlibs/db wouldn't come with any disk-based backends.
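
A single-file sketch of that idea, similar in spirit to how database/sql drivers register themselves. In a real layout the registry and each backend would live in separate packages, so only the backends a program imports get compiled in. All names here (`DB`, `Creator`, `Register`, `Open`, `memDB`) are hypothetical, and unlike the NewDB shown earlier this version returns an error for an unregistered backend.

```go
package main

import "fmt"

// DB is a hypothetical minimal database interface.
type DB interface {
	Get(key string) (string, bool)
	Set(key, value string)
}

// Creator builds a DB for a given name and directory.
type Creator func(name, dir string) (DB, error)

// backends is the registry; it starts with no disk-based backends.
var backends = map[string]Creator{}

// Register makes a backend available under the given name.
func Register(backend string, creator Creator) {
	backends[backend] = creator
}

// Open looks up a registered backend and reports unknown names as errors.
func Open(backend, name, dir string) (DB, error) {
	creator, ok := backends[backend]
	if !ok {
		return nil, fmt.Errorf("db backend %q is not registered", backend)
	}
	return creator(name, dir)
}

// memDB is an in-memory backend used here in place of a real driver.
type memDB map[string]string

func (m memDB) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

func (m memDB) Set(key, value string) { m[key] = value }

// The user's program registers only the backends it wants.
func init() {
	Register("memdb", func(name, dir string) (DB, error) { return memDB{}, nil })
}

func main() {
	db, err := Open("memdb", "test", "")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	db.Set("foo", "bar")
	fmt.Println(db.Get("foo"))

	_, err = Open("badger", "test", "")
	fmt.Println(err)
}
```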

@melekes melekes mentioned this pull request Nov 29, 2017
@melekes melekes changed the title db: add BadgerDB STALLED: db: add BadgerDB Feb 2, 2018
@greg-szabo

At today's (Feb 5, 2018) Tendermint dev meeting, Bucky mentioned that according to his and Jae's research, BadgerDB is not mature enough for us to use. Should this PR be closed? If not, what's the priority / use case?

Contributor Author

odeke-em commented Feb 6, 2018

Yeah @greg-szabo, indeed. I raised some problems here, #58 (comment), and in the issue filed against BadgerDB, dgraph-io/badger#308: their API changed drastically in the span of 2 weeks, and the max length of their keys is 1<<16, see dgraph-io/badger#308 (comment).


greg-szabo commented Feb 6, 2018

Closing this PR. Thanks for the work on it, @odeke-em. Even though we're passing on BadgerDB, it was an important point to raise and discuss.

@greg-szabo greg-szabo closed this Feb 6, 2018
@odeke-em odeke-em deleted the add-badger-db branch March 18, 2018 04:28

5 participants