Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: remove mutex in prefixdb #239

Merged
merged 41 commits into from
Jul 29, 2022
Merged

Conversation

faddat
Copy link
Contributor

@faddat faddat commented Apr 25, 2022

closes:

#156

  • Adds additional testing to all databases
  • Adds .gitpod.yml so that devs can click a single button and ensure that changes work, in the exact docker environment that we use in ci

@codecov
Copy link

codecov bot commented Apr 25, 2022

Codecov Report

Merging #239 (f39f7bb) into master (d1b9b74) will decrease coverage by 0.50%.
The diff coverage is 92.30%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   68.54%   68.03%   -0.51%     
==========================================
  Files          27       27              
  Lines        2130     2093      -37     
==========================================
- Hits         1460     1424      -36     
+ Misses        595      594       -1     
  Partials       75       75              
Impacted Files Coverage Δ
badger_db.go 87.77% <0.00%> (ø)
boltdb_batch.go 82.69% <ø> (ø)
boltdb_iterator.go 92.10% <ø> (ø)
cleveldb.go 70.99% <ø> (ø)
cleveldb_batch.go 82.35% <ø> (ø)
prefixdb.go 63.55% <ø> (-6.79%) ⬇️
prefixdb_iterator.go 81.01% <ø> (-0.24%) ⬇️
rocksdb.go 72.26% <ø> (ø)
rocksdb_batch.go 84.31% <ø> (ø)
boltdb.go 56.55% <100.00%> (-0.71%) ⬇️
... and 4 more
Impacted Files Coverage Δ
badger_db.go 87.77% <0.00%> (ø)
boltdb_batch.go 82.69% <ø> (ø)
boltdb_iterator.go 92.10% <ø> (ø)
cleveldb.go 70.99% <ø> (ø)
cleveldb_batch.go 82.35% <ø> (ø)
prefixdb.go 63.55% <ø> (-6.79%) ⬇️
prefixdb_iterator.go 81.01% <ø> (-0.24%) ⬇️
rocksdb.go 72.26% <ø> (ø)
rocksdb_batch.go 84.31% <ø> (ø)
boltdb.go 56.55% <100.00%> (-0.71%) ⬇️
... and 4 more

Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In light of the argument in #156, this seems like it should be safe.

I am a bit worried that we don't seem to have any tests that exercise the documented concurrency-safety requirement, though -- and there is at least one case where the implementations differ about what operations are allowed to overlap (cf. https://github.com/tendermint/tm-db/blob/master/util_test.go#L29).

Running the race detector can't really help, when there is no concurrency for it to check. This PR doesn't create that problem, but it does make it more relevant.

prefixdb.go Outdated Show resolved Hide resolved
@faddat
Copy link
Contributor Author

faddat commented Apr 25, 2022

@creachadair

I may need to nuke-reinstall my test machine, but I did a bit of a survey on databases recently.

That led me to the terra-money/tmdb:performance branch.

The speed is super impressive but I've had an explosion (concurrent map read and write error).

Can you review my method for testing this stuff?

I am thinking like:

  • make a gitlab instance
  • automate a replace of tm-db
  • run syncs of gaia and osmosis in parallel, check the impact on sync time
  • when we've got synced nodes, do an automated test of queries with vegeta

My chief concern is sync times. They are too long and this slows down all other work in and on cosmos.

But if we don't pay careful attention to this stuff, we can miss things-- for example there's a state breaking change somewhere in the 42 series. (you'll note I didn't mention where-- as I don't know :P )

But how do I know it's there?

I'm sure it's there because I applied tendermint 34.19 and sdk v42.11 to gaia and could not do a genesis sync.

@faddat
Copy link
Contributor Author

faddat commented Apr 25, 2022

cosmos/gaia#1415

@faddat
Copy link
Contributor Author

faddat commented Apr 25, 2022

@creachadair
Copy link
Contributor

creachadair commented Apr 26, 2022

Can you review my method for testing this stuff?

I am thinking like:

  • make a gitlab instance
  • automate a replace of tm-db
  • run syncs of gaia and osmosis in parallel, check the impact on sync time
  • when we've got synced nodes, do an automated test of queries with vegeta

I'm not convinced we need anything that elaborate. Even just a regular Go test that creates a database instance and does some amount of parallel insertions, lookups, and deletions would give the race detector something to push against. I don't think we need an elaborate multi-process test harness to verify the basic concurrency properties the library is intended to guarantee.

Of course, integration tests are nice too, but as far as tm-db itself, integration is a concern for the clients of the library.

Very roughly, I was thinking along these lines:

// Run generates concurrent reads and writes to db so the race detector can
// verify concurrent operations are properly synchronized.
// The contents of db are garbage after Run returns.
func Run(t *testing.T, db tmdb.DB) {
	t.Helper()

	const numWorkers = 10
	const numKeys = 32

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		i := i
		go func() {
			defer wg.Done()

			// Insert a bunch of keys with random data.
			for k := 1; k <= numKeys; k++ {
				key := taskKey(i, k) // say, "task-<i>-key-<k>"
				value := someRandomValue()
				if err := db.Set(key, value); err != nil {
					t.Errorf("Task %d: db.Set(%q=%q) failed: %v", 
						i, string(key), string(value), err)
				}
			}

			// Iterate over the database to make sure our keys are there.
			it, err := db.Iterator(nil, nil)
			if err != nil {
				t.Errorf("Iterator[%d]: %v", i, err)
				return
			}
			found := make(map[string][]byte)
			mine := []byte(fmt.Sprintf("task-%d-", i))
			for it.Valid() {
				it.Next()
				if key := it.Key(); bytes.HasPrefix(key, mine) {
					found[string(key)] = it.Value()
				}
			}
			if err := it.Error(); err != nil {
				t.Errorf("Iterator[%d] reported error: %v" i, err)
			}
			if err := it.Close(); err != nil {
				t.Errorf("Close iterator[%d]: %v", i, err)
			}
			if len(mine) != numKeys {
				t.Errorf("Task %d: found %d keys, wanted %d", i, len(mine), numKeys)
			}

			// Delete all the keys we inserted.
			for key := range mine {
				if err := db.Delete([]byte(key)); err != nil {
					t.Errorf("Delete %q: %v", key, err)
				}
			}
		}()
	}
	wg.Wait()
}

N.B. this is not tested, but should be the right general flavour.

@faddat
Copy link
Contributor Author

faddat commented Apr 27, 2022

@creachadair thank you! it might even help me with the PebbleDB implementation

@faddat
Copy link
Contributor Author

faddat commented May 19, 2022

I've added .gitpod.yml to this

@creachadair if you wouldn't mind trying the gitpod browser extension, I think you'd instantly understand why it is a nice-to-have.

https://www.gitpod.io/docs/browser-extension

Just saved me a whole bunch of time. I was able to work on the linter issues in the same environment as CI, and didn't need to think about my development environment.

…tr.isInvalid were considered to be ineffective assignments by the linter.
@faddat
Copy link
Contributor Author

faddat commented May 19, 2022

That was dramatically easier than making a dev env, and now it's all set :D!

@faddat faddat requested a review from tac0turtle May 20, 2022 08:17
@faddat
Copy link
Contributor Author

faddat commented May 20, 2022

I think this one is set now.

Copy link
Contributor

@creachadair creachadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty close, there are just a few details to sort out.

cleveldb_test.go Outdated Show resolved Hide resolved
prefixdb_test.go Outdated Show resolved Hide resolved
prefixdb_test.go Outdated Show resolved Hide resolved
.gitpod.yml Show resolved Hide resolved
prefixdb_test.go Show resolved Hide resolved
faddat and others added 4 commits May 23, 2022 12:56
Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
Co-authored-by: M. J. Fromberger <michael.j.fromberger@gmail.com>
@faddat
Copy link
Contributor Author

faddat commented Jun 4, 2022

@creachadair I hope to have the tests moved today, wish me luck...

@faddat
Copy link
Contributor Author

faddat commented Jul 15, 2022

is this wanted?

@tac0turtle
Copy link
Contributor

id vote for yes

This was referenced Jul 27, 2022
@tac0turtle
Copy link
Contributor

generally the scope should be fixed on a single item in prs. Expanding scope makes it take longer to get things merged.

@tac0turtle tac0turtle merged commit 00fb04a into tendermint:master Jul 29, 2022
@ghost ghost mentioned this pull request Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants