
[WIP] issue828 state syncing #3243

Closed
wants to merge 6 commits

Conversation

@ackratos (Contributor) commented Feb 2, 2019

  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Updated CHANGELOG_PENDING.md

For #828

This PR is still a work in progress: the main functionality "works" against our app, but it still needs testing (multiple-peer edge cases and performance) and complete thread safety.

I'm raising the PR now for community review/comments.

@ackratos (Contributor, Author) commented Feb 2, 2019

#828 (comment)

@zmanian (Contributor) commented Feb 10, 2019

I just wanted to give you some feedback: a number of us have looked at your implementation, and hopefully we can provide in-depth technical feedback soon.

At a high level, one of the issues that's been important to us is that the API for state syncing must provide a mechanism for verifying chunks of data incrementally against the state root, BitTorrent-style.

This is one of our highest-priority features in Cosmos over the next few months.

@ackratos (Contributor, Author) commented:

> At a high level, one of the issues that's been important to us is that the API for state syncing must provide a mechanism for verifying chunks of data incrementally against the state root, BitTorrent-style.

Thanks for looking into this!
Yes, that's the major missing part of this PR. The specification doesn't describe the verification mechanism in detail; what I had in mind is something like an additional calculated checksum, i.e. concatenate the chunks and then checksum them?
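For concreteness, a minimal Go sketch of this naive "concatenate then checksum" idea (the function name and the choice of SHA-256 are illustrative assumptions, not anything from this PR); its weakness, raised below, is that nothing can be checked until every chunk has arrived:

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyConcatenated hashes all chunks in order and compares the digest
// against one advertised with the snapshot. Note that verification can
// only happen after the final chunk has been received.
func verifyConcatenated(chunks [][]byte, expected [32]byte) error {
	h := sha256.New()
	for _, c := range chunks {
		h.Write(c)
	}
	if !bytes.Equal(h.Sum(nil), expected[:]) {
		return fmt.Errorf("state checksum mismatch")
	}
	return nil
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1")}
	expected := sha256.Sum256([]byte("chunk-0chunk-1"))
	fmt.Println(verifyConcatenated(chunks, expected)) // <nil>
}
```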

@zmanian (Contributor) commented Feb 12, 2019

The basic approach we are thinking of is requiring the underlying data store to support range proofs, as the IAVL tree does, and then providing a range proof for all the leaves in the chunk.
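A rough sketch of what per-chunk verification could look like under this approach. The ChunkProof interface is a hypothetical stand-in for an IAVL-style range proof; none of these names come from the PR or the iavl library:

```go
package statesync

import "fmt"

// ChunkProof is a hypothetical proof that a set of key/value leaves is
// a contiguous range of the state tree with a given root hash (the app
// hash from a trusted header).
type ChunkProof interface {
	Verify(root []byte) error           // check the proof against a trusted state root
	VerifyItem(key, value []byte) error // check that one leaf is covered by the proof
}

// Leaf is one key/value pair carried in a snapshot chunk.
type Leaf struct{ Key, Value []byte }

// verifyChunk validates a single chunk as soon as it arrives, instead
// of waiting for the full snapshot: first the proof against the root,
// then every leaf against the proof.
func verifyChunk(root []byte, leaves []Leaf, proof ChunkProof) error {
	if err := proof.Verify(root); err != nil {
		return fmt.Errorf("range proof does not match state root: %w", err)
	}
	for _, l := range leaves {
		if err := proof.VerifyItem(l.Key, l.Value); err != nil {
			return fmt.Errorf("leaf %q not covered by proof: %w", l.Key, err)
		}
	}
	return nil
}
```

The point of the range proof is exactly the incremental, BitTorrent-style property asked for above: each chunk can be accepted or rejected on arrival, against the same trusted root.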

@ackratos (Contributor, Author) commented:

> The basic approach we are thinking of is requiring the underlying data store to support range proofs, as the IAVL tree does, and then providing a range proof for all the leaves in the chunk.

Then we'd need a new ABCI API like VerifyRecoveryChunk(chunk [][]byte) error? Or should we write chunks as soon as we receive them? (Currently this implementation writes state chunks only once all chunks have been received.)
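For illustration only, the new method might take a shape like the following, mirroring ABCI's Request/Response conventions; every name and field here is hypothetical, not an existing ABCI type:

```go
package types

// RequestVerifyRecoveryChunk is a hypothetical ABCI request carrying
// one snapshot chunk together with the proof needed to verify it.
type RequestVerifyRecoveryChunk struct {
	Chunk [][]byte // key/value leaves of one snapshot chunk
	Index int64    // position of the chunk within the snapshot
	Proof []byte   // serialized range proof covering the chunk's leaves
}

// ResponseVerifyRecoveryChunk reports whether the chunk verified.
type ResponseVerifyRecoveryChunk struct {
	Code uint32 // non-zero means the chunk failed verification
	Log  string // human-readable reason on failure
}
```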

@zmanian (Contributor) commented Feb 13, 2019

Yeah I think VerifyRecoveryChunk makes sense.

If you only write chunks when all have been received, a malicious peer can trivially send you an unlimited number of invalid chunks that you can't detect.

@ackratos (Contributor, Author) commented Mar 5, 2019

Sorry for the long silence; I have been testing this against our application the whole time. It now works in our environment (at a preliminary stage, not yet tested at scale), and I have updated this PR to the latest status so you can take another round of review.

What's more, relying on iterating the DB in the state-sync ABCI calls (ReadChunks / LatestSnapshot) is bound to be heavy work. We want to borrow the way Ethereum Parity's warp sync (https://wiki.parity.io/Warp-Sync) and BitTorrent organize a snapshot (of an application DB version): load a version, split it into chunk files each identified by a hash, keep a manifest that knows all the parts' hashes, and have peers download the manifest plus the files via p2p (while serving already-downloaded parts to other nodes at the same time).
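A minimal sketch of the manifest structure this proposal implies, with all names assumed for illustration:

```go
package statesync

// ChunkInfo identifies one chunk file of a snapshot.
type ChunkInfo struct {
	Hash [32]byte // hash of the chunk file's contents
	Size int64    // size in bytes, so downloads can be scheduled
}

// Manifest ties a snapshot together: a syncing node downloads it first,
// then fetches chunk files from many peers in parallel, checking each
// file against its hash here (and serving verified files to others).
type Manifest struct {
	Height    int64       // app-state version the snapshot was taken at
	StateRoot []byte      // app hash the restored state must match
	Chunks    []ChunkInfo // every part's hash, in order
}
```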

I haven't taken a deep look at the detailed design/implementation yet, but I'd like to hear your opinions on this proposal. Feel free to review this PR and reply to our new idea :)

@ackratos (Contributor, Author) commented:

@melekes @ebuchman any idea on this?

@ebuchman (Contributor) left a comment

Thanks for working on this, and sorry for the delayed response. Looks like some really great work, and I'm excited to hear you seem to have it working!!

That said, we haven't really put any effort into the spec since it was originally written up in #828, and we probably want to make some changes to it. The main thing that comes to mind on the design front is that maybe we can/should reduce the number of changes to ABCI right now by:

  • making LatestSnapshot and ReadSnapshotChunk methods just be particular uses of Query. Query is already supposed to be used for reading from the current and historical state, so I think we should be able to reuse it for that here too. Do you think we really need the new methods, or can we just use Query?
  • removing StartRecovery. Do we really need it?
  • changing WriteRecoveryChunk to take a struct that also includes a done bool to indicate whether this is the last chunk; then we don't need EndRecovery either! (See the sketch after this list.)
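For illustration, the consolidated request could look like this hypothetical struct (names assumed, not actual ABCI types):

```go
package types

// RequestWriteRecoveryChunk folds StartRecovery and EndRecovery into
// the chunk-writing call itself: the first chunk implicitly starts
// recovery and Done marks the last one.
type RequestWriteRecoveryChunk struct {
	Chunk []byte // one chunk of snapshot data
	Index int64  // position within the snapshot
	Done  bool   // true for the final chunk; triggers finalization
}
```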

The net result would be that we only need to add one new ABCI method, which would be kind of amazing if possible.

Additionally, we have just undertaken a re-write of the blockchain reactor (#2897). @ancazamfir and @milosevic are leading this, so they should probably weigh in here too. The current blockchain reactor design is very difficult to test and reason about, and the new design is intended to be reusable for state sync as well, so I think we want to avoid doubling down on the existing design. That said, I haven't reviewed this thoroughly yet, so I don't know how much it borrows from the existing blockchain reactor (it seems like a bunch, and I don't see any tests).

I would also think the state sync reactor should come in its own package. We can just have blockchain/state for now but we probably want to rename blockchain to sync and then have sync/blockchain and sync/state and move the existing blockchain/store* elsewhere. Just thinking aloud here and this shouldn't all happen in the same PR.

Finally, it would be incredible if we could do #3176 - then this implementation of the state sync reactor could easily be used without a fork even if it's not merged in, and other implementations could be worked on in parallel.

// TODO: reduce this by segmentation
MaxStateSizeBytes = 1048576000 // 1000 MiB

MonitorWindowInSeconds = 40

@ebuchman (Contributor) commented Mar 16, 2019
This doesn't seem to belong here. These are hard consensus params, but this is more like a reactor setting.
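One way to act on that suggestion, sketched with assumed names: make these settings part of a config struct owned by the state sync reactor instead of consensus params.

```go
package statesync

import "time"

// ReactorConfig holds tunables for the state sync reactor; unlike
// consensus params, these may differ from node to node.
type ReactorConfig struct {
	MaxStateSizeBytes int64         // upper bound on total snapshot size
	MonitorWindow     time.Duration // how long to wait before re-requesting
}

// DefaultReactorConfig mirrors the constants from the diff above.
func DefaultReactorConfig() *ReactorConfig {
	return &ReactorConfig{
		MaxStateSizeBytes: 1000 * 1024 * 1024, // 1000 MiB; TODO: reduce by segmentation
		MonitorWindow:     40 * time.Second,
	}
}
```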

@ackratos requested a review from xla as a code owner on March 21, 2019
@ackratos (Contributor, Author) commented:

> • making LatestSnapshot and ReadSnapshotChunk methods just be particular uses of Query. Query is already supposed to be used for reading from the current and historical state, so I think we should be able to reuse it for that here too. Do you think we really need the new methods, or can we just use Query?

ABCI Query shares a RW lock with the other block-execution APIs (begin block, end block, commit, deliver tx, check tx), and in our experience this query is very heavy. For a chain with 500k users there would be about 1 million keys in the acc substore of the app state; iterating over all of them just to get the number of keys would cost several seconds...

We implemented a cache for state sync at the app level and only update the cache at syncable waypoints. Updates and queries of this cache share a lock that is separate from the ABCI lock.
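A sketch of that arrangement, with assumed names: the cache is rebuilt only at syncable waypoints and guarded by its own lock, so serving chunks never contends with the ABCI lock.

```go
package statesync

import "sync"

// SnapshotCache holds the chunks of the latest syncable snapshot.
// Its mutex is independent of the ABCI rw-lock, so serving chunks to
// peers does not block block execution.
type SnapshotCache struct {
	mtx    sync.RWMutex
	height int64
	chunks [][]byte
}

// Refresh replaces the cached snapshot; called only at syncable waypoints.
func (c *SnapshotCache) Refresh(height int64, chunks [][]byte) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.height, c.chunks = height, chunks
}

// Chunk serves one cached chunk, if it exists, under the read lock.
func (c *SnapshotCache) Chunk(i int) ([]byte, bool) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	if i < 0 || i >= len(c.chunks) {
		return nil, false
	}
	return c.chunks[i], true
}
```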

@ackratos (Contributor, Author) commented:

> • removing StartRecovery. Do we really need it?
> • changing WriteRecoveryChunk to take a struct that also includes a done bool to indicate whether this is the last chunk; then we don't need EndRecovery either!

Good idea, I can do this.

@ackratos (Contributor, Author) commented:

> I would also think the state sync reactor should come in its own package. We can just have blockchain/state for now but we probably want to rename blockchain to sync and then have sync/blockchain and sync/state and move the existing blockchain/store* elsewhere. Just thinking aloud here and this shouldn't all happen in the same PR.

Good idea, will do this.

@ackratos mentioned this pull request on Apr 25, 2019
@ackratos (Contributor, Author) commented:

Closed in favor of the new implementation in #3594, based on the new proposal: https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit?usp=sharing
