-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] issue828 state syncing #3243
Conversation
I just wanted to give you some feedback that a number of us have looked at your implementation and hopefully we can provide in depth technical feedback soon. At a high level, one of the issues that's been important to us is that API for state synching must provide a mechanism of verifying chunks of data incrementally against the state root Bittorrent style. This is one of our highest priority features in Cosmos over the next few months. |
Thanks for looking into this! |
The basic approach we are thinking of is requiring the underlying data store to support RangeProofs like IAVL tree does and then provide a range proof for all the leaves in the chunk. |
then we need a new abci api like |
Yeah I think VerifyRecoveryChunk makes sense. If you only write chunks when all have been received, a malicious peer can trivially send you unlimited number of invalid chunks that you can't detect. |
Sorry for long time no update, but I was keep testing this against our application. It works (in preliminary stage, not tested in large scale) now on our environment, I have updated this PR to latest status so you can take another round of review. And what's more, relying on iterate db on state sync related abci (ReadChunks / LatestSnapshot) will must be a heavy work. We want borrow the way ethereum parity warp-sync (https://wiki.parity.io/Warp-Sync) or BitTorrent organize snapshot (application db version) that load version - split it into chunk files with hash - have a manifest know all parts' hash - peers download manifest + some files via p2p (at the same time, they can serve file parts to other nodes) I haven't taken deep look into detail design/implementation, but want hear from your opinions on this proposal, feel free to review this PR and reply to our new idea:) |
3fa4ec3
to
cfe8783
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this and sorry for delayed response. Looks like some really great work, and I'm excited to hear you seem to have it working!!
That said, we haven't really put in the effort since the original spec was written up in #828, but we probably want to make some changes to that. The main thing that comes to mind on the design front is that maybe we can/should reduce the number of changes to ABCI right now by:
- making
LatestSnapshot
andReadSnapshotChunk
methods just be particular uses ofQuery
. Query is already supposed to be used for reading from the current and historical state, so I think we should be able to reuse it for that here too. Do you think we really need the new methods, or can we just use Query? - removing
StartRecovery
. Do we really need it ? - change
WriteRecoveryChunk
to take a struct that also includes adone bool
to indicate if this is the last chunk. then we don't need EndRecovery either!
The net result would be that we only need to add one new ABCI method, which would be kind of amazing if possible.
Additionally, we have just undertaken a re-write of the blockchain reactor (##2897). @ancazamfir and @milosevic are leading this, so they should probably weigh in here too. The current blockchain reactor design is very difficult to test and reason about, and the new design is intended to be reusable for state sync as well, so I think we want to avoid doubling down on the existing design. That said, I haven't reviewed this thoroughly yet, so I don't know how much it borrows from the existing blockchain reactor (but seems like a bunch, and I don't see any tests).
I would also think the state sync reactor should come in its own package. We can just have blockchain/state
for now but we probably want to rename blockchain
to sync
and then have sync/blockchain
and sync/state
and move the existing blockchain/store*
elsewhere. Just thinking aloud here and this shouldn't all happen in the same PR.
Finally, it would be incredible if we could do #3176 - then this implementation of state sync reactor could easily be used without a fork even if its not merged in, and other implementations could be worked on in parallel.
// TODO: reduce this by segmentation | ||
MaxStateSizeBytes = 1048576000 // 1000MB | ||
|
||
MonitorWindowInSeconds = 40 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't seem to belong here. These are hard consensus params, but this is more like a reactor setting.
abci query shares rw lock with other block execution apis (begin, end, commit, deliver, check tx) and this query is very heavy according to our experience. In case of chain have 500k user, there would be 1 million keys for acc substore of app state iterate all to get num of keys would cost several seconds... we implemented cache for state sync in app level and only update the cache on syncable waypoint. The updating and query of cache shares a lock different with abci lock. |
Good idea, I can do this. |
good idea. will do this. |
closed in favor of new implementation #3594 for new proposal https://docs.google.com/document/d/1npGTAa1qxe8EQZ1wG0a0Sip9t5oX2vYZNUDwr_LVRR4/edit?usp=sharing |
For #828
This PR is still working in progress, main functionality can "work" on our app but needs testing (multiple peer edge cases and performance) and complete thread safety.
Just raise PR for community review/comments