-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite Snapshotting #4641
Rewrite Snapshotting #4641
Conversation
93d1213
to
f89f99c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of good stuff. I like the unit tests. Several smaller comments.
The important feedback is for the state controller, where I think there is a memory leak and the metrics aren't updated correctly in some cases.
atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderAppender.java
Outdated
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/PersistedSnapshot.java
Outdated
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/PersistedSnapshot.java
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/PersistedSnapshot.java
Outdated
Show resolved
Hide resolved
broker/src/main/java/io/zeebe/broker/system/partitions/impl/StateControllerImpl.java
Show resolved
Hide resolved
broker/src/main/java/io/zeebe/broker/system/partitions/impl/StateControllerImpl.java
Show resolved
Hide resolved
broker/src/main/java/io/zeebe/broker/system/partitions/impl/StateControllerImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some initial comments.
atomix/cluster/src/main/java/io/atomix/raft/roles/LeaderAppender.java
Outdated
Show resolved
Hide resolved
broker/src/main/java/io/zeebe/broker/system/partitions/impl/StateControllerImpl.java
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/PersistedSnapshot.java
Outdated
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/impl/FileBasedSnapshotStoreFactory.java
Show resolved
Hide resolved
atomix/cluster/src/main/java/io/atomix/raft/snapshot/impl/FileBasedSnapshotStore.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some more minor comments. 🙂
broker/src/test/java/io/zeebe/broker/system/partitions/AsyncSnapshotingTest.java
Outdated
Show resolved
Hide resolved
broker/src/test/java/io/zeebe/broker/system/partitions/AtomixSnapshotStorageTest.java
Outdated
Show resolved
Hide resolved
engine/src/test/java/io/zeebe/engine/processor/StreamProcessorReprocessingTest.java
Show resolved
Hide resolved
@pihme @deepthidevaki thanks for reviewing. I applied almost all review hints, except the interface changes. It felt like it make it more complex then necessary and I would need to do more changes. We can discuss them if necessary. Happy to hear your feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. I have just 2 more comments/questions.
atomix/cluster/src/main/java/io/atomix/raft/snapshot/impl/FileBasedSnapshotStore.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Description: Introduces new interfaces for Snapshotting. PersistedSnapshotStore can be used to take snapshots and receive snapshot chunks. After all chunks are received or the snapshot is taken the volatile snapshot can be persisted. It is always possible to abort an snapshot, before persisting. The store can also be used to get the latest snapshot which was persisted. The store allows to register listeners, which are called on new persisted snapshots. On receiving snapshot chunks and consuming them, checksums of the chunks are verified. On persisting an received snapshot the complete snapshot checksum is verified. An persisted snapshot can be read from it persisted storage via a snapshot reader and will be chunked in several snapshot chunks, which then can be transmitted and consumed again by the store. The StateSnapshotController use by the AsyncSnapshotDirector and is the only one which takes a snapshot. The ReplicationController is registered at the SnapshotStore as listener to replicate new persisted snapshots. The replication will send all chunks over the network. The ReplicationController will receive the chunks on the follower side and will use the newReceviedSnapshot method of the store to consume the SnapshotChunks. On the LeaderRole (in the AbstractAppender) the last snapshot can also be replicated. This is done via the SnapshotChunkReader, which reads the chunks from ther persisted store. The Follower (in PassiveRole) will consume the replicated chunks and will store it similar to the ReplicationController. On both ReplicationController and PassiveRole the checksum of the snapshot chunks but also of the complete snapshot is verified before the received snapshot is persisted. Related work: * Most implemation details are now in Atomix * Zeebe related snapshoting like replication and taking of snapshot is part of Broker * Use sbe for snapshot chunk replication/consumption * StreamProcessorTests moved again to engine module - removed snapshot related tests, dont need to be part of that * clean up managment schema file in broker * Wrote a bunch of new unit tests for snapshot implementation
bors r+ |
Build succeeded |
4684: chore(monitor): adjust snapshot duration panel r=Zelldon a=Zelldon ## Description Hey @deepthidevaki I missed to update the grafana dashboard in #4641 I changed in the PR the gauge to a Histogram. This PR now changes the panel in the grafana dashboard. **OLD** ![old-snapshot](https://user-images.githubusercontent.com/2758593/84110685-9fd0e200-aa25-11ea-9e10-9ba837fcf6f4.png) **NEW** ![new-snapshot](https://user-images.githubusercontent.com/2758593/84110711-a95a4a00-aa25-11ea-88ea-d26535ea6c9a.png) <!-- Please explain the changes you made here. --> # Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
* style: add carbonised skeleton to instance history * refactor: use default error message in empty panel * refactor: use separate component for error message * refactor: simplify passing props
Hey Team,
sorry for the big PR but was saw no better way to do this. I think most of the new lines are actually unit tests.
@pihme I think it is just for you to learn how it works don't need to be to check every details.
@npepinpe or @deepthidevaki who ever has time would be nice if you could take a look, especially for raft and zeebe endpoints where we use it. Hope it makes sense to you.
Happy to get your feedback.
Description
Introduces new interfaces for Snapshotting and rewrites and merge existing implementation.
PersistedSnapshotStore can be used to take snapshots and receive
snapshot chunks. After all chunks are received or the snapshot is taken
the volatile snapshot can be persisted. It is always possible
to abort an snapshot, before persisting. The store can also be used
to get the latest snapshot which was persisted. The store allows
to register listeners, which are called on new persisted snapshots.
On receiving snapshot chunks and consuming them, checksums of the chunks
are verified. On persisting an received snapshot the complete snapshot
checksum is verified.
An persisted snapshot can be read from its persisted storage via a
reader and will be chunked in several snapshot chunks, which
then can be transmitted and consumed again by the store.
The StateSnapshotController use by the AsyncSnapshotDirector and is the
only one which takes a snapshot. The ReplicationController is
registered at the SnapshotStore as listener to replicate new persisted
snapshots. The replication will send all chunks over the network.
The ReplicationController will receive the chunks on the follower side and will use
the newReceviedSnapshot method of the store to consume the SnapshotChunks.
On the LeaderRole (in the AbstractAppender) the last snapshot can also
be replicated. This is done via the SnapshotChunkReader, which
reads the chunks from the persisted store.
The Follower (in PassiveRole) will consume the replicated chunks and
will store it similar to the ReplicationController.
On both ReplicationController and PassiveRole the checksum of the
snapshot chunks but also of the complete snapshot is verified before
the received snapshot is persisted.
Related work:
part of Broker
related tests, dont need to be part of that
Class Diagram
Generate Class diagram via Intellij:
Related issues
closes #4424
closes #4604
closes #4345
closes #4122
closes #4067
closes #3214
closes #2721
closes #4425
Pull Request Checklist
mvn clean install -DskipTests
locally before committing