Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix possible incomplete snapshot chunk write #4358

Merged
merged 1 commit into from Apr 27, 2020
Merged

Conversation

Zelldon
Copy link
Member

@Zelldon Zelldon commented Apr 23, 2020

Description

As written in #4357 we need to loop on writing otherwise we may end in cases where your snapshot is corrupted because chunks have been written not completely.

Related guide to write to FileChannel http://tutorials.jenkov.com/java-nio/file-channel.html

Related issues

closes #4357

Pull Request Checklist

  • All commit messages match our commit message guidelines
  • The submitting code follows our code style
  • If submitting code, please run mvn clean install -DskipTests locally before committing

Copy link
Member

@npepinpe npepinpe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question, why can't we use Files.write here as well again?

@Zelldon
Copy link
Member Author

Zelldon commented Apr 24, 2020

Not dump at all. I just wanted to do smallest changes. I was not sure whether here was a reason to use the SeekableChannel, furthermore not sure whether we have access to the byte array. But probably we have here no direct byte buffer.

@Zelldon
Copy link
Member Author

Zelldon commented Apr 27, 2020

was this now an request ? @npepinpe

@npepinpe
Copy link
Member

I see what you mean, I guess I'm not 100% sure if we can guarantee it's always a byte array. We probably do, but yeah not sure.

That said, it makes me realize there might be an issue with our replication. Say I want to write a chunk of X bytes, but I only write Y = X / 2 bytes, then for whatever reason it throws an IO exception. The next time I try the same chunk, we check if it exists, and skip it if it does. Could this happen then? Should we compare checksums? I guess this is a little outside the scope of the issue though.

I guess the PR, as is, is already an improvement so 👍

@Zelldon
Copy link
Member Author

Zelldon commented Apr 27, 2020

bors r+

@zeebe-bors
Copy link
Contributor

zeebe-bors bot commented Apr 27, 2020

Build succeeded

@zeebe-bors zeebe-bors bot merged commit 3d423d1 into develop Apr 27, 2020
@zeebe-bors zeebe-bors bot deleted the 4357-loop-write branch April 27, 2020 13:36
@Zelldon
Copy link
Member Author

Zelldon commented Apr 27, 2020

@npepinpe regarding the io thing. Our replication marks the snapshot as invalid if an exception was thrown, so should be fine.

@npepinpe
Copy link
Member

So the only possible case is if it crashed while looping the write?

@Zelldon
Copy link
Member Author

Zelldon commented Apr 28, 2020

Node crash you mean? Yeah maybe.

@Zelldon
Copy link
Member Author

Zelldon commented Apr 28, 2020

@npepinpe thought about this a bit more. I think then the snapshot will not be finished if the node fails, which means it will be not marked as valid so no problem.

zeebe-bors bot added a commit that referenced this pull request Apr 29, 2020
4372: fix(atomix): consume read buffer correctly r=Zelldon a=Zelldon

## Description

Fix issue where the read buffer was consumed only half and the rest was
 thrown away.

Refactor `FileChannelJournalSegmentReader#readNext` to improve readability and hopefully maintainability.
<!-- Please explain the changes you made here. -->

I run also a benchmark with these changes (#4358 was also part of the branch I run a benchmark with)

![metrics](https://user-images.githubusercontent.com/2758593/80199113-45202a00-8621-11ea-9595-e8c65130ec97.png)


## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #4248 

#

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
zeebe-bors bot added a commit that referenced this pull request May 1, 2020
4372: fix(atomix): consume read buffer correctly r=Zelldon a=Zelldon

## Description

Fix issue where the read buffer was consumed only half and the rest was
 thrown away.

Refactor `FileChannelJournalSegmentReader#readNext` to improve readability and hopefully maintainability.
<!-- Please explain the changes you made here. -->

I run also a benchmark with these changes (#4358 was also part of the branch I run a benchmark with)

![metrics](https://user-images.githubusercontent.com/2758593/80199113-45202a00-8621-11ea-9595-e8c65130ec97.png)


## Related issues

<!-- Which issues are closed by this PR or are related -->

closes #4248 

#

Co-authored-by: Christopher Zell <zelldon91@googlemail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshot chunk might not written completely
2 participants