Queue.db data file padded with leading "0"s #600

Closed
lokisisland opened this issue Mar 11, 2021 · 9 comments


lokisisland commented Mar 11, 2021

Describe the bug
The queue.db data file is padded with leading null characters, which prevents Tile38 from starting up correctly (it reports a corrupt file error). Once we remove the leading null characters manually, the data file seems to load without issue. Attached is a truncated version of the queue.db file with a .txt extension to allow upload; I have removed records from the end of the file for brevity.

To Reproduce
This occurs running in a Docker container on Azure, with the data directory mounted to an Azure file share instance. It might be related to the container being stopped and restarted.

Expected behavior
The queue.db file starts with a valid character (*).

Operating System (please complete the following information):

OS: Linux based Docker container in Azure
Version n/a

queue-broken.db.txt

tidwall (Owner) commented Mar 11, 2021

This is the first time I've heard of an issue where there are leading zeros in a database file.

The queue.db file is a buntdb database, which is a simple write-ahead log file. The (*) character should be the first byte.

> This occurs running in a Docker container on Azure, with the data directory mounted to an Azure file share instance. It might be related to the container being stopped and restarted.

Has this issue happened more than once?

lokisisland (Author) commented:

Yes, it's happening intermittently at least once or twice a week. I see you've put in a fix for a similar issue (trailing zeroes in the aof file); I can probably use that as a base to fix it and raise a PR if you'd like?

tidwall (Owner) commented Mar 11, 2021

> Yes, it's happening intermittently at least once or twice a week

Do you see the issue happening only with the queue.db file? Does it ever happen to the appendonly.aof file?

lokisisland (Author) commented:

I've only seen it happen with the queue.db file so far; the aof file has been fine.

tidwall (Owner) commented Mar 11, 2021

I need to reproduce the issue before merging a PR.

I've never encountered this problem before. Since it's occurring consistently in your environment, I'll need to know for sure that it's a bug in the Tile38 codebase and not something external, such as buntdb or the host machine (Docker, Azure file share, etc.). If it is buntdb, then we should address the issue in that project and update the dependency in Tile38.

lokisisland (Author) commented:

It does seem to be an issue with BuntDB rather than Tile38. I've managed to replicate it with a very simple Go app writing to BuntDB, running in a Linux-based Docker container with the /data volume bound to an Azure file share. The app loops, writing an incrementing number to the database. The file was corrupted when I killed Docker (rather than stopping the container). I've attached the sample app, which includes the Dockerfile to build it.

SampleApp.zip

tidwall added this to the 1.20.0 milestone Mar 18, 2021
tidwall added a commit to tidwall/buntdb that referenced this issue Mar 29, 2021
This commit allows for BuntDB to load data files that were
previously considered invalid or corrupted.

Now when the data file ends with an incomplete command, the data
will be truncated at the end of the previous successful command.
Also, when a null control character is encountered instead of an
asterisk, which indicates the start of a command, the null is
ignored and the cursor moves to the next byte. This allows for
null padding at the head and the tail.

Fixes #71
tidwall/tile38#600
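The recovery rules in that commit message can be illustrated with a small Go parser. This is a sketch under stated assumptions, not BuntDB's actual loader: it assumes each log entry is a Redis-protocol (RESP) array such as `*3\r\n$3\r\nset\r\n...`, which is consistent with the `*` start byte mentioned earlier, and `loadCommands`, `parseCommand`, and `readLine` are hypothetical names. NUL bytes where a command should start are skipped, a trailing partial command is dropped, and the returned offset marks where the file could be truncated.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

var (
	errIncomplete = errors.New("incomplete command")   // data ended mid-command
	errCorrupt    = errors.New("invalid database file") // byte cannot belong to a command
)

// readLine returns the text from pos up to the next "\r\n" and the offset after it.
func readLine(data []byte, pos int) (string, int, error) {
	i := bytes.Index(data[pos:], []byte("\r\n"))
	if i < 0 {
		return "", 0, errIncomplete
	}
	return string(data[pos : pos+i]), pos + i + 2, nil
}

// parseCommand parses one "*<argc>\r\n" array of "$<len>\r\n<bytes>\r\n" bulk
// strings, where data[start] == '*'. It returns the args and the end offset.
func parseCommand(data []byte, start int) ([]string, int, error) {
	line, pos, err := readLine(data, start+1)
	if err != nil {
		return nil, 0, err
	}
	argc, convErr := strconv.Atoi(line)
	if convErr != nil || argc < 0 {
		return nil, 0, errCorrupt
	}
	var args []string
	for i := 0; i < argc; i++ {
		if pos >= len(data) {
			return nil, 0, errIncomplete
		}
		if data[pos] != '$' {
			return nil, 0, errCorrupt
		}
		line, pos, err = readLine(data, pos+1)
		if err != nil {
			return nil, 0, err
		}
		size, sizeErr := strconv.Atoi(line)
		if sizeErr != nil || size < 0 {
			return nil, 0, errCorrupt
		}
		if size > len(data)-pos-2 { // value plus "\r\n" runs past the end
			return nil, 0, errIncomplete
		}
		if data[pos+size] != '\r' || data[pos+size+1] != '\n' {
			return nil, 0, errCorrupt
		}
		args = append(args, string(data[pos:pos+size]))
		pos += size + 2
	}
	return args, pos, nil
}

// loadCommands replays the log, skipping NUL padding and ignoring a partial
// trailing command. The int result is the offset after the last complete command.
func loadCommands(data []byte) ([][]string, int, error) {
	var cmds [][]string
	pos, good := 0, 0
	for pos < len(data) {
		switch data[pos] {
		case 0:
			pos++ // NUL where a command should start: skip it
		case '*':
			args, next, err := parseCommand(data, pos)
			if errors.Is(err, errIncomplete) {
				return cmds, good, nil // partial command at the tail: drop it
			}
			if err != nil {
				return cmds, good, err
			}
			cmds = append(cmds, args)
			pos, good = next, next
		default:
			return cmds, good, errCorrupt
		}
	}
	return cmds, good, nil
}

func main() {
	// Two NUL bytes, one complete "set a 1" command, then a truncated command.
	data := []byte("\x00\x00*3\r\n$3\r\nset\r\n$1\r\na\r\n$1\r\n1\r\n*3\r\n$3\r\nset\r\n$1")
	cmds, end, err := loadCommands(data)
	fmt.Println(cmds, end, err) // prints: [[set a 1]] 29 <nil>
}
```

Any other unexpected byte is still reported as corruption, so damage in the middle of the file is not silently discarded; only NUL padding and an unfinished final command are tolerated.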
tidwall (Owner) commented Mar 30, 2021

I pushed an update to Tile38 that includes the new version of BuntDB which addresses this issue.

tidwall closed this as completed Mar 30, 2021
lokisisland (Author) commented:

Thank you so much, you're awesome!

tidwall (Owner) commented Mar 30, 2021

You're welcome :)
