- Knock-knock? - Who's there? - Broken segment! #973

Closed
ppodolsky opened this issue Jan 8, 2021 · 15 comments

@ppodolsky
Contributor

ppodolsky commented Jan 8, 2021

Describe the bug
The same load profile as in #969: deletions, additions, and merges.
Now it happens on querying, after several hours of serving.
I think the reason is basically the same. At startup and for several hours afterwards all queries were OK, but after several generations of merges searcher.doc started to throw a VInt decoding error.

Which version of tantivy are you using?
bf6e6e8

To Reproduce
I sent the broken segment to you on Gitter.

@ppodolsky
Contributor Author

ppodolsky commented Jan 8, 2021

Failing at https://github.com/tantivy-search/tantivy/blob/main/src/store/reader.rs#L104
checkpoint (doc=[14958..16689), bytes=[3471326..3478611)), doc_id: 15086

@fulmicoton
Collaborator

fulmicoton commented Jan 8, 2021 via email

@fulmicoton
Collaborator

fulmicoton commented Jan 9, 2021

(doc=[14892..14909), bytes=[3453499..3456556))
(doc=[14909..14926), bytes=[3456556..3460848))
(doc=[14926..14942), bytes=[3460848..3464857))
(doc=[14942..14958), bytes=[3464857..3471326))
(doc=[14958..16689), bytes=[3471326..3478611)) <---
(doc=[16689..16724), bytes=[3478611..3486278)) 
(doc=[16724..16753), bytes=[3486278..3493484))
(doc=[16753..16787), bytes=[3493484..3500905))
(doc=[15087..15131), bytes=[3500905..3508456)) <---
(doc=[15131..15165), bytes=[3508456..3516084))
(doc=[15165..15196), bytes=[3516084..3523442))
(doc=[15196..15228), bytes=[3523442..3530761))
(doc=[15228..15256), bytes=[3530761..3538043))

The bug looks very similar.
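
The discontinuity in the listing above can be found mechanically. What follows is a minimal sketch, not tantivy's code: the `Checkpoint` struct, `find_discontinuities`, and `issue_checkpoints` are hypothetical names introduced for illustration, and the sketch assumes that each checkpoint should begin exactly where the previous one ended, both in doc ids and in byte offsets.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one skip-index entry; not tantivy's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub docs: Range<u32>,
    pub bytes: Range<u64>,
}

/// Returns every index `i` at which checkpoint `i` does not start exactly
/// where checkpoint `i - 1` ended, in either doc ids or byte offsets.
pub fn find_discontinuities(checkpoints: &[Checkpoint]) -> Vec<usize> {
    checkpoints
        .windows(2)
        .enumerate()
        .filter_map(|(i, pair)| {
            let broken = pair[0].docs.end != pair[1].docs.start
                || pair[0].bytes.end != pair[1].bytes.start;
            if broken {
                Some(i + 1)
            } else {
                None
            }
        })
        .collect()
}

/// A few rows copied from the listing above, around the two marked entries.
pub fn issue_checkpoints() -> Vec<Checkpoint> {
    vec![
        Checkpoint { docs: 14942..14958, bytes: 3464857..3471326 },
        Checkpoint { docs: 14958..16689, bytes: 3471326..3478611 },
        Checkpoint { docs: 16689..16724, bytes: 3478611..3486278 },
        Checkpoint { docs: 16724..16753, bytes: 3486278..3493484 },
        Checkpoint { docs: 16753..16787, bytes: 3493484..3500905 },
        Checkpoint { docs: 15087..15131, bytes: 3500905..3508456 },
        Checkpoint { docs: 15131..15165, bytes: 3508456..3516084 },
    ]
}
```

On these rows the check flags only the entry starting at doc 15087: its byte range continues from the previous entry, but its doc range jumps backwards from 16787. The first marked entry, [14958..16689), is contiguous with its neighbours and so passes this particular check, even though it is far wider than the surrounding blocks.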

@fulmicoton
Collaborator

fulmicoton commented Jan 9, 2021

Did you enable logging (warn level should be sufficient), and did you see a lot of merges fail before that?

I'd like to know if the assert in block.rs:l.47 triggered several times before you encountered your problem.

@ppodolsky
Contributor Author

ppodolsky commented Jan 9, 2021 via email

@ppodolsky
Contributor Author

ppodolsky commented Jan 9, 2021 via email

@fulmicoton
Collaborator

@ppodolsky just to be sure, this is a brand-new index, meaning it did not contain any segment that had been corrupted previously?

@ppodolsky
Contributor Author

> @ppodolsky just to be sure, this is a brand-new index, meaning it did not contain any segment that had been corrupted previously?

Yep. I rebuilt the whole index after applying the latest commits from your main branch. I will recheck everything today and launch writes with logging enabled if required. It looks like I will be able to reproduce the issue quickly.

@fulmicoton
Collaborator

fulmicoton commented Jan 11, 2021

Can you run your program with the following rev?
acfb057

It checks the doc store skip index while it is being written. If there is a problem, it detects it and returns an error.
tantivy then abruptly quits the process and logs the segments that were being merged.

The segment files are not removed so if you send them to me, I should be able to look at the issue.
(the .store files are sufficient I think)
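
As an illustration of the kind of write-time check described here, the following is a minimal sketch under stated assumptions. It is not the code in the rev mentioned above: `CheckedSkipIndexWriter`, `SkipIndexError`, and `Checkpoint` are hypothetical names, and the only invariant assumed is that each new checkpoint must start where the last accepted one ended.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one skip-index entry; not tantivy's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct Checkpoint {
    pub docs: Range<u32>,
    pub bytes: Range<u64>,
}

/// Hypothetical error carrying both sides of the broken boundary,
/// so that the caller can log them before aborting.
#[derive(Debug, PartialEq)]
pub struct SkipIndexError {
    pub previous: Checkpoint,
    pub rejected: Checkpoint,
}

/// Hypothetical writer that validates every checkpoint as it is appended.
#[derive(Default)]
pub struct CheckedSkipIndexWriter {
    checkpoints: Vec<Checkpoint>,
}

impl CheckedSkipIndexWriter {
    /// Appends `checkpoint`, or returns an error (and stores nothing)
    /// if it is not contiguous with the last accepted checkpoint.
    pub fn push(&mut self, checkpoint: Checkpoint) -> Result<(), SkipIndexError> {
        if let Some(previous) = self.checkpoints.last() {
            let contiguous = previous.docs.end == checkpoint.docs.start
                && previous.bytes.end == checkpoint.bytes.start;
            if !contiguous {
                return Err(SkipIndexError {
                    previous: previous.clone(),
                    rejected: checkpoint,
                });
            }
        }
        self.checkpoints.push(checkpoint);
        Ok(())
    }

    /// Number of checkpoints accepted so far.
    pub fn len(&self) -> usize {
        self.checkpoints.len()
    }
}
```

Rejecting the checkpoint at write time means a bad merge fails loudly instead of producing a segment that only breaks later, at query time.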

@ppodolsky

@ppodolsky
Contributor Author

Sure, I will release this rev today. Nothing happened over the last weekend (but the write load was lighter than usual). I will keep observing and collecting logs, and will keep you informed.

@fulmicoton
Collaborator

Thank you!

@ppodolsky
Contributor Author

Still no luck catching it. I've begun to doubt the soundness of what I saw earlier; probably I or k8s managed to launch a previous version of Tantivy for a moment and it corrupted the segment.

In my defense, there has not been any corruption during 3 days under rather heavy load. I will keep watching with logging until the end of the week and then close the issue if I don't find anything. Most likely everything is OK and I raised a false alarm, sorry.

@fulmicoton
Collaborator

No worries! You have accumulated enough good karma by finding the bug and spending time reporting it not to worry about that :)

@ppodolsky
Contributor Author

I didn't get the corruption, so it was definitely my mistake. Over two weeks of various load profiles there have been no signs of broken segments. Thank you for being patient :)

@fulmicoton
Collaborator

Thanks for the update!
