Fix unlink chunk with mismatched lengths status taking minutes #76
Conversation
Hi @Fenixin, thanks for your report. Indeed, the loop could be smarter. I just committed a patch to my own repository: macfreek@2e168fe. I also wrote a test case for your scenario, but couldn't reproduce the slowness you reported: on my laptop, running all tests takes 1.9 seconds, both before and after the patch. Could you check if this fix also helps your performance issue? If not, could you email me a region file that exhibits this issue?
PS: Sorry for the slow feedback! You submitted the patch on my son's 2nd birthday, and I completely forgot about it afterwards.
@Fenixin, let me know if you had a chance to look at the patch.
Sorry. Will do this weekend!
I won't be able to look at this until tomorrow. Sorry again.
No problem at all! I was just curious if this patch works for you, especially since I could not reproduce your problem. I didn't mean to rush you. :)
I'm really busy right now. I will give it a look as soon as I get some time. It seems I have lost the problematic region file :/. If I find it, I will send it to you.
Sorry for my dramatic disappearance. I just tested your commit and I can say that it solves the issue. Thanks! I'm sending you the (very broken) region file so you can test it yourself (if you want to). I will use the public email address in your GitHub account. Feel free to close this pull request and push your changes.
@Fenixin Thanks for the file. I could not reproduce your result at first: reading the file was about as fast before and after my fix. However, I have a hunch about the root cause. In the current code, if the headers say the chunk is 2987075640 bytes, it attempts to read 2987075640 bytes, even if the file is only 4202496 bytes. What happens next depends highly on the underlying operating system: Python simply proxies the I/O command to the underlying I/O library of the OS. On my laptop, the behaviour also depends on the OS and exactly how the file was opened. In Python 2, using the […]

I think the general design is still correct: attempt to read a seemingly corrupt chunk, and only raise an exception if that fails. What I'll fix is that a read operation should not attempt to read beyond the end of the file. I still have to think about whether we should surround the […]
This closes twoolie#76. Make it clear that get_blockdata() always attempts to read data, even if the headers are corrupt. In nearly all cases, it will still fail, but at least it tried.
Fixed with merge #79.
Hello!
Just found what I think is a bug. When you try to unlink a chunk that has the status "mismatched lengths", and the length attribute of the chunk is ridiculously big, the call can take minutes to finish because of how requiereblocks() calculates the block size.
Here is a small fix for that problem.
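One way to avoid the minutes-long loop is to cap the computed block count at the size of the region file itself. This is a sketch under assumptions: `required_sectors`, its parameters, and the 4096-byte sector size are illustrative (Minecraft region files use 4 KiB sectors, but the real helper in this report is requiereblocks()).

```python
SECTOR_SIZE = 4096  # region files are organized in 4 KiB sectors

def required_sectors(chunk_length, file_size):
    """Number of sectors a chunk occupies, clamped to the file size.

    A corrupt header may declare a huge chunk length; without the
    clamp, code iterating over the computed sector count can spin
    for minutes on garbage data.
    """
    # Ceiling division: sectors needed for the declared length.
    sectors = (chunk_length + SECTOR_SIZE - 1) // SECTOR_SIZE
    # A chunk can never span more sectors than the whole file holds.
    max_sectors = (file_size + SECTOR_SIZE - 1) // SECTOR_SIZE
    return min(sectors, max_sectors)
```

With the numbers from later in this thread (a declared length of 2987075640 bytes in a 4202496-byte file), the clamp reduces the sector count from roughly 730,000 to 1,026.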
Please, if you don't like the fix, feel free to implement it in any way you like.