
Fix unlink chunk with mismatched lengths status taking minutes #76

Closed
Fenixin wants to merge 1 commit

Conversation

Fenixin
Contributor

@Fenixin Fenixin commented Oct 4, 2014

Hello!

Just found what I think is a bug. When you try to unlink a chunk that has the mismatched lengths status, and the length field of the chunk is ridiculously big, the call can take minutes to finish because of how requiereblocks() calculates the block size.

Here is a small fix for that problem.

Please, if you don't like the fix, feel free to implement it in any way you like.
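
(For illustration, a minimal sketch of the kind of change involved; the names and the loop are simplified stand-ins, not the actual NBT code. The point is that the number of 4 KiB sectors a chunk needs can be computed with one ceiling division instead of being counted up step by step, so a bogus multi-gigabyte length no longer costs hundreds of thousands of iterations.)

```python
SECTOR_LENGTH = 4096  # a Minecraft region file is divided into 4 KiB sectors

def required_blocks_slow(length):
    """Illustrative only: grow the sector count one step at a time.
    With a bogus length of ~3 GB this loops roughly 730,000 times per call."""
    blocks = 1
    while blocks * SECTOR_LENGTH < length:
        blocks += 1
    return blocks

def required_blocks_fast(length):
    """Equivalent result via ceiling division: constant time."""
    return max(1, (length + SECTOR_LENGTH - 1) // SECTOR_LENGTH)

assert required_blocks_slow(10000) == required_blocks_fast(10000) == 3
```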

@macfreek
Collaborator

Hi @Fenixin, thanks for your report. Indeed, the loop could be smarter. I just committed a patch to my own repository: macfreek@2e168fe

I also wrote a test case for your scenario, but couldn't reproduce the slowness you reported: on my laptop, running all tests takes 1.9 seconds, before and after the patch.

Could you check if this fix also helps your performance issue? If not, could you email me a region file that exhibited this issue?
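
(A rough sketch of such a test: the region-file layout below follows the standard Minecraft format, but RegionFile(fileobj=...) and unlink_chunk(x, z) are assumptions about the library's API, not a copy of the test that was actually committed.)

```python
import io
import struct
import time

from nbt import region

SECTOR = 4096

def make_corrupt_region():
    """Build an in-memory region file whose first chunk header claims a
    ridiculously large length, far beyond the end of the file."""
    buf = io.BytesIO()
    # Location table: chunk (0,0) starts at sector 2 and occupies 1 sector.
    buf.write(struct.pack(">I", (2 << 8) | 1))
    buf.write(b"\x00" * (SECTOR - 4))          # rest of the location table
    buf.write(b"\x00" * SECTOR)                # timestamp table
    # Chunk header: 4-byte big-endian length, 1-byte compression type.
    buf.write(struct.pack(">I", 2987075640))   # bogus, mismatched length
    buf.write(b"\x02")                         # zlib compression
    buf.write(b"\x00" * 100)                   # a little garbage "data"
    buf.seek(0)
    return buf

def test_mismatched_length_is_fast():
    start = time.time()
    reg = region.RegionFile(fileobj=make_corrupt_region())
    try:
        reg.unlink_chunk(0, 0)                 # should return quickly
    except Exception:
        pass                                   # a corrupt-chunk error is fine
    assert time.time() - start < 2.0
```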

@macfreek
Collaborator

PS: Sorry for the slow feedback! You submitted the patch on my son's 2nd birthday, and I completely forgot about it afterwards.

@macfreek
Collaborator

@Fenixin, let me know if you had a chance to look at the patch.

@Fenixin
Contributor Author

Fenixin commented Oct 24, 2014

Sorry. Will do this weekend!

@Fenixin
Contributor Author

Fenixin commented Oct 26, 2014

I won't be able to look at this until tomorrow. Sorry again.

@macfreek
Collaborator

No problem at all! I was just curious if this patch works for you, especially since I could not reproduce your problem. I didn't mean to rush you. :)

@Fenixin
Contributor Author

Fenixin commented Oct 28, 2014

I'm really busy right now. I will give it a look as soon as I get some time.

It seems I have lost the problematic region file :/. If I find it, I will send it to you.

@Fenixin
Contributor Author

Fenixin commented Nov 29, 2014

Sorry for my dramatic disappearance.

Just tested your commit and I can say that it solves the issue. Thanks!

I'm sending you the (very broken) region file so you can test it yourself (if you want to). I will use the public email address from your GitHub account.

Feel free to close this pull request and push your changes.

@macfreek
Collaborator

macfreek commented Dec 8, 2014

@Fenixin Thanks for the file. I could not reproduce your result at first. Reading the file was about as fast before and after my fix.

However, I have a hunch about the root cause.

What happens is that nbt.region.RegionFile attempts to read a chunk, even if it seems corrupt. This is by design: the idea is that it may still be possible to retrieve some of the data after all. If an error occurs while reading the data, it raises the root cause: a corrupted chunk.

In the current code, if the headers say the chunk is 2987075640 bytes, it attempts to read 2987075640 bytes, even if the file is only 4202496 bytes. What happens next depends highly on the underlying operating system: Python simply proxies the IO call to the IO library of the OS. On my laptop, the behaviour also depends on exactly how the file was opened. In Python 2, using the open() function, reading beyond the end of the file returned immediately with an empty string. In Python 3, or in Python 2 using the io.open() function (which returns an io.BufferedReader object), reading beyond the end of the file raises an IOError: [Errno 22] Invalid argument (EINVAL). I suspect that your operating system actually attempts to read 2987075640 bytes (nearly 3 GByte), which is what took such a long time.
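
(To make the platform difference concrete, a small self-contained snippet; the exact outcome of the oversized read depends on the OS and on how the file was opened, as described above.)

```python
import io

# A tiny stand-in for the 4202496-byte region file.
with open("small.bin", "wb") as f:
    f.write(b"x" * 4096)

bogus_length = 2987075640   # the chunk length claimed by the corrupt header

# Seek beyond the end of the file and ask for ~3 GB, as the current code does.
# Under Python 2's built-in open() this returned b"" immediately on my laptop;
# under Python 3 (or Python 2's io.open(), i.e. an io.BufferedReader) the same
# call raised IOError: [Errno 22] Invalid argument.  Other operating systems
# may instead really try to read ~3 GB, which is the reported slowness.
with io.open("small.bin", "rb") as f:
    f.seek(5000)
    try:
        data = f.read(bogus_length)
        print(f"read returned {len(data)} bytes")
    except OSError as err:          # IOError is an alias of OSError in Python 3
        print(f"read failed: {err}")
```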

I think the general design is still correct: attempt to read a seemingly corrupt chunk, and only raise an exception if that fails. What I'll fix is that a read operation should not attempt to read beyond the end of the file. I still have to think about whether we should surround the seek() and read() operations with a try/except block.
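
(A minimal sketch of that clamping idea, using a hypothetical helper rather than the exact patch:)

```python
import os

def safe_read(fileobj, offset, declared_length):
    """Read at most the bytes that actually exist between offset and the end
    of the file, instead of trusting a possibly corrupt chunk header."""
    fileobj.seek(0, os.SEEK_END)
    file_size = fileobj.tell()
    if offset >= file_size:
        return b""                                  # header points past EOF
    length = min(declared_length, file_size - offset)
    fileobj.seek(offset)
    return fileobj.read(length)
```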

macfreek added a commit to macfreek/NBT that referenced this pull request Jan 1, 2015
This closes twoolie#76.
Make it clear that get_blockdata() always attempts to read data, even if the headers are corrupt. In nearly all cases, it will still fail, but at least it tried.
@macfreek macfreek closed this in #79 Jan 4, 2015
macfreek added a commit that referenced this pull request Jan 4, 2015
@macfreek
Collaborator

macfreek commented Jan 4, 2015

Fixed with merge #79.
