Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sizes incorrectly parsed as powers of two by default, ignoring IEEE 1541 #4

Closed
anarcat opened this issue Jun 24, 2015 · 10 comments
Closed

Comments

@anarcat
Copy link

anarcat commented Jun 24, 2015

i believe this is incorrect:

    >>> parse_size('42')
    42
    >>> parse_size('1 KB')
    1024
    >>> parse_size('5 kilobyte')
    5120

It should rather be:

    >>> parse_size('1 KiB')
    1024
    >>> parse_size('5 kibibyte')
    5120
    >>> parse_size('1.5 GiB')
    1610612736
    >>> parse_size('1 KB')
    1000
    >>> parse_size('5 kilobyte')
    5000
    >>> parse_size('1.5 GB')
    1000000000

i know it will complicate parsing because you can't assume that the first letter is enough to tell them apart, but maybe check the two first letters? :)

see https://en.wikipedia.org/wiki/Kibibyte for more details.

@anarcat anarcat changed the title size parsing incorrect sizes incorrectly parsed as powers of two by default, ignoring IEEE 1541 Jun 24, 2015
@xolox
Copy link
Owner

xolox commented Jun 24, 2015

Hi @anarcat and thanks for the feedback.

You are technically 100% correct and to be honest I'm not worried about complicating the parsing, however changing this now would break backwards compatibility and I'm quite religious about that :-). There's also the question of doing what is technically correct versus the DWIM aspect.

The least I can do is clearly document the caveat you've pointed out, but I won't change the implementation as suggested. What I can do is provide a second implementation that provides the parsing you suggest (it could be a keyword argument that has to be given to the parse_size() function to switch it away from backwards compatible mode).

@anarcat
Copy link
Author

anarcat commented Jun 25, 2015

maybe having a parse_disk_size and parse_memory_size? or parse_size_si and parse_size_eic? with parse_size being an alias to parse_size_eic...

@cas--
Copy link

cas-- commented Apr 11, 2016

I have to say I was discussing with another developer about using this module for our application but we were astonished to find it that it is defaulting to base 2 yet using the SI base 10 prefix, making it unusable. Your coding standards and documentation seems quite meticulous but this is quite a glaring issue regarding unit prefix standards and continuing the confusion for end users.

@skycaptain
Copy link

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

@anarcat
Copy link
Author

anarcat commented May 4, 2016

On 2016-05-04 16:53:24, Manuel Leonhardt wrote:

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

"Windows does it wrong" should never be an argument against doing the
right thing.

Otherwise everything would always be wrong and we would see no
progress.

Being cynical is the only way to deal with modern civilization — you
can't just swallow it whole.
- Frank Zappa

@skycaptain
Copy link

skycaptain commented May 4, 2016

@anarcat: I'm totally agreeing. If it's wrong, it's wrong, period. Maybe I should have mentioned that I'm also in favour of a correct SI-formatting. I was just replying on @xolox point on DWIM.

@standage
Copy link
Contributor

Any chance of this happening? It should be possible to maintain backwards compatibility of the interface with something like this, shouldn't it?

>>> parse_size('1 KB')
1024
>>> parse_size('1 KB', correctly=True)
1000

@Flimm
Copy link
Contributor

Flimm commented Jul 13, 2016

I created a pull request making it clearer in the documentation which units are used: #9

xolox added a commit that referenced this issue Sep 29, 2016
Refer to issue 4 and pull requests 8 and 9 on GitHub for details:
 - #4
 - #8
 - #9
@xolox xolox closed this as completed in 4aa6949 Sep 29, 2016
@xolox
Copy link
Owner

xolox commented Sep 29, 2016

Hi everyone, sorry for taking so long to respond but thanks to everyone who chimed in! Reading through #4, #8 and #9 have convinced me to change format_size() and parse_size() to default to powers of ten instead of powers of two.

The logic in format_size() is almost as simple as it was before, but I've enhanced parse_size() to recognize symbols and names like KB, KiB, kilobyte and kibibyte. Both functions can revert to the old behavior by passing the keyword argument binary=True.

Because these changes are backwards incompatible I bumped the major version number to 2.0. On the other hand I'm not sure what would actually break or how fatal that breakage would be, and the discussion here definitely convinced me that the default needed to change :-).

I hope the new implementation satisfies everyone's concerns! Feedback is welcome.

@anarcat
Copy link
Author

anarcat commented Sep 29, 2016

thank you so much for doing the right thing! i haven't reviewed the actual implementation in details, but the logic is what i believe is the correct way.

thanks again!

Osmose pushed a commit to Osmose/socorro that referenced this issue Nov 10, 2017
Osmose pushed a commit to Osmose/socorro that referenced this issue Nov 14, 2017
Osmose pushed a commit to mozilla-services/socorro that referenced this issue Nov 16, 2017
Osmose pushed a commit to mozilla-services/socorro that referenced this issue Nov 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants