sizes incorrectly parsed as powers of two by default, ignoring IEEE 1541 #4

Closed
anarcat opened this Issue Jun 24, 2015 · 10 comments

Projects

None yet

6 participants

@anarcat
anarcat commented Jun 24, 2015

i believe this is incorrect:

    >>> parse_size('42')
    42
    >>> parse_size('1 KB')
    1024
    >>> parse_size('5 kilobyte')
    5120

It should rather be:

    >>> parse_size('1 KiB')
    1024
    >>> parse_size('5 kibibyte')
    5120
    >>> parse_size('1.5 GiB')
    1610612736
    >>> parse_size('1 KB')
    1000
    >>> parse_size('5 kilobyte')
    5000
    >>> parse_size('1.5 GB')
    1000000000

i know it will complicate parsing because you can't assume that the first letter is enough to tell them apart, but maybe check the two first letters? :)

see https://en.wikipedia.org/wiki/Kibibyte for more details.

@anarcat anarcat changed the title from size parsing incorrect to sizes incorrectly parsed as powers of two by default, ignoring IEEE 1541 Jun 24, 2015
@xolox
Owner
xolox commented Jun 24, 2015

Hi @anarcat and thanks for the feedback.

You are technically 100% correct and to be honest I'm not worried about complicating the parsing, however changing this now would break backwards compatibility and I'm quite religious about that :-). There's also the question of doing what is technically correct versus the DWIM aspect.

The least I can do is clearly document the caveat you've pointed out, but I won't change the implementation as suggested. What I can do is provide a second implementation that provides the parsing you suggest (it could be a keyword argument that has to be given to the parse_size() function to switch it away from backwards compatible mode).

@anarcat
anarcat commented Jun 25, 2015

maybe having a parse_disk_size and parse_memory_size? or parse_size_si and parse_size_eic? with parse_size being an alias to parse_size_eic...

@cas--
cas-- commented Apr 11, 2016

I have to say I was discussing with another developer about using this module for our application but we were astonished to find it that it is defaulting to base 2 yet using the SI base 10 prefix, making it unusable. Your coding standards and documentation seems quite meticulous but this is quite a glaring issue regarding unit prefix standards and continuing the confusion for end users.

@skycaptain

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

@anarcat
anarcat commented May 4, 2016

On 2016-05-04 16:53:24, Manuel Leonhardt wrote:

@xolox: I'm totally agreeing on providing some sort of backwards compatibility. Let's keep in mind, that even Windows does it wrong. Providing a second function to get true binary sizes would be great.

"Windows does it wrong" should never be an argument against doing the
right thing.

Otherwise everything would always be wrong and we would see no
progress.

Being cynical is the only way to deal with modern civilization — you
can't just swallow it whole.
- Frank Zappa

@skycaptain
skycaptain commented May 4, 2016 edited

@anarcat: I'm totally agreeing. If it's wrong, it's wrong, period. Maybe I should have mentioned that I'm also in favour of a correct SI-formatting. I was just replying on @xolox point on DWIM.

@standage standage referenced this issue in dib-lab/khmer Jun 29, 2016
Closed

--max-memory-usage input #1259

@standage
Contributor

Any chance of this happening? It should be possible to maintain backwards compatibility of the interface with something like this, shouldn't it?

>>> parse_size('1 KB')
1024
>>> parse_size('1 KB', correctly=True)
1000
@Flimm
Contributor
Flimm commented Jul 13, 2016

I created a pull request making it clearer in the documentation which units are used: #9

@xolox xolox added a commit that referenced this issue Sep 29, 2016
@xolox Proper support for IEEE 1541 definitions of units?
Refer to issue 4 and pull requests 8 and 9 on GitHub for details:
 - #4
 - #8
 - #9
78f729b
@xolox xolox closed this in 4aa6949 Sep 29, 2016
@xolox
Owner
xolox commented Sep 29, 2016

Hi everyone, sorry for taking so long to respond but thanks to everyone who chimed in! Reading through #4, #8 and #9 have convinced me to change format_size() and parse_size() to default to powers of ten instead of powers of two.

The logic in format_size() is almost as simple as it was before, but I've enhanced parse_size() to recognize symbols and names like KB, KiB, kilobyte and kibibyte. Both functions can revert to the old behavior by passing the keyword argument binary=True.

Because these changes are backwards incompatible I bumped the major version number to 2.0. On the other hand I'm not sure what would actually break or how fatal that breakage would be, and the discussion here definitely convinced me that the default needed to change :-).

I hope the new implementation satisfies everyone's concerns! Feedback is welcome.

@anarcat
anarcat commented Sep 29, 2016

thank you so much for doing the right thing! i haven't reviewed the actual implementation in details, but the logic is what i believe is the correct way.

thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment