New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make the EFF word list the default? #33

Closed
anarcat opened this Issue Feb 6, 2017 · 22 comments

Comments

Projects
None yet
4 participants
@anarcat

anarcat commented Feb 6, 2017

at first glance, it sure looks like the EFF word list is superior to the other alternatives. is there any reason why it's not the default list?

@ulif ulif self-assigned this Feb 7, 2017

@ulif ulif added the enhancement label Feb 7, 2017

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 7, 2017

Owner

The other lists were included before the EFF list. The current default, securedrop was provided by the securedrop project and they manually crafted their list for diceware. Therefore I feel a bit indebted to them.

But in the long run the EFF list would certainly make a better default. I will talk to the securedrop people if they can live with it.

In fact I just waited for someone else to ask for EFF as default, so, thanks for this!

Owner

ulif commented Feb 7, 2017

The other lists were included before the EFF list. The current default, securedrop was provided by the securedrop project and they manually crafted their list for diceware. Therefore I feel a bit indebted to them.

But in the long run the EFF list would certainly make a better default. I will talk to the securedrop people if they can live with it.

In fact I just waited for someone else to ask for EFF as default, so, thanks for this!

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

Oh hey @ulif. I'm get guy who made that list for SecureDrop. SD moved away from the list I made to the EFF's list with this pr: freedomofpress/securedrop#1452. On those grounds (among others), I'd say it's probably worth it to use the EFF's list over the one I made.

Contributor

heartsucker commented Feb 17, 2017

Oh hey @ulif. I'm get guy who made that list for SecureDrop. SD moved away from the list I made to the EFF's list with this pr: freedomofpress/securedrop#1452. On those grounds (among others), I'd say it's probably worth it to use the EFF's list over the one I made.

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 17, 2017

Owner

Hey @heartsucker , great to hear from you! Thank you for the feedback! If you can live with it, I would say we switch to the EFF list as default as well.

FWIW I started another little project to check existing wordlists for diceware related flaws. Yet it is not really usable:

https://github.com/ulif/diceware-list

This Python project should generate two CLI scripts: a wordlist creator (diceware-list) and a wordlist checker (wlflakes). Yet not really usable but maybe helpful in the long run.

Owner

ulif commented Feb 17, 2017

Hey @heartsucker , great to hear from you! Thank you for the feedback! If you can live with it, I would say we switch to the EFF list as default as well.

FWIW I started another little project to check existing wordlists for diceware related flaws. Yet it is not really usable:

https://github.com/ulif/diceware-list

This Python project should generate two CLI scripts: a wordlist creator (diceware-list) and a wordlist checker (wlflakes). Yet not really usable but maybe helpful in the long run.

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

@ulif Yeah, by all means. Make the switch.

Contributor

heartsucker commented Feb 17, 2017

@ulif Yeah, by all means. Make the switch.

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 17, 2017

Owner

Ooh, just spotted: the EFF list is len 7776, not 8192. Nice for dice, bad for computers. That changes the game. IIRC there are no 2^n list available from EFF. Hm, have to reconsider.

Owner

ulif commented Feb 17, 2017

Ooh, just spotted: the EFF list is len 7776, not 8192. Nice for dice, bad for computers. That changes the game. IIRC there are no 2^n list available from EFF. Hm, have to reconsider.

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

Well either way, don't let any indebtedness influence your decision. :)

Contributor

heartsucker commented Feb 17, 2017

Well either way, don't let any indebtedness influence your decision. :)

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 17, 2017

Owner

Thank you very much, @heartsucker , well, I feel indebted to you anyway.

Owner

ulif commented Feb 17, 2017

Thank you very much, @heartsucker , well, I feel indebted to you anyway.

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 17, 2017

hmm... this seems like a math problem we should be able to solve now... ie. how to map a X bit entropy source to a Y bit space... why can't we just mod(7776) the entropy source?

anarcat commented Feb 17, 2017

hmm... this seems like a math problem we should be able to solve now... ie. how to map a X bit entropy source to a Y bit space... why can't we just mod(7776) the entropy source?

@clawoflight

This comment has been minimized.

Show comment
Hide comment
@clawoflight

clawoflight Feb 17, 2017

@anarcat I think just using mod would change the distribution (because the lower numbers would get the ones above 7776 also, changing the probabilities, right?)
I don't trust my math for security issues, though....

@ulif You could make the EFF list default when using real dice, and keep the previous one for urandom. That would barely increase complexity and be much safer, right?

clawoflight commented Feb 17, 2017

@anarcat I think just using mod would change the distribution (because the lower numbers would get the ones above 7776 also, changing the probabilities, right?)
I don't trust my math for security issues, though....

@ulif You could make the EFF list default when using real dice, and keep the previous one for urandom. That would barely increase complexity and be much safer, right?

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

@ulif @anarcat

Just an FYI, there is less than 0.5 bits of entropy difference between a list of size 7776 and 8192 at 6 words and less than 1 at 8 words.

>>> import math
>>> math.log(7776**6, 2)
77.54887502163469
>>> math.log(8192**6, 2)
78.0000000000000

While @ulif is right that 8192 is better, it is only just barely better for human use. The actual SecureDrop list is even shorter that 7776 because we removed "scary" words from the list in this PR: freedomofpress/securedrop#1546. Even this was considered to be ok since the EFF list was considered more memorizable and still secure.

Contributor

heartsucker commented Feb 17, 2017

@ulif @anarcat

Just an FYI, there is less than 0.5 bits of entropy difference between a list of size 7776 and 8192 at 6 words and less than 1 at 8 words.

>>> import math
>>> math.log(7776**6, 2)
77.54887502163469
>>> math.log(8192**6, 2)
78.0000000000000

While @ulif is right that 8192 is better, it is only just barely better for human use. The actual SecureDrop list is even shorter that 7776 because we removed "scary" words from the list in this PR: freedomofpress/securedrop#1546. Even this was considered to be ok since the EFF list was considered more memorizable and still secure.

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 17, 2017

@anarcat I think just using mod would change the distribution (because the lower numbers would get the ones above 7776 also, changing the probabilities, right?)
I don't trust my math for security issues, though....

well, when i mean modulo, i really mean to discard the reminder of a modulo division, or whatever... of course if you shuffle those discarded numbers back at the top you screw up the distribution, but if you discard the upper (say) 9 bits, you end up with a subset of the list that is 7680 words long and should be uniform.

but i'm not a certified cryptographer either (and certainly not a stats geek :p) so take that with a grain of salt.

anarcat commented Feb 17, 2017

@anarcat I think just using mod would change the distribution (because the lower numbers would get the ones above 7776 also, changing the probabilities, right?)
I don't trust my math for security issues, though....

well, when i mean modulo, i really mean to discard the reminder of a modulo division, or whatever... of course if you shuffle those discarded numbers back at the top you screw up the distribution, but if you discard the upper (say) 9 bits, you end up with a subset of the list that is 7680 words long and should be uniform.

but i'm not a certified cryptographer either (and certainly not a stats geek :p) so take that with a grain of salt.

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 17, 2017

Owner

@anarcat, I think @clawoflight is right. Oh, and you are right too, I think, if you discard bits. In fact that means you should simply discard values from random generators which are out of your scope (here: 1..7776).

As random generators will normally give you 1 or 2 or N bits, you can create numbers in the scope 0..2^N-1. But I think you cannot "map" or "move" "unused" bits to the next number, "reuse" them or similar (without making the result "unfair" or less equally distributed). I must confess, @anarcat, I did not get how you got the number 7680 (but I am neither a certified cryptographer).

The current implementation, however, copes with shorter wordlists already. So, you can already use the EFF list with "system" random. I should only check, whether this is done correctly (i.e. no mod(), instead discard out-of-scope values).

Also @heartsucker is right: the entropy difference is not too much, about 0.0752 bits per word. It is just: I simply hate wasting entropy ;-)

Thank you very much @anarcat, @clawoflight, and @heartsucker ! That was really helpful :D I actually tend to make EFF the default list and will close this issue with a respective commit later today.

Owner

ulif commented Feb 17, 2017

@anarcat, I think @clawoflight is right. Oh, and you are right too, I think, if you discard bits. In fact that means you should simply discard values from random generators which are out of your scope (here: 1..7776).

As random generators will normally give you 1 or 2 or N bits, you can create numbers in the scope 0..2^N-1. But I think you cannot "map" or "move" "unused" bits to the next number, "reuse" them or similar (without making the result "unfair" or less equally distributed). I must confess, @anarcat, I did not get how you got the number 7680 (but I am neither a certified cryptographer).

The current implementation, however, copes with shorter wordlists already. So, you can already use the EFF list with "system" random. I should only check, whether this is done correctly (i.e. no mod(), instead discard out-of-scope values).

Also @heartsucker is right: the entropy difference is not too much, about 0.0752 bits per word. It is just: I simply hate wasting entropy ;-)

Thank you very much @anarcat, @clawoflight, and @heartsucker ! That was really helpful :D I actually tend to make EFF the default list and will close this issue with a respective commit later today.

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 17, 2017

here are my numbers:

  • securedrop wordlist length: 2^13 = 8192
  • original diceware word list length: 7776
  • 8 bits: 2^8 = 256
  • 9 bits: 2^9 = 512
  • stripping upper 8 bits: (2^13) - (2^8) = 7936
  • stripping upper 9 bits: (2^13) - (2^9) = 7680

Stripping 8 bits still gives us too many combinations. Stripping 9 bits, not enough. But we just need to reduce the length of the list to 7680, ie. remove 96 words, and we are spot on.

Hopefully someone will confirm my math :)

anarcat commented Feb 17, 2017

here are my numbers:

  • securedrop wordlist length: 2^13 = 8192
  • original diceware word list length: 7776
  • 8 bits: 2^8 = 256
  • 9 bits: 2^9 = 512
  • stripping upper 8 bits: (2^13) - (2^8) = 7936
  • stripping upper 9 bits: (2^13) - (2^9) = 7680

Stripping 8 bits still gives us too many combinations. Stripping 9 bits, not enough. But we just need to reduce the length of the list to 7680, ie. remove 96 words, and we are spot on.

Hopefully someone will confirm my math :)

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

In the source code here, the call is delegated to SystemRandom and we're calling it via rnd.choice(some_list). This will give a uniform random distribution for any size list. There doesn't need to be any mucking about with doing modular arithmetic or discarding values.

Contributor

heartsucker commented Feb 17, 2017

In the source code here, the call is delegated to SystemRandom and we're calling it via rnd.choice(some_list). This will give a uniform random distribution for any size list. There doesn't need to be any mucking about with doing modular arithmetic or discarding values.

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 17, 2017

right, that was my impression as well - i would also assume this is secure, but i'd be curious to see if the distribution is actually uniform...

anarcat commented Feb 17, 2017

right, that was my impression as well - i would also assume this is secure, but i'd be curious to see if the distribution is actually uniform...

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

The docs here say that SystemRandom uses os.urandom() which is crytpographically secure. This means that it is uniformly distributed. If we're at the point where we are asking if /dev/urandom is secure or not for random numbers, we are way outside the scope of the original question which is "give a list of N elements, how can we select M in a cryptographically secure way?"

Contributor

heartsucker commented Feb 17, 2017

The docs here say that SystemRandom uses os.urandom() which is crytpographically secure. This means that it is uniformly distributed. If we're at the point where we are asking if /dev/urandom is secure or not for random numbers, we are way outside the scope of the original question which is "give a list of N elements, how can we select M in a cryptographically secure way?"

@clawoflight

This comment has been minimized.

Show comment
Hide comment
@clawoflight

clawoflight Feb 17, 2017

The question is not whether SystemRandom is secure, but whether Random.choice() maintains a sufficiently uniform distribution for cryptographic purposes.

clawoflight commented Feb 17, 2017

The question is not whether SystemRandom is secure, but whether Random.choice() maintains a sufficiently uniform distribution for cryptographic purposes.

@heartsucker

This comment has been minimized.

Show comment
Hide comment
@heartsucker

heartsucker Feb 17, 2017

Contributor

I think it's safe to say that the devs of the python core library have ensured that SystemRandom does a secure job of randomly selecting elements from a list. Straying from this implementation would get into the very dangerous territory of "rolling your own crypto."

Contributor

heartsucker commented Feb 17, 2017

I think it's safe to say that the devs of the python core library have ensured that SystemRandom does a secure job of randomly selecting elements from a list. Straying from this implementation would get into the very dangerous territory of "rolling your own crypto."

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 17, 2017

well, here's the source in any case... random.choice calls random._randbelow and, if i read this right, this ends up calling SystemRandom.getrandbits() which does the right thing:

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k <= 0:
            raise ValueError('number of bits must be greater than zero')
        if k != int(k):
            raise TypeError('number of bits should be an integer')
        numbytes = (k + 7) // 8                       # bits / 8 and rounded up
        x = int.from_bytes(_urandom(numbytes), 'big')
        return x >> (numbytes * 8 - k)                # trim excess bits

I'm not absolutely certain randbelow yields a uniform distribution, however:

    def _randbelow(self, n, int=int, maxsize=1<<BPF, type=type,
                   Method=_MethodType, BuiltinMethod=_BuiltinMethodType):
        "Return a random int in the range [0,n).  Raises ValueError if n==0."

        random = self.random
        getrandbits = self.getrandbits
        # Only call self.getrandbits if the original random() builtin method
        # has not been overridden or if a new getrandbits() was supplied.
        if type(random) is BuiltinMethod or type(getrandbits) is Method:
            k = n.bit_length()  # don't use (n-1) here because n can be 1
            r = getrandbits(k)          # 0 <= r < 2**k
            while r >= n:
                r = getrandbits(k)
            return r

is that retry loop (while >= n) correct? i guess it just discards stuff that's too large so that should be okay...

then i'll let you wonder if this 3.6 implementation was the same in Python 2.7 (or earlier!!). ;)

anarcat commented Feb 17, 2017

well, here's the source in any case... random.choice calls random._randbelow and, if i read this right, this ends up calling SystemRandom.getrandbits() which does the right thing:

    def getrandbits(self, k):
        """getrandbits(k) -> x.  Generates an int with k random bits."""
        if k <= 0:
            raise ValueError('number of bits must be greater than zero')
        if k != int(k):
            raise TypeError('number of bits should be an integer')
        numbytes = (k + 7) // 8                       # bits / 8 and rounded up
        x = int.from_bytes(_urandom(numbytes), 'big')
        return x >> (numbytes * 8 - k)                # trim excess bits

I'm not absolutely certain randbelow yields a uniform distribution, however:

    def _randbelow(self, n, int=int, maxsize=1<<BPF, type=type,
                   Method=_MethodType, BuiltinMethod=_BuiltinMethodType):
        "Return a random int in the range [0,n).  Raises ValueError if n==0."

        random = self.random
        getrandbits = self.getrandbits
        # Only call self.getrandbits if the original random() builtin method
        # has not been overridden or if a new getrandbits() was supplied.
        if type(random) is BuiltinMethod or type(getrandbits) is Method:
            k = n.bit_length()  # don't use (n-1) here because n can be 1
            r = getrandbits(k)          # 0 <= r < 2**k
            while r >= n:
                r = getrandbits(k)
            return r

is that retry loop (while >= n) correct? i guess it just discards stuff that's too large so that should be okay...

then i'll let you wonder if this 3.6 implementation was the same in Python 2.7 (or earlier!!). ;)

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 18, 2017

Owner

Wow, this is escalating... in a nice way :)

Overall it looks like SystemRandom.choice() works correctly for our purposes, right? Certainly both parts are important: SystemRandom should deliver nearly-unpredictable bits, which choice() should distribute equally over the words of our lists. This seems indeed to be the case. Thank you @clawoflight and @heartsucker for your comments!

Thank you, @anarcat for looking into the sources! One should really do that from time to time.

The py2.7 sources BTW, have a different choice() implementation:

    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        return seq[int(self.random() * len(seq))]  # raises IndexError if seq is empty

Here self.random() is a C function if I understand it correctly, that gives a float in range [0, 1]. If that works correctly, then the 2.7 implementation should work correctly as well. But as I have my problems with float representations in C (well, frankly, even in Python), I cannot judge the correctness or fitness of the function for our purposes.

At least I could not find critical flaws when generating respective numbers for some time tonight with the py27 choice() method. Of course that might be examined better with dieharder or similar tools, but for now I am satisfied with my little ad-hoc tests.

Overall, I am glad they look okay for our purposes. Yes, @heartsucker, rolling own crypto would be really dangerous.

Thank you @heartsucker, @clawoflight, and @anarcat for your valuable input! I will make the EFF list now the default.

Owner

ulif commented Feb 18, 2017

Wow, this is escalating... in a nice way :)

Overall it looks like SystemRandom.choice() works correctly for our purposes, right? Certainly both parts are important: SystemRandom should deliver nearly-unpredictable bits, which choice() should distribute equally over the words of our lists. This seems indeed to be the case. Thank you @clawoflight and @heartsucker for your comments!

Thank you, @anarcat for looking into the sources! One should really do that from time to time.

The py2.7 sources BTW, have a different choice() implementation:

    def choice(self, seq):
        """Choose a random element from a non-empty sequence."""
        return seq[int(self.random() * len(seq))]  # raises IndexError if seq is empty

Here self.random() is a C function if I understand it correctly, that gives a float in range [0, 1]. If that works correctly, then the 2.7 implementation should work correctly as well. But as I have my problems with float representations in C (well, frankly, even in Python), I cannot judge the correctness or fitness of the function for our purposes.

At least I could not find critical flaws when generating respective numbers for some time tonight with the py27 choice() method. Of course that might be examined better with dieharder or similar tools, but for now I am satisfied with my little ad-hoc tests.

Overall, I am glad they look okay for our purposes. Yes, @heartsucker, rolling own crypto would be really dangerous.

Thank you @heartsucker, @clawoflight, and @anarcat for your valuable input! I will make the EFF list now the default.

@ulif ulif closed this in 0a261a1 Feb 18, 2017

@anarcat

This comment has been minimized.

Show comment
Hide comment
@anarcat

anarcat Feb 18, 2017

i wouldn't bet my life on "self.random()", it sure looks like it's taking stuff from a (python) internal RNG, and not from the system 'getrandom' CRNG.... but i have invested too much time in this already... i do strongly encourage you to test choice for flaws, especially in 2.7.

anarcat commented Feb 18, 2017

i wouldn't bet my life on "self.random()", it sure looks like it's taking stuff from a (python) internal RNG, and not from the system 'getrandom' CRNG.... but i have invested too much time in this already... i do strongly encourage you to test choice for flaws, especially in 2.7.

@ulif

This comment has been minimized.

Show comment
Hide comment
@ulif

ulif Feb 18, 2017

Owner

Thanks @anarcat for your time! I learned new tricks, we all got a new default wordlist and I looked into the C-code of Python random implementations for the first time :)

I was wrong with the C implementation of random() in py2.7. In fact we use SystemRandom which provides an own Python method for random() in py2.7:

    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return (long(_hexlify(_urandom(7)), 16) >> 3) * RECIP_BPF

This one apparently gets random bits from /dev/urandom. I am not sure about the distribution, though.

Owner

ulif commented Feb 18, 2017

Thanks @anarcat for your time! I learned new tricks, we all got a new default wordlist and I looked into the C-code of Python random implementations for the first time :)

I was wrong with the C implementation of random() in py2.7. In fact we use SystemRandom which provides an own Python method for random() in py2.7:

    def random(self):
        """Get the next random number in the range [0.0, 1.0)."""
        return (long(_hexlify(_urandom(7)), 16) >> 3) * RECIP_BPF

This one apparently gets random bits from /dev/urandom. I am not sure about the distribution, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment