
Added two wordlists for german language #52

Closed
wants to merge 3 commits into from


@dataCobra dataCobra commented Sep 4, 2018

  • /diceware/wordlists/wordlist_de.txt
  • /diceware/wordlists/bigwordlist_ger.txt

+ /diceware/wordlists/wordlist_de.txt
@dataCobra dataCobra changed the title Add wordlist for german language Add two wordlists for german language Sep 4, 2018
@dataCobra dataCobra changed the title Add two wordlists for german language Added two wordlists for german language Sep 4, 2018
@dataCobra dataCobra closed this Oct 3, 2018
@ulif ulif (Owner) commented Dec 20, 2018

Hey @dataCobra ,

My sincere apologies for not answering earlier! I got no real excuse :/

Adding a non-English wordlist is still absolutely wanted. I would, however, like to make sure that the wordlists are of "good quality", so that they can serve as the standard lists for their respective languages.

To meet these goals, wordlists should be "prefix codes", i.e. no word in a wordlist should be a prefix of another word in the list. This is obviously not the case when a wordlist contains single chars as entries ('a', 'b', 'c', ...). In addition, wordlists should not contain words with fewer than N letters, where N is the smallest number for which c^N is at least w, c is the number of different letters used in the list, and w is the number of words in the list.
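The prefix-code requirement can be checked mechanically. The following is only a minimal sketch, not code from diceware; the function name `find_prefix_violations` is made up for illustration, and it assumes the wordlist has already been read into a list of plain strings, one word per entry.

```python
def find_prefix_violations(words):
    """Return (prefix, word) pairs where `prefix` is a proper prefix of `word`.

    Hypothetical helper for illustration; not part of the diceware package.
    """
    ordered = sorted(set(words))
    violations = []
    for i, word in enumerate(ordered):
        # In sorted order, all words that start with `word` follow it
        # directly, so we can stop at the first one that does not.
        j = i + 1
        while j < len(ordered) and ordered[j].startswith(word):
            violations.append((word, ordered[j]))
            j += 1
    return violations


if __name__ == "__main__":
    print(find_prefix_violations(["a", "ab", "abc", "b", "cd"]))
```

An empty result means the list is a prefix code. A list with single-char entries will usually report many violations, one for every longer word starting with that char.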

Example: when a list contains 7776 (= 6^5) words and all words are made of 26 different letters ('a'-'z', for instance), then the shortest words in the list should have at least length 3, because 26^2 = 676, which is less than 7776. A length of 3 is sufficient, because 26^3 = 17576 and we only need at least 7776 combinations.
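The arithmetic of this example can be reproduced with a few lines of Python. This is a sketch under the assumptions stated above (c distinct letters, w words); the names `min_word_length` and `too_short_words` are hypothetical and not taken from diceware.

```python
def min_word_length(num_words, alphabet_size):
    """Smallest N such that alphabet_size ** N >= num_words."""
    if alphabet_size < 2:
        raise ValueError("need at least two different letters")
    length = 1
    while alphabet_size ** length < num_words:
        length += 1
    return length


def too_short_words(words):
    """Return the words that are shorter than the minimum length.

    The alphabet is taken to be the set of characters that actually
    occur in the list (an assumption made for this sketch).
    """
    alphabet = set("".join(words))
    n = min_word_length(len(words), len(alphabet))
    return [w for w in words if len(w) < n]


if __name__ == "__main__":
    print(min_word_length(7776, 26))  # the case from the example above
```

For the 7776-word, 26-letter case this gives 3, matching the example. A German list that also uses 'ä', 'ö', 'ü' and 'ß' has 30 letters, and 30^2 = 900 is still below 7776, so the minimum length stays at 3.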

Why do we need this minimum word length restriction? Because otherwise passphrases could be generated that are easier to brute-force by combining letters than by combining wordlist elements. With 7776-element lists that contain single letters as entries, for instance, there is a possibility of getting the 1-word passphrase "a". Brute-forcing this phrase letter-wise would take 26/2 = 13 tries on average, whereas brute-forcing it word-wise would take 7776/2 = 3888 tries on average.
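To compare the two attacks for an arbitrary phrase, the same averages can be computed as follows. This is a deliberately simplified model for illustration: it assumes the attacker knows the phrase length in letters and in words, and the function name `average_tries` is made up here.

```python
def average_tries(phrase_words, wordlist_size, alphabet_size):
    """Return (letter_wise, word_wise) average brute-force tries.

    Simplified model: half of the search space is tried on average,
    and the attacker knows the number of letters and of words.
    """
    letters = sum(len(w) for w in phrase_words)
    letter_wise = alphabet_size ** letters / 2
    word_wise = wordlist_size ** len(phrase_words) / 2
    return letter_wise, word_wise


if __name__ == "__main__":
    print(average_tries(["a"], 7776, 26))
    print(average_tries(["abc"], 7776, 26))
```

For the phrase "a" the letter-wise attack is far cheaper than the word-wise one. For a 3-letter word the letter-wise attack already costs more than the word-wise one, which is why the minimum length of 3 is enough in this case.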

I would love to merge your wordlists, but would suggest changing them to meet the above requirements first. Do you think this is possible?

It would also be interesting to know how you compiled the lists (what word sources did you use?).

Again, my apologies for not answering for so long. I hope you are still interested in contributing, as additional wordlists are the most wanted feature for diceware at the moment.

Cheers

-- ulif
