Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for issue 19 #20

Merged
merged 3 commits into from
Apr 20, 2016
Merged

Fix for issue 19 #20

merged 3 commits into from
Apr 20, 2016

Conversation

dwcoder
Copy link
Contributor

@dwcoder dwcoder commented Apr 17, 2016

This branch fixes #19. The tests still pass (if I understand how to use them correctly).

I am creating this pull request to master, but let me know if I should be doing it on a different branch.

Add an if to the pre_check that catches cases where there are less items
to choose from than sides on the die. If this is the case, we can still
pick items, but we have to use a modulo.
Pick the `num_rolls` so that the result generated after rolling is larger
 than `100*len(sequence)`.
The `pre_check()` function now has to return the modified `num_rolls` as
well as a boolean to indicate whether modulo needs to be applied on the
result.
It is passing and doing as expected.
@ulif
Copy link
Owner

ulif commented Apr 18, 2016

Hey @dwcoder , thank you for the PR!

I am still looking myself into the problem and are trying to understand what's going on there. You definitely found something we have to fix.

The tests you added are very helpful, thanks to all the comments.

While your code really fixes the problem, I would like to suggest to do it in a slightly different way. For this it is important to know, that the pre_check() method was meant merely as a "print-a-warning" function. No real calculations etc. should happen there, as these should be in choice() only. It should therefore not return any computational results.

In fact it looks like the computation of num_rolls in choice() was not sufficient and should be fixed so, that also sequences that are shorter than dice_sides are allowed. It was probably nonsense to forbid these sequences.

Your "modulo" approach might be right in that respect (another one would be to require num_rolls rolls again and again, until one finally gets a number < seq_len). We only need to require one more roll if the sequence length given does not match a power of dice_sides and then return the (result mod seq_len)-th element. In the pre_check, however, we might want to give a hint about this "reduced entropy".

This way, I think, we could fix the problem even better.

Would you like to improve your fix in that direction? That would be awesome!

@dwcoder
Copy link
Contributor Author

dwcoder commented Apr 18, 2016

Sure thing, I will improve my fix and we can go back and forth until all are happy. I usually have more time over the weekends, but I will see how much I can do during the week.

While your code really fixes the problem, I would like to suggest to do it in a slightly different way. For this it is important to know, that the pre_check() method was meant merely as a "print-a-warning" function. No real calculations etc. should happen there, as these should be in choice() only. It should therefore not return any computational results.

Ah, I hadn't realised. I will move the calculations out of pre_check().

Your "modulo" approach might be right in that respect (another one would be to require num_rolls rolls again and again, until one finally gets a number < seq_len).

I thought about this, however it becomes tedious if you have a large-sided die. But if we think most users will go for 6-sided ones, this will be easiest solution

We only need to require one more roll if the sequence length given does not match a power of dice_sides and then return the (result mod seq_len)-th element. In the pre_check, however, we might want to give a hint about this "reduced entropy".

This is also a solution, yes. The reason I went for the 100*len(sequence) approach is exactly due to entropy. The modulo approach is usually biased toward the first items in the sequence if the random integer being generated isn't significantly larger than the len(sequence) (the term "significant" here is up for debate 😄 ). So there is a bit of a trade off here:

  • More rolls = more entrophy & less bias.
  • Fewer rolls = ease of use & implementation.

I'll move the calculations out of the pre_check(), and change around the implementation a bit. Will keep you up to date. Once again, thanks for this project.

@ulif
Copy link
Owner

ulif commented Apr 18, 2016

Awesome! Please take your time. I also have more time over weekends.

I thought about this, however it becomes tedious if you have a large-sided die.

Absolutely, yes. The "modulo" solution looks fine to me.

This is also a solution, yes. The reason I went for the 100*len(sequence) approach is exactly due to entropy.

Ah, right, sorry for my ignorance! I'm fine with this as well.

Overall, I would expect users with n-sided dice to use appropriate n-powered wordlists. Then the entropy problem would come down to special chars, which do not give too much extra-entropy anyway. Therefore I would tend to vote for "fewer rolls", but I really do not insist.

Once again, please take your time and thank you for your contributions!

@dwcoder
Copy link
Contributor Author

dwcoder commented Apr 18, 2016

Ah, right, sorry for my ignorance! I'm fine with this as well.

No, come to think of it, your solution is actually better. For the 6-sided case -- which will by far be the most common -- a sequence of 1, 2 or 3 is trivial since they are all factors of 6. For the case of 4; it is a factor of 6^2. And for 5 is a factor of 6^2-1, so the "edge bias" will be minimal.

And even if the person uses an n-sided die, the worst the edge case will be if we always just use one extra roll is (n-1)/n^2. One could make the argument that this is still a big bias for cryptographic purposes, but your other point addresses this:

Then the entropy problem would come down to special chars, which do not give too much extra-entropy anyway. Therefore I would tend to vote for "fewer rolls", but I really do not insist.

You are right: this really is only an issue for special chars. And rolling 4 times for every special character -- which don't add much anyway -- seems tedious. So let's rather go for fewer rolls. If there are sticklers who want us to revisit this in the future, we can do so.

I have changed my code to reflect this. I will write some tests for the new case and submit my pull request.

Move the calculations out of the `pre_check()` function.
Also do smarter checks:
  - If len(sequence) == 1, we don't need any rolls. In this case, just
    return sequence[0]
  - If len(sequence) is a factor of dice_sides, we only need one roll
    along with the modulo operator.
  - For everything else, just require one extra roll, bringing it to a
    total of 2 rolls. This should remove a little bit of the edge bias
    introduced by the modulo
Run the `pre_check()` function after the calculations have been done.

For the unit tests:
  - remove the function `test_choice_len_too_short()`;
    we no longer have to raise an exception when the sequence length is
    too short.
  - Write new tests for the cases where `dice_sides = 6` and `len(sequence)`:
     - 1
     - 2,3
     - 4,5
@dwcoder
Copy link
Contributor Author

dwcoder commented Apr 19, 2016

I hope I addressed all your comments. Let me know if there are any other changes you would like me to make. And no rush if you don't get to this soon.

@ulif ulif merged commit a08516d into ulif:master Apr 20, 2016
@ulif
Copy link
Owner

ulif commented Apr 20, 2016

@dwcoder 👍 Just merged.

It's a pleasure to work with you. Thank you very much!

@dwcoder
Copy link
Contributor Author

dwcoder commented Apr 20, 2016

Likewise! Will try adding some features in the future. Again, thanks for the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error for special characters and realdice as random source
2 participants