Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance improvements #53

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Conversation

cazacugmihai
Copy link

@cazacugmihai cazacugmihai commented Feb 21, 2018

This change is Reviewable

@0x3333
Copy link
Collaborator

0x3333 commented Feb 21, 2018

Could you fix formatting, otherwise it's impossible to analyze the diff with so many changes. Thanks.

@0x3333
Copy link
Collaborator

0x3333 commented Feb 21, 2018

Also, if you could explain the performance improvements would help.

@cazacugmihai
Copy link
Author

I have fixed the formatting.

The performance consists in:

  • using operations directly to the char arrays instead of using String;
  • some operations (like computing the validChars, guardsRegExp, sepsRegExp, regexp pattern) are precomputed and cached;
  • usage of StringBuilder instead of operating with strings (see StringBuilder buffer, ret_strB, StringBuilder hash , etc.);
  • adding a helper class (CharUtils) for working with char arrays.

}

for (final long number : numbers) {
if (number < 0) {
return "";
return ""; // we must throw an exception here (like the case when we compare with MAX_NUMBER)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must throw an exception here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*See review notice.

@cazacugmihai
Copy link
Author

cazacugmihai commented Feb 21, 2018

About the performance: I have gained about 100-200 ms for running all the tests (it is at least 20% more efficient).

@0x3333
Copy link
Collaborator

0x3333 commented Feb 21, 2018

Great, I'll take some time to review and get back to you. Thanks.

- used array of primitives instead of Objects (see List<Long> ret)
for (int k = 0; k < arr.length; k++) {
arr[k] = ret.get(k);
}
// TODO remove this comment
Copy link
Author

@cazacugmihai cazacugmihai Feb 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove these comments before merging the code (it it is the case).

Copy link
Collaborator

@0x3333 0x3333 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, after starting reviewing, don't get me wrong, I stopped after so many misunderstandings.

Overall, some changes are welcome, like DEFAULT_* parameters, validateAlphabet method(Except for exceptions), WORD_PATTERN, usage of char[] instead String, etc, but the fact that we adhere to the original JS implementation(https://github.com/ivanakimov/hashids.js/blob/master/lib/hashids.js) will invalidate the changes.

There is a reason we don't thrown an exception in so many cases, like unique alphabet, no numbers on encode, etc. This is because we must have the same behavior of the original implementation. I really would love to add a more "java-ish" implementation, but I believe that the behavior must follow the default expected by the user.

Well, another issue is that this PR is HUGE, we should break things in small changes, so we can review. So many changes are difficult to track the change.

pom.xml Outdated
@@ -207,5 +207,9 @@
<name>Matthias Vill</name>
<url>https://github.com/TheConstructor</url>
</contributor>
<contributor>
<name>Mihai CAZACU</name>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No caps please.1

import static java.lang.System.arraycopy;
import static java.util.Arrays.copyOf;

public final class CharUtils {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the public modifier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Is an utility class (like Math, Arrays, etc.).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this name crashes with several CharUtils out there, this is a helper class for hashids, not broad use.

}

public Hashids(String salt, int minHashLength) {
this(salt, minHashLength, DEFAULT_ALPHABET);
this((salt == null) ? null : salt.toCharArray(), minHashLength, DEFAULT_ALPHABET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check(salt==null) should not be done here, as all the constructors are calling the same constructor, the check should be done only in the last one in the chain. I liked the idea of DEFAULT_*

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last constructor must not accept null values. It has no sense. More than that, throwing an error by assert salt != null it alerts the user about an invalid argument. A salt can be empty but not null. Null can mean that there is something wrong with your application. Also, it is widely recommended to not use null when you code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the check can be done only in one constructor, as those other constructors are for default parameters.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, assertions are removed at runtime unless you explicitly specify to "enable assertions".

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last constructor takes an array of chars (char[] as 1st param) so I have to do here that check (salt == null).

}

public Hashids(String salt) {
this(salt, 0);
this(salt, DEFAULT_MIN_HASH_LENGTH);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call this(salt, DEFAULT_MIN_HASH_LENGTH, DEFAULT_ALPHABET); directly, no need to call a intermediate constructor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but that's why we have there a constructor with fewer arguments. The JVM will make the optimizations while we keep the code clean/small.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have the constructor with fewer arguments, you just don't need to call an intermediate constructor for the sake of simplicity.

In this case there is no optimization to be done, you call the first constructor and the VM will call the second, the third etc.

this((salt == null) ? null : salt.toCharArray(),
minHashLength,
(alphabet == null) ? null : alphabet.toCharArray());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that the private constructor could be removed. Leave the public Hashids(String salt... constructor as the "main" one and the checks remain on it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main role of the private constructor is to work with char arrays: char[] salt and char[] alphabet. Following your purpose it will lead to new variables: saltAsChars and alphabetAsChars (or something like that) which will uglify the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is to make the convertion(toCharArray) in only one place, the "last" constructor.

alphabet = alphabet.replaceAll("\\s+", "");
seps = seps.replaceAll("\\s+", "");
seps = Hashids.consistentShuffle(seps, this.salt);
seps = Hashids.consistentShuffle(seps, salt);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always prefix methods/attributes with this..

Copy link
Author

@cazacugmihai cazacugmihai Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is not a JAVA rule. Perhaps a JS one. Please check any class implementation of JDK.

}

return this._decode(hash, this.alphabet);
return _decode(hash, alphabet);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always prefix methods/attributes with this.

Copy link
Author

@cazacugmihai cazacugmihai Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is not a JAVA rule. Perhaps a JS one. Please check any class implementation of JDK.

@@ -197,8 +213,7 @@ public String encodeHex(String hexa) {
/**
* Decode string to numbers
*
* @param hash
* the encoded string
* @param hash the encoded string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Respect the new line after paramater name.

Copy link
Author

@cazacugmihai cazacugmihai Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is not a JAVA rule. Perhaps a JS one. Please check any class implementation of JDK.

num %= (last.charAt(0) + i);
sepsIndex = (int) (num % this.seps.length());
num %= last.charAt(0) + i;
sepsIndex = (int) (num % seps.length);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always prefix methods/attributes with this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is not a JAVA rule. Perhaps a JS one.

if (ret_str.length() < this.minHashLength) {
guardIndex = (numberHashInt + (ret_str.charAt(2))) % this.guards.length();
guard = this.guards.charAt((int) guardIndex);
if (ret_strB.length() < minHashLength) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

always prefix methods/attributes with this.

Copy link
Author

@cazacugmihai cazacugmihai Feb 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? This is not a JAVA rule. Perhaps a JS one. Please check any class implementation of JDK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a java rule, this is our convention rule, all call to instance members are prefixed with this. Read our code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is pretty hard to add your contribution to the community if every project has its standards. This is why there are Java code conventions (by Sun, Oracle, Google - which are quite similar).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are just not getting it. Why making the code a mess, if you can follow just simple rules? It's not hard to make your code look like the same as everybody else.
They are quite similar, but the are different, plain simple.

Don't get me wrong, I'd love to merge your contribution, but it is not in a way to be merged. As I said in another issue report, this project has a long history, we just can't change it and deploy, several people are using it. What we can do is work on a new major version, looking to integrate better with java idiom, faster performance, etc.

@cazacugmihai
Copy link
Author

About breaking this pull request in smaller pieces: I can't do this because the most part of it is related to the migration from String to char[]. I may extract the constants and the validation method but it will not help much reviewing the code (because those changes are only a couple of lines).

pom.xml Outdated
<artifactId>jmh-generator-annprocess</artifactId>
<version>1.9</version>
<scope>test</scope>
</dependency>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you add this dependency?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake. I have run a benchmark to check the performance. Thanks for seeing it! :)

@0x3333
Copy link
Collaborator

0x3333 commented Feb 28, 2018

@cazacugmihai what do you think about creating a branch and work on a new version? @gabrielhora you are invited to this discussion.
I'll create an issue to work on that.

@cazacugmihai
Copy link
Author

Sure, I can do that.

@0x3333
Copy link
Collaborator

0x3333 commented Feb 28, 2018

Issue 55 - V2 Proposal. Let's discuss there before starting to code.

@cazacugmihai cazacugmihai mentioned this pull request Feb 28, 2018
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants