Questions about JavaExtractor's hash function #47

Closed
utkarsh-agrawaal opened this issue Oct 15, 2019 · 4 comments

@utkarsh-agrawaal

Hi, I am trying to extend code2vec to JavaScript. So far, I have been able to extract paths. I have a few questions about the final form of my_dataset.val.c2v:

  1. Which hash function was used for the paths? Was it a standard one such as SHA-1 or MD5?
  2. Do you unhash the hashed string anywhere in the program?
  3. Are the arrows (up, down) in the path really important?

@utkarsh-agrawaal utkarsh-agrawaal changed the title Questions about JavaExtractor hash function Questions about JavaExtractor's hash function Oct 15, 2019
@urialon
Collaborator

urialon commented Oct 15, 2019

Hi,
Thank you for your interest in code2vec.
I'm just letting you know that the encoder of code2seq is much better than code2vec's and does not use hashing. Implementing it and extending it to other languages is very similar.

Regarding your questions:

  1. The hash function was simply Java's String#hashCode().
  2. No, there is no need to unhash.
  3. I am guessing that the arrows contribute only a few additional points. You can definitely drop them as a first step. In our PLDI'18 paper, results without arrows were very similar (though for other tasks and another language).

Best,
Uri
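
For readers porting the extractor to another language, the behaviour of Java's String#hashCode() named in item 1 can be reproduced outside Java. The following is a minimal sketch, not code from the repository: the function name `java_string_hashcode` is hypothetical, and it assumes the input contains no lone surrogate code points. It follows the documented definition of String#hashCode(), which folds the UTF-16 code units with multiplier 31 in 32-bit signed arithmetic.

```python
def java_string_hashcode(s: str) -> int:
    """Sketch of Java's String#hashCode() semantics (hypothetical helper name)."""
    # Java strings are sequences of UTF-16 code units, so encode first.
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        # Keep only the low 32 bits to mimic Java's int overflow.
        h = (31 * h + unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


print(java_string_hashcode("hello"))  # 99162322
print(java_string_hashcode("Aa"), java_string_hashcode("BB"))  # 2112 2112
```

The last line shows a well-known collision: this hash is short and fast but not collision-free, so distinct paths can occasionally share a value.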

@utkarsh-agrawaal
Author

Thank you for your prompt reply!

@utkarsh-agrawaal
Author

So ideally I should be okay with using any hash function then?

@urialon
Collaborator

urialon commented Oct 15, 2019

Yes! It is only intended to shorten the long path strings.
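
Since the hash only shortens the path string and is never reversed, a custom extractor could use any deterministic hash. The sketch below is one possible illustration, under stated assumptions: the helper names `shorten_path` and `format_context`, the 16-character truncation, the example path string, and the `source,path,target` layout of a context are all assumptions for illustration, not taken from this thread. Note that Python's built-in `hash()` on strings is salted per process unless `PYTHONHASHSEED` is fixed, so a digest from `hashlib` is used to keep values stable across runs.

```python
import hashlib


def shorten_path(path: str, digits: int = 16) -> str:
    """Map a long path string to a short, stable identifier (hypothetical helper)."""
    # SHA-1 is used purely as a deterministic shortener, not for security.
    return hashlib.sha1(path.encode("utf-8")).hexdigest()[:digits]


def format_context(source: str, path: str, target: str) -> str:
    """Join one path context; the comma-separated layout is an assumption."""
    return f"{source},{shorten_path(path)},{target}"


# Hypothetical path string, for illustration only.
example_path = "(Identifier)^(CallExpression)_(Identifier)"
print(format_context("foo", example_path, "bar"))
```

Whichever function is chosen, the same one would need to be applied to every dataset split, so that identical paths map to identical shortened strings.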

@urialon urialon closed this as completed Oct 15, 2019