Vocabulary of code2vec in dataset #128

Open
anon721702 opened this issue Aug 6, 2021 · 2 comments
Comments

@anon721702

Dear Sir,

  • I would like to know what is stored in the java14m.dict.c2v file in the dataset folder. I tried to view its content, but it is not in a readable text encoding. Could you please share a version of the file in a human-readable format?

  • Also, could you please elaborate a little on how you created the vocabulary for this dataset?

@urialon
Collaborator

urialon commented Aug 14, 2021

Hi @anon721702,
Thank you for your interest in code2vec, and sorry for the delayed response.

To see its content, you can open the file the same way our code does:
https://github.com/tech-srl/code2vec/blob/master/vocabularies.py#L75
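The linked loader reads this file in binary mode with Python's `pickle` module, which is why a text editor shows unreadable bytes. Below is a minimal sketch of doing the same by hand, assuming the file is a sequence of pickled objects whose first three are the token, path and target count dictionaries (check the code in your own checkout, since the layout may change). `load_pickled_objects` and `dump_as_text` are hypothetical helper names for this sketch, not part of code2vec. The script writes each dictionary to a tab-separated text file, most frequent entries first. Only unpickle files from a source you trust.

```python
import pickle
import sys


def load_pickled_objects(path):
    """Load every object pickled one after another into a single file."""
    objects = []
    with open(path, 'rb') as f:
        while True:
            try:
                objects.append(pickle.load(f))
            except EOFError:  # no more pickled objects in the file
                break
    return objects


def dump_as_text(counts, out_path, top=None):
    """Write 'item<TAB>count' lines, most frequent first (optionally only the top N)."""
    items = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    if top is not None:
        items = items[:top]
    with open(out_path, 'w', encoding='utf-8') as out:
        for item, count in items:
            out.write(f'{item}\t{count}\n')


if __name__ == '__main__' and len(sys.argv) > 1:
    objects = load_pickled_objects(sys.argv[1])  # e.g. java14m.dict.c2v
    # Assumption: the first three objects are the token, path and target counts.
    for name, counts in zip(('tokens', 'paths', 'targets'), objects[:3]):
        print(f'{name}: {len(counts)} entries')
        dump_as_text(counts, f'{name}.txt')
    if len(objects) > 3:
        print('additional objects:', objects[3:])
```

Reading until `EOFError` rather than calling `pickle.load` a fixed number of times means the sketch still works if the file holds more or fewer objects than assumed.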

You can see how this file was created here:
https://github.com/tech-srl/code2vec/blob/master/preprocess.py#L16
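Here is a rough sketch of the same idea, assuming the training file has one example per line in the form `target token,path,token token,path,token ...`. The sketch counts how often each token, path and target name occurs, keeps the most frequent entries up to a size limit, and pickles the resulting dictionaries one after another. All function names and size parameters here are hypothetical. In the repository the raw counts appear to come from histogram files built by `preprocess.sh`, and its truncation rule may handle ties differently from `Counter.most_common`.

```python
import pickle
from collections import Counter


def count_histograms(lines):
    """Count tokens, paths and target names in code2vec-style lines.

    Assumed line format: 'target token1,path,token2 token1,path,token2 ...'
    """
    token_counts, path_counts, target_counts = Counter(), Counter(), Counter()
    num_examples = 0
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        num_examples += 1
        target_counts[parts[0]] += 1
        for context in parts[1:]:
            pieces = context.split(',')
            if len(pieces) != 3:
                continue  # ignore malformed contexts
            source, path, dest = pieces
            token_counts[source] += 1  # source and target tokens share one vocabulary
            token_counts[dest] += 1
            path_counts[path] += 1
    return token_counts, path_counts, target_counts, num_examples


def keep_most_frequent(counts, max_size):
    """Keep at most max_size entries, most frequent first."""
    return dict(counts.most_common(max_size))


def write_dict_c2v(out_path, token_counts, path_counts, target_counts, num_examples):
    """Pickle the four objects one after another into a single file."""
    with open(out_path, 'wb') as f:
        for obj in (token_counts, path_counts, target_counts, num_examples):
            pickle.dump(obj, f)


def build_dict_c2v(train_path, out_path, max_tokens, max_paths, max_targets):
    """Count once over the training file, truncate each vocabulary, then save."""
    with open(train_path, encoding='utf-8') as f:
        tokens, paths, targets, n = count_histograms(f)
    write_dict_c2v(out_path,
                   keep_most_frequent(tokens, max_tokens),
                   keep_most_frequent(paths, max_paths),
                   keep_most_frequent(targets, max_targets),
                   n)
```

Counting in a single streaming pass over the file keeps memory proportional to the vocabulary size rather than to the dataset size.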

Best,
Uri

@urialon
Collaborator

urialon commented Aug 14, 2021

By the way, see our newer code2seq model's demo and code.
