Hi Uri Alon,
Thanks for the impressive work, and especially for releasing the data, which is hard to collect for previous publications because there are so many variants and versions.
I am interested in the Java seq2seq dataset you presented, and I am wondering what tokenization logic is used? Is it BPE or some Java-specific heuristics? Thank you!
We used a regular-expression-based heuristic that splits each token into subtokens. So, for example, a variable called currentIndex would be split into ['current', 'index'].
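For readers who want to try this kind of splitting, here is a minimal sketch in Python. It is not the exact regular expression used for the dataset: the pattern, the function name `split_to_subtokens`, and the choice to lowercase every subtoken are assumptions made for illustration, based only on the currentIndex example above.

```python
import re
from typing import List

# Hypothetical pattern (an assumption, not the dataset's actual regex):
#   [A-Z]+(?![a-z])  a run of capitals not followed by lowercase (acronyms, CONSTANTS)
#   [A-Z]?[a-z]+     an optionally capitalized lowercase word
#   [0-9]+           a run of digits
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_to_subtokens(token: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercased subtokens."""
    # Characters that match no alternative (such as '_') are skipped by findall.
    return [part.lower() for part in _SUBTOKEN_RE.findall(token)]


print(split_to_subtokens("currentIndex"))
```

Under these assumptions, underscores act as separators and acronyms stay together, so `MAX_VALUE` and `HTTPServer` would each split into two subtokens. The real preprocessing may treat such cases differently.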