Dear maintainers,
Thank you for your awesome paper and open-source project. I recently ran into a detail of the dataset pre-processing that I cannot understand properly.
In the processing of datasets you ignore all the labels except the last one (as it is in AQLM/src/datautils.py, line 51 in bf84f39). This seems a bit odd, since transformers perform causal masking inside forward, and when training an LM we want to propagate the loss through all of the tokens, not only the last one.
Could you please explain this detail?
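To make the question concrete, here is a minimal pure-Python sketch of the difference I mean. It is not the code from `datautils.py`: the helper names (`causal_lm_loss`, `keep_last_label_only`) and the toy logits are hypothetical, and I am assuming the common convention that a label of `-100` is skipped by the cross-entropy loss.

```python
import math

# Assumption: -100 marks a label the loss should skip (the usual
# ignore_index default for cross-entropy in PyTorch-style training code).
IGNORE_INDEX = -100


def log_softmax(scores):
    """Numerically stable log-softmax over one row of vocabulary scores."""
    peak = max(scores)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_norm for s in scores]


def causal_lm_loss(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean next-token cross-entropy over the labels that are not ignored.

    logits: one list of vocabulary scores per position.
    labels: one token id per position, not yet shifted.
    Returns (loss, number of positions that contributed).
    """
    # Position t predicts token t + 1: drop the last logit row and the first label.
    terms = [
        -log_softmax(row)[target]
        for row, target in zip(logits[:-1], labels[1:])
        if target != ignore_index
    ]
    if not terms:
        raise ValueError("every label is ignored, so the loss is undefined")
    return sum(terms) / len(terms), len(terms)


def keep_last_label_only(labels, ignore_index=IGNORE_INDEX):
    """Hypothetical helper: mask every label except the final one."""
    return [ignore_index] * (len(labels) - 1) + [labels[-1]]


# Toy sequence of 4 tokens over a vocabulary of 4 ids.
tokens = [2, 0, 1, 3]
logits = [
    [0.5, 0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1, 0.4],
    [0.3, 1.2, 0.2, 0.9],
    [0.1, 0.4, 0.6, 2.2],
]

full_loss, full_terms = causal_lm_loss(logits, tokens)
last_loss, last_terms = causal_lm_loss(logits, keep_last_label_only(tokens))

print(f"all labels kept: {full_terms} positions contribute to the loss")
print(f"only last label kept: {last_terms} position contributes to the loss")
```

With every label kept, each next-token prediction contributes a term to the loss. With only the last label kept, a single position does, which is the behaviour I would like to understand.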