Long Sequence in SQuAD #41
We are not using cached memory for finetuning yet. Cached memory was used during pretraining to improve the modeling of long sequences. Once training is done, the model is better at long-sequence modeling even with the memory removed. Including cached memory for finetuning is also an option, but it is not included for now. I would suggest using the same mechanism as in pretraining, but backpropagating the gradients across segments.
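To illustrate that suggestion, here is a minimal sketch in PyTorch. Note that `TinyMemModel`, `finetune_step`, and the `(segment, mems)` interface are hypothetical stand-ins I made up for illustration, not this repo's actual API; the point is only that the memory is carried forward *without* detaching, so gradients flow back across segment boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMemModel(nn.Module):
    """Toy stand-in for a Transformer-XL-style model: it consumes the
    current segment plus the cached memory from the previous segment."""
    def __init__(self, dim=8, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, seg, mems=None):
        h = self.proj(seg)                            # (seg_len, dim)
        if mems is not None:
            # Crude use of the cached memory; a real model would attend to it.
            h = h + mems.mean(dim=0, keepdim=True)
        logits = self.head(h.mean(dim=0))             # (n_classes,)
        return logits, h                              # new memory = hidden states

def finetune_step(model, segments, labels, optimizer):
    """Backpropagate across segments by NOT detaching the memory:
    `mems` stays in the autograd graph, so the loss on a later segment
    also produces gradients through the earlier segments."""
    mems = None
    total_loss = 0.0
    for seg, lab in zip(segments, labels):
        logits, mems = model(seg, mems=mems)          # mems kept in the graph
        total_loss = total_loss + F.cross_entropy(logits.unsqueeze(0), lab)
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.item()
```

(In pretraining-style usage the memory is detached between segments to bound the graph; dropping the detach, as above, is what "backpropagating across segments" means here, at the cost of memory proportional to the number of segments.)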
Thank you! I am trying to understand how previous segments are aligned when fed into the model. For an input sequence of length 1024, am I correct that there are 3 segments: [0:512], [256:768], [512:1024]? Can you also advise on how long sequences can be processed at inference time? Sorry for asking what might be foundational knowledge; I am relatively new to this architecture. By the way, do you plan on releasing code for using cached memory during finetuning?
@ecchochan The memory cache actually originates from the Transformer-XL paper; I think you could dig into the details there. If I understand correctly, there are only two segments, [0:512] and [512:1024], and the cached memory for the second segment is [256:512].
As for inference: the cached memory aims to capture long-range dependency. Without it, the model is stuck with a fixed-length context, just like the BERT architecture, which has a fixed length of 512.
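To make the indexing concrete, here is a small plain-Python sketch (the function name and the `mem_len=256` default are my assumptions, chosen to match the [256:512] memory mentioned above) that computes, for a 1024-token input, which positions form each segment and which cached positions each segment sees:

```python
def split_with_memory(seq_len=1024, seg_len=512, mem_len=256):
    """Split a sequence into consecutive, non-overlapping segments and
    record, for each one, the slice of earlier positions that the cached
    memory would cover. Indices use Python half-open slicing [start:end)."""
    segments = []
    for start in range(0, seq_len, seg_len):
        end = start + seg_len
        # Memory for this segment = the last `mem_len` positions before it.
        mem = (max(0, start - mem_len), start) if start > 0 else None
        segments.append({"segment": (start, end), "memory": mem})
    return segments
```

Calling `split_with_memory()` yields `[{"segment": (0, 512), "memory": None}, {"segment": (512, 1024), "memory": (256, 512)}]`, i.e. two segments, with [256:512] cached for the second one.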
Case: SQuAD task, sequence length > 512
Does your script utilize cached memory / extended context across segments, so that predictions are inferred from sequences longer than 512 tokens?
If yes, where is the code that achieves this?
If not, how do you suggest utilizing cached memory to perform the QA task?
Thank you for such great work!
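For what it's worth, the kind of inference I have in mind could be sketched like this. All the names and the `(start_logits, end_logits, mems)` interface below are hypothetical placeholders, not this repo's actual API: feed consecutive segments, carry the cached memory forward, and keep the best-scoring answer span in document coordinates.

```python
def answer_long_document(model, token_ids, seg_len=512):
    """Slide over a long document in fixed-length segments, passing the
    cached memory from each segment into the next, and return the
    (global_start, global_end) of the best-scoring answer span.
    `model(seg, mems=...)` is assumed to return per-position start/end
    logits plus the new memory."""
    mems = None
    best_score, best_span = float("-inf"), None
    for offset in range(0, len(token_ids), seg_len):
        seg = token_ids[offset:offset + seg_len]
        start_logits, end_logits, mems = model(seg, mems=mems)
        # Greedy span selection within this segment (end must not precede start).
        s = max(range(len(seg)), key=lambda i: start_logits[i])
        e = max(range(s, len(seg)), key=lambda i: end_logits[i])
        score = start_logits[s] + end_logits[e]
        if score > best_score:
            best_score, best_span = score, (offset + s, offset + e)
    return best_span
```

At inference time there is no gradient to worry about, so the memory can simply be reused from segment to segment; the open question above is whether the released script does this.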