Hello there, first of all, thank you so much for releasing your code as open source so others like me can learn from it. I saw that the preprocess.py script requires many file inputs, including the candidate summaries. But those are generated by the model, right? I couldn't find them in the data. Also, in the example JSON file, I noticed that both the untokenized and the tokenized article fields seem to be sentence-tokenized. So what is the difference?
Candidate summaries are generated by a pre-trained abstractive model (in our work, BART trained on CNNDM). Our code is for training the evaluation model described in our paper. We've provided the preprocessed data along with the generated candidate summaries.
Untokenized text is the model input, following RoBERTa's requirements. Tokenized data is for evaluation (computing ROUGE), following previous work.
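To make the distinction concrete, here is a minimal sketch of what the two fields might look like side by side. The field names, example sentences, and the naive regex tokenizer below are all illustrative assumptions, not the repository's actual preprocessing code (which would typically use a PTB-style tokenizer):

```python
import re

# Both fields are sentence-split; they differ in word-level tokenization.
article_sentences = [
    "The quick brown fox jumped over the lazy dog.",
    "It didn't stop running.",
]

# "untokenized": sentence-split but otherwise raw text, passed as-is to
# RoBERTa's own subword tokenizer at model-input time.
untokenized = article_sentences

def word_tokenize(sent):
    # Naive stand-in for a proper PTB-style tokenizer: splits off
    # punctuation but keeps contractions together.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sent)

# "tokenized": additionally word-tokenized, with tokens joined by single
# spaces — the form ROUGE scripts conventionally expect.
tokenized = [" ".join(word_tokenize(s)) for s in article_sentences]

print(untokenized[1])  # It didn't stop running.
print(tokenized[1])    # It didn't stop running .
```

So both lists have one entry per sentence, but only the tokenized version separates punctuation from words, which is what the ROUGE computation consumes.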