doubt regarding inputs to preprocess.py #4

Closed
ramgj28 opened this issue Jun 20, 2021 · 3 comments
Comments

ramgj28 commented Jun 20, 2021

Hello there, first of all, thank you so much for releasing your code as open source so that others like me can learn from it. I saw that the preprocess.py script requires many file inputs, including the candidate summaries. But those are generated by the model, right? I couldn't find them in the data. Also, in the example JSON file, I noticed that the untokenized and tokenized articles both seem to be sentence-tokenized, so what is the difference?

yixinL7 (Owner) commented Jun 21, 2021

  1. Candidate summaries are generated by a pre-trained abstractive model (in our work we use BART on CNNDM); a sketch of this generation step is included after this list. Our code is for training the evaluation model in our paper. We've provided the preprocessed data along with the generated candidate summaries.
  2. Untokenized text is for model input, following the requirements of RoBERTa. Tokenized data is for evaluation (computing ROUGE), following previous work.
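
For reference, here is a minimal sketch of how such candidate summaries can be generated from a pre-trained BART model before running preprocess.py. It assumes the Hugging Face transformers library and the facebook/bart-large-cnn checkpoint with diverse beam search; these are illustrative assumptions, not necessarily the exact configuration used to produce the released data.

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Illustrative sketch: the checkpoint name and generation settings below are
# assumptions, not necessarily the exact setup used for the released data.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

article = "..."  # one untokenized source article (e.g. from CNNDM)

inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")

# Diverse beam search yields several distinct candidate summaries per article;
# these decoded candidates (not the model itself) are what preprocess.py expects.
candidate_ids = model.generate(
    inputs["input_ids"],
    num_beams=16,
    num_beam_groups=16,       # one beam per group for maximum diversity
    diversity_penalty=1.0,
    num_return_sequences=16,  # keep all 16 candidates
    max_length=140,
    early_stopping=True,
)
candidates = tokenizer.batch_decode(candidate_ids, skip_special_tokens=True)
```

The decoded candidates would then be saved (in both tokenized and untokenized form, as in the example JSON file) alongside the source article and reference summary as inputs to the preprocessing script.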

ramgj28 (Author) commented Jun 22, 2021

Oh, now I get it. Thank you so much for taking the time to explain this. Really appreciate it.

ramgj28 closed this as completed Jun 22, 2021
ramgj28 (Author) commented Jun 22, 2021

Doubt cleared. Thanks :)
