Question about dataset construction #16

LeePleased · 2019-08-02T02:39:27Z

Hello Yikang:

I'm Yangming Li, a research intern at the HIT-SCIR lab. Thank you for contributing this repository. I found what look like some problems with the dataset construction (including the test set):

1. The use of the PyTorch API "narrow" unexpectedly discards some words, which results in an incorrect PPL score.

2. It seems that your sliding window over the whole corpus is not continuous and thus generates far less data than usual.
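
To make the two points concrete, here is a minimal sketch of the pattern in question. It is not the repository's code: `batchify` and `windows` are hypothetical names, plain Python lists stand in for tensors, and the list slice only mimics what `data.narrow(0, 0, n)` does in PyTorch.

```python
def batchify(tokens, batch_size):
    """Trim the corpus to a multiple of batch_size, then split it into columns.

    The slice below plays the role of `data.narrow(0, 0, n_steps * batch_size)`:
    trailing tokens that do not fill a whole row are dropped.
    """
    n_steps = len(tokens) // batch_size
    kept = tokens[: n_steps * batch_size]
    dropped = tokens[n_steps * batch_size:]
    # Column b holds the contiguous chunk kept[b * n_steps : (b + 1) * n_steps].
    columns = [kept[b * n_steps:(b + 1) * n_steps] for b in range(batch_size)]
    return columns, dropped


def windows(column, bptt):
    """Yield non-overlapping (input, target) pairs of at most bptt tokens."""
    for i in range(0, len(column) - 1, bptt):
        seq_len = min(bptt, len(column) - 1 - i)
        # Targets are the inputs shifted by one position.
        yield column[i:i + seq_len], column[i + 1:i + 1 + seq_len]


corpus = list(range(23))  # 23 toy token ids
columns, dropped = batchify(corpus, batch_size=4)
pairs = list(windows(columns[0], bptt=2))

print(dropped)  # [20, 21, 22]
print(pairs)    # [([0, 1], [1, 2]), ([2, 3], [3, 4])]
```

In this toy run the last three tokens are cut by the trim, and the window advances by `bptt` tokens rather than by one. Within a column, every token after the first still appears exactly once as a target, so the stride affects how many training windows exist, not how many kept tokens are scored.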

Thanks again for your contribution to this repository.
Yangming, 19/08/02

yikangshen · 2019-08-02T08:26:26Z

Hi Yangming,
The problems that you mentioned are actually regularization methods introduced in AWD-LSTM (https://github.com/salesforce/awd-lstm-lm). Please refer to their paper for explanations.

LeePleased · 2019-08-02T09:35:11Z

Thank you very much for your response :)

LeePleased closed this as completed Aug 2, 2019
