For the implementation of CopyNet (#8), the dataloader has to change its behaviour.
In our view, there should be three vocab lists:
- For model training, the smallest one: it only includes words from the train set. Call it $V$.
- For metrics, a bigger one: the model is evaluated on this vocab list, which includes words from both the train set and the test set. Call it $M$. Almost all models can't generate words from $M - V$, because they have never seen them. However, CopyNet can generate words from $M - V$ through its copy mechanism, so it's necessary to take these words into account when we implement metrics. For some models, words in $M - V$ show up as the UNK token; the dataloader has to translate UNK into a uniform distribution over $M - V$.
- The whole space of words, including those not seen anywhere in the data. Call it $N$. We don't care about the words in $N - M$; ignore them when evaluating models, as in [BUG] bug in trim_index of dataloader #37. $N - M$ is the TRUE UNK.
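The set relations above and the UNK-to-uniform translation can be sketched as follows. This is a minimal illustration with plain Python dicts, not the repo's actual dataloader API; the names `build_vocabs` and `spread_unk` are hypothetical.

```python
def build_vocabs(train_tokens, test_tokens):
    """Build V (train-only vocab) and M (train + test vocab) as index maps."""
    V = {w: i for i, w in enumerate(sorted(set(train_tokens)))}
    M = dict(V)
    for w in sorted(set(test_tokens)):
        if w not in M:
            M[w] = len(M)
    return V, M

def spread_unk(probs, V, M):
    """Translate a model's UNK mass into a uniform distribution over M - V.

    `probs` maps each word of V (plus the "UNK" token) to a probability;
    the result is a distribution over M.
    """
    extra = [w for w in M if w not in V]  # the M - V words
    unk_mass = probs.get("UNK", 0.0)
    out = {w: p for w, p in probs.items() if w != "UNK"}
    if extra:
        share = unk_mass / len(extra)  # spread UNK uniformly over M - V
        for w in extra:
            out[w] = share
    return out
```

With this, a model that outputs 0.2 probability on UNK over a two-word $M - V$ would assign 0.1 to each of those words, so metrics computed on $M$ can compare it fairly against CopyNet.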
Required:
Change the behaviour of the dataloader and the metrics.
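On the metric side, ignoring $N - M$ could look like the sketch below, using token-level precision as a stand-in for the repo's real metrics; `token_precision_over_M` is a hypothetical name, not an existing function.

```python
def token_precision_over_M(hypothesis, reference, M):
    """Score only tokens inside M; tokens from N - M are the TRUE UNK
    and are dropped from both sequences before comparison."""
    hyp = [w for w in hypothesis if w in M]
    ref_set = {w for w in reference if w in M}
    if not hyp:
        return 0.0
    return sum(w in ref_set for w in hyp) / len(hyp)
```

The same filtering step (drop anything outside $M$ before scoring) would apply whatever the actual metric is.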