Some questions about the evaluation metrics #53

CCCa1cai · 2021-03-28T08:07:52Z

1. For category text generation, k_label is set to 2, so the model generates two types of text and the output contains two scores for each metric. Could you tell me how to calculate the final score?
Output:
[ADV] epoch 220: temp = 1.2257, d_loss: 0.0367, BLEU-[2, 3, 4, 5] = [[0.473, 0.271, 0.174, 0.134], [0.481, 0.242, 0.159, 0.13]], NLL_gen = [0.9063, 0.8691], NLL_div = [0.4943, 0.4795], Self-BLEU-[2, 3, 4] = [[0.952, 0.884, 0.777], [0.957, 0.889, 0.796]], [PPL-F, PPL-R] = [0, 0], clas_acc = [0.719, 0.5251]
[ADV] epoch 240: temp = 1.2485, d_loss: 0.0239, BLEU-[2, 3, 4, 5] = [[0.478, 0.263, 0.175, 0.14], [0.509, 0.281, 0.183, 0.145]], NLL_gen = [0.9665, 0.9309], NLL_div = [0.4787, 0.4686], Self-BLEU-[2, 3, 4] = [[0.943, 0.864, 0.785], [0.966, 0.882, 0.784]], [PPL-F, PPL-R] = [0, 0], clas_acc = [0.7362, 0.5099]
[ADV] epoch 260: temp = 1.2706, d_loss: 0.0236, BLEU-[2, 3, 4, 5] = [[0.473, 0.283, 0.188, 0.149], [0.499, 0.262, 0.175, 0.142]], NLL_gen = [1.0321, 0.9985], NLL_div = [0.4645, 0.4584], Self-BLEU-[2, 3, 4] = [[0.952, 0.88, 0.778], [0.98, 0.923, 0.82]], [PPL-F, PPL-R] = [0, 0], clas_acc = [0.7467, 0.494]
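
To make the question concrete: one possible reading, which is only my assumption and not something the paper or the repo confirms, is that the final score is the plain average of the two per-label scores. A minimal sketch of that, using the values from the epoch 260 line above (the helper name `average_over_labels` is made up for illustration):

```python
from statistics import mean

# Per-label metrics copied from the "[ADV] epoch 260" log line above.
# Outer list index = label (k_label = 2), inner list = BLEU-2..5.
bleu = [[0.473, 0.283, 0.188, 0.149], [0.499, 0.262, 0.175, 0.142]]
nll_gen = [1.0321, 0.9985]


def average_over_labels(per_label):
    """Average a metric across labels (ASSUMPTION: unweighted mean).

    Accepts either one scalar per label or one list of n-gram
    scores per label; lists are averaged position by position.
    """
    first = per_label[0]
    if isinstance(first, (list, tuple)):
        return [mean(column) for column in zip(*per_label)]
    return mean(per_label)


bleu_avg = average_over_labels(bleu)        # one value per n-gram order
nll_gen_avg = average_over_labels(nll_gen)  # single scalar

print("BLEU-[2, 3, 4, 5] averaged:", [round(x, 4) for x in bleu_avg])
print("NLL_gen averaged:", round(nll_gen_avg, 4))
```

If the paper instead weights each label by its number of samples, or reports the labels separately, the mean above would not match the published numbers.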
the score in the paper:
