
about the repetition of the ground-truth #18

Closed
LHRYANG opened this issue Oct 25, 2022 · 3 comments

LHRYANG commented Oct 25, 2022

Hi Yixuan, I have a question about the calculation of the repetition rate of the ground truth. I used the code you provided:

# parse the generated results into a list of text
import json
in_f = r'./simctg_contrasive.json'
with open(in_f) as f:
    item_list = json.load(f)

text_list = []
for item in item_list:
    text = item['generated_result']['0']['continuation']
    text_list.append(text)

# compute the evaluation results
from simctg.evaluation import measure_repetition_and_diversity
rep_2, rep_3, rep_4, diversity = measure_repetition_and_diversity(text_list)
print('The result of rep-2 is {}, rep-3 is {}, rep-4 is {}, and diversity is {}'.format(rep_2, rep_3, rep_4, round(diversity, 2)))
'''
   The result of rep-2 is 3.93, rep-3 is 0.78, rep-4 is 0.31, and diversity is 0.95
'''
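
As a sanity check, my understanding of the paper's definitions is rep-n = 100 * (1 - |unique n-grams| / |total n-grams|) and diversity = the product of (1 - rep-n/100) over n = 2, 3, 4, which is consistent with the printed numbers:

# Sanity check (my understanding of the paper's definitions; not the library code):
diversity_check = (1 - 3.93/100) * (1 - 0.78/100) * (1 - 0.31/100)
print(round(diversity_check, 2))  # 0.95, matching the printed diversity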

I can reproduce the result you reported in your paper:

The result of rep-2 is 3.93, rep-3 is 0.78, rep-4 is 0.31, and diversity is 0.95

However, when I change the line "text = item['generated_result']['0']['continuation']" to "text = item['reference_continuation_text']", it outputs:

The result of rep-2 is 5.44, rep-3 is 1.28, rep-4 is 0.43, and diversity is 0.93

which is different from the human score reported in your paper.
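
For clarity, the only change is in the loop that builds the text list:

text_list = []
for item in item_list:
    # use the human reference continuation instead of the generated one
    text = item['reference_continuation_text']
    text_list.append(text)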

Could you help me solve this issue?
Thanks a lot!

yxuansu (Owner) commented Oct 25, 2022

Hi @LHRYANG -- Thank you for your interest in our work. Have you tried to truncate the reference text to its first 128 tokens and then measure the diversity?
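
For example, something along these lines (a rough sketch; the evaluation code tokenizes on whitespace):

# Rough sketch: keep only the first 128 whitespace tokens of each
# reference before measuring diversity.
truncated_list = []
for text in text_list:
    tokens = text.strip().split()
    truncated_list.append(' '.join(tokens[:128]))

rep_2, rep_3, rep_4, diversity = measure_repetition_and_diversity(truncated_list)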

LHRYANG (Author) commented Oct 26, 2022

Following your suggestion, I truncated the reference text by adding two lines to your original eval_text code (the "if len(token_list) > 128:" check below):

def eval_text(text, ngram):
    token_list = text.strip().split()
    # the two added lines: truncate the reference to its first 128 tokens
    if len(token_list) > 128:
        token_list = token_list[0:128]
    start_idx, end_idx = 0, ngram
    total_num = 0
    ngram_set = set()
    while end_idx < len(token_list):
        one_ngram_list = token_list[start_idx:end_idx]
        assert len(one_ngram_list) == ngram
        one_ngram = ' '.join(one_ngram_list)
        # loop body completed here for runnability: record the n-gram
        # and slide the window one token to the right
        ngram_set.add(one_ngram)
        total_num += 1
        start_idx += 1
        end_idx += 1
    return len(ngram_set), total_num

The output is: "The result of rep-2 is 4.53, rep-3 is 1.07, rep-4 is 0.37, and diversity is 0.94", still different from what you reported.

yxuansu (Owner) commented Oct 26, 2022

Hi @LHRYANG — I will double check the results on my end. Feel free to report your replicated numbers in your work :-)

yxuansu closed this as completed Oct 26, 2022