The get_random_line function in run_lm_finetuning.py #17

Open
DDzzxiaohongdou opened this issue Oct 13, 2021 · 1 comment

Comments

@DDzzxiaohongdou

def get_random_line(self):
    """
    Get random line from another document for nextSentence task.
    :return: str, content of one line
    """
    # Similar to original tf repo: This outer loop should rarely go for more than one iteration for large
    # corpora. However, just to be careful, we try to make sure that
    # the random document is not the same as the document we're processing.
    for _ in range(10): 
        if self.on_memory:
            rand_doc_idx = random.randint(0, len(self.all_docs)-1)
            rand_doc = self.all_docs[rand_doc_idx]
            line = rand_doc[random.randrange(len(rand_doc))]
        else:
            rand_index = random.randint(1, self.corpus_lines if self.corpus_lines < 1000 else 1000)
            for _ in range(rand_index):
                line = self.get_next_line()
        if self.current_random_doc != self.current_doc:
            break
    return line

What is the purpose of the for _ in range(10): loop here? Why does it run up to ten times?
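
Going by the comment inside the function, the loop reads as a bounded retry: sample a random line, check whether it came from a different document than the one being processed, and stop as soon as it did, giving up after ten attempts. A minimal standalone sketch of that pattern follows. The names `pick_line_from_other_doc`, `docs`, `current_doc_idx`, `max_tries` and `rng` are hypothetical and not taken from the repository; this is an illustration under those assumptions, not the project's implementation.

```python
import random


def pick_line_from_other_doc(docs, current_doc_idx, max_tries=10, rng=random):
    """Sample a line, preferring a document other than current_doc_idx.

    Hypothetical helper for illustration only; not the repository's code.
    docs is a list of documents, each a non-empty list of lines.
    """
    for _ in range(max_tries):
        doc_idx = rng.randint(0, len(docs) - 1)  # randint is inclusive on both ends
        line = rng.choice(docs[doc_idx])
        if doc_idx != current_doc_idx:
            break  # the line comes from a different document, stop retrying
    # If every attempt landed on the current document, the last sample is
    # returned anyway, so the function always terminates.
    return doc_idx, line


docs = [["a1", "a2"], ["b1", "b2"], ["c1"]]
doc_idx, line = pick_line_from_other_doc(docs, current_doc_idx=0)
print(doc_idx, line)
```

With N documents sampled uniformly, a single attempt hits the current document with probability 1/N, so all ten attempts fail with probability (1/N)^10. That matches the comment's remark that the loop should rarely need more than one iteration on a large corpus. The cap of ten keeps the function from looping forever on a corpus with only one document. Note that in the quoted on_memory branch the check compares self.current_random_doc with self.current_doc rather than rand_doc_idx, and nothing in the quoted snippet updates self.current_random_doc in that branch.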

@tongchangD
Owner

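Sorry, this code has not been used or maintained for a long time. If you are interested in error correction, you can take a look at my project code in a few days; when I tested it back then, the results were fairly good. The data preparation, training, and testing code will all be open-sourced.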
