PR type
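Running GRPO with the deepseek-ai/deepseek-coder-6.7b-base model raises a template_meta_prefix error, #4808
The symptom is the same as in #4785
The error currently occurs in the _swift_encode function in swift/llm/template/base.py; the relevant lines are as follows: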
PR information
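Debugging shows that prefix is [[32013]], which corresponds to the token '<|begin▁of▁sentence|>'. The cause is the init() function in swift/llm/template/template_meta.py: when it runs, it converts values such as prefix to token IDs and overwrites the attribute with the result.
To build a complete and correct prompt, token_id=32013 has to be decoded back to '<|begin▁of▁sentence|>', so that the resulting prompt is: '<|begin▁of▁sentence|>User:xx\nAssistant:xx'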
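The decode step described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `StubTokenizer` and `decode_prefix` are hypothetical names, and the one-entry vocabulary stands in for the model's real tokenizer.

```python
from typing import List, Sequence, Union

PrefixItem = Union[str, Sequence[int]]


class StubTokenizer:
    """Hypothetical stand-in for the model tokenizer (assumption, not swift's API)."""

    _vocab = {32013: "<|begin▁of▁sentence|>"}

    def decode(self, token_ids: List[int]) -> str:
        # Map each known token id back to its text form.
        return "".join(self._vocab[i] for i in token_ids)


def decode_prefix(prefix: Sequence[PrefixItem], tokenizer: StubTokenizer) -> str:
    """Turn a prefix that mixes strings and token-id lists back into plain text."""
    parts = []
    for item in prefix:
        if isinstance(item, str):
            parts.append(item)  # already text, keep unchanged
        else:
            parts.append(tokenizer.decode(list(item)))  # token ids -> text
    return "".join(parts)


prefix = [[32013]]  # the value observed while debugging
prompt = decode_prefix(prefix, StubTokenizer()) + "User:xx\nAssistant:xx"
print(prompt)
```

With a real tokenizer in place of the stub, the same loop would leave string prefixes untouched and only decode the entries that were converted to token IDs.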
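Fix: to avoid affecting functionality other than GRPO as far as possible, the change is made in the GRPO-related files rather than in base; see the diff for details.
Test script: replace your_own_path in --model and --output_dir to run the test: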