Aacedar patch 3 #4832

Open · wants to merge 3 commits into main
Conversation

aacedar
Copy link
Contributor

@aacedar aacedar commented Jul 4, 2025

PR type

Running GRPO with the deepseek-ai/deepseek-coder-6.7b-base model triggers a template_meta prefix error (#4808); the symptoms are the same as in #4785.
The bug is in the `_swift_encode` function in swift/llm/template/base.py; the relevant lines are:


    if template_meta.is_post_system or not system:
        prefix = template_meta.prefix
    else:
        prefix = template_meta.system_prefix
    self._concat_context_list(prefix, res_context_list, res_context_types, system=system)

PR information

Debugging shows that prefix is [[32013]], whose corresponding token is '<|begin▁of▁sentence|>'. The cause lies in the __init__() of swift/llm/template/template_meta.py, which converts values such as prefix into token IDs and overwrites the attribute with them.
To assemble a complete and correct prompt, token_id=32013 needs to be decoded back to '<|begin▁of▁sentence|>', so that the resulting prompt becomes: '<|begin▁of▁sentence|>User:xx\nAssistant:xx'

Solution: to avoid affecting anything outside GRPO as much as possible, the fix is applied to the GRPO-related files instead of modifying base; see the changes for details.
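The decode step described above can be sketched as follows. This is a minimal illustration, not the actual diff: `restore_prefix` is a hypothetical helper name, and the tokenizer can be any object exposing a `decode` method (such as the Hugging Face tokenizer loaded for deepseek-coder):

```python
def restore_prefix(prefix, tokenizer):
    """Decode token-id lists in a template prefix back to text.

    After TemplateMeta's __init__ runs, an entry like
    '<|begin▁of▁sentence|>' may have been replaced by a token-id
    list such as [32013]; decoding restores the textual prefix so
    the full prompt can be assembled correctly.
    """
    restored = []
    for item in prefix:
        # Token-id lists (e.g. [32013]) are decoded back to strings;
        # plain string entries are kept as-is.
        if isinstance(item, list) and all(isinstance(t, int) for t in item):
            restored.append(tokenizer.decode(item))
        else:
            restored.append(item)
    return restored
```

With the real deepseek-coder tokenizer, decoding [32013] yields '<|begin▁of▁sentence|>', matching the debug finding above.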

Test scripts — just replace your_own_path in --model and --output_dir, then you can test:

    CUDA_VISIBLE_DEVICES=7 \
    swift rollout \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      > logs/log-grpo-rollout.log 2>&1 &

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
    NPROC_PER_NODE=6 \
    swift rlhf \
      --rlhf_type grpo \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      --reward_funcs accuracy \
      --use_vllm true \
      --vllm_mode server \
      --vllm_server_host 127.0.0.1 \
      --vllm_server_port 8000 \
      --train_type full \
      --torch_dtype bfloat16 \
      --dataset AI-MO/NuminaMath-TIR#1000 \
      --split_dataset_ratio 0 \
      --max_completion_length 512 \
      --num_train_epochs 1 \
      --per_device_train_batch_size 2 \
      --learning_rate 1e-6 \
      --gradient_accumulation_steps 2 \
      --save_total_limit 2 \
      --logging_steps 1 \
      --deepspeed zero2 \
      --max_length 4096 \
      --warmup_ratio 0.05 \
      --dataloader_num_workers 2 \
      --dataset_num_proc 4 \
      --num_generations 6 \
      --temperature 0.9 \
      --top_p 0.9 \
      --top_k 50 \
      --log_completions true \
      --num_iterations 1 \
      --beta 0.01 \
      --output_dir ./ds-base-grpo \
      --report_to tensorboard \
      > logs/log-grpo-test.log 2>&1 &
