Aacedar patch 3 #4832

Open · wants to merge 3 commits into main
Conversation

aacedar
Copy link
Contributor

@aacedar aacedar commented Jul 4, 2025

PR type

Running GRPO with the deepseek-ai/deepseek-coder-6.7b-base model triggers a template_meta prefix error (#4808); the symptoms are the same as in #4785.
The bug is in the `_swift_encode` function in swift/llm/template/base.py; the relevant lines are:


    if template_meta.is_post_system or not system:
        prefix = template_meta.prefix
    else:
        prefix = template_meta.system_prefix
    self._concat_context_list(prefix, res_context_list, res_context_types, system=system)

PR information

Debugging shows that prefix is [[32013]], whose corresponding token is '<|begin▁of▁sentence|>'. The cause lies in the __init__() of swift/llm/template/template_meta.py, which converts values such as prefix into token IDs and overwrites the attribute with them.
To assemble a complete and correct prompt, token_id=32013 needs to be decoded back to '<|begin▁of▁sentence|>', so that the resulting prompt becomes: '<|begin▁of▁sentence|>User:xx\nAssistant:xx'

Solution: to avoid affecting anything outside GRPO as much as possible, the fix is applied to the GRPO-related files instead of modifying base; see the changes for details.
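The decode step described above can be sketched as follows. This is a minimal illustration, not the actual diff: `restore_prefix` is a hypothetical helper name, and the tokenizer can be any object exposing a `decode` method (such as the Hugging Face tokenizer loaded for deepseek-coder):

```python
def restore_prefix(prefix, tokenizer):
    """Decode token-id lists in a template prefix back to text.

    After TemplateMeta's __init__ runs, an entry like
    '<|begin▁of▁sentence|>' may have been replaced by a token-id
    list such as [32013]; decoding restores the textual prefix so
    the full prompt can be assembled correctly.
    """
    restored = []
    for item in prefix:
        # Token-id lists (e.g. [32013]) are decoded back to strings;
        # plain string entries are kept as-is.
        if isinstance(item, list) and all(isinstance(t, int) for t in item):
            restored.append(tokenizer.decode(item))
        else:
            restored.append(item)
    return restored
```

With the real deepseek-coder tokenizer, decoding [32013] yields '<|begin▁of▁sentence|>', matching the debug finding above.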

Test scripts — just replace your_own_path in --model and --output_dir, then you can test:

    CUDA_VISIBLE_DEVICES=7 \
    swift rollout \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      > logs/log-grpo-rollout.log 2>&1 &

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 \
    NPROC_PER_NODE=6 \
    swift rlhf \
      --rlhf_type grpo \
      --model your_own_path/deepseek-ai/deepseek-coder-6.7b-base \
      --reward_funcs accuracy \
      --use_vllm true \
      --vllm_mode server \
      --vllm_server_host 127.0.0.1 \
      --vllm_server_port 8000 \
      --train_type full \
      --torch_dtype bfloat16 \
      --dataset AI-MO/NuminaMath-TIR#1000 \
      --split_dataset_ratio 0 \
      --max_completion_length 512 \
      --num_train_epochs 1 \
      --per_device_train_batch_size 2 \
      --learning_rate 1e-6 \
      --gradient_accumulation_steps 2 \
      --save_total_limit 2 \
      --logging_steps 1 \
      --deepspeed zero2 \
      --max_length 4096 \
      --warmup_ratio 0.05 \
      --dataloader_num_workers 2 \
      --dataset_num_proc 4 \
      --num_generations 6 \
      --temperature 0.9 \
      --top_p 0.9 \
      --top_k 50 \
      --log_completions true \
      --num_iterations 1 \
      --beta 0.01 \
      --output_dir ./ds-base-grpo \
      --report_to tensorboard \
      > logs/log-grpo-test.log 2>&1 &
