Experiments with MFTCoder always fall short of expectations #64

Open
Chaochao2020 opened this issue Jun 13, 2024 · 2 comments

Comments

@Chaochao2020

Chaochao2020 commented Jun 13, 2024

Hi, I used three open-source QA datasets (SQuAD, CommonsenseQA, and ai2_arc) for multi-task fine-tuning, but so far the results of all three multi-task runs are worse than single-task fine-tuning and even worse than fine-tuning on the mixed data. Using the mistral_v0.2 model, I compared fine-tuning on each dataset separately, mixed fine-tuning, no fine-tuning, and MFTCoder fine-tuning. Here are my experiment results:
(image: table of experiment results)

So I suspect that either my fine-tuning hyperparameters are wrong, or these three QA sub-tasks are not closely related, which would explain this result. That is why I am opening this issue to ask for your advice.
In addition, some of the configuration options are not documented, for example data weight; I think more complete documentation would help users of the framework.

Below is the configuration for my third mftcoder fine-tuning run:

{
    "data_paths": "[data/ai2_arc,data/squad,data/commonsense_qa]",
    "output_dir": "model/mistral/single_train_2024-06-13/lora_adaptor",
    "tb_dir": "model/mistral/single_train_2024-06-13/tensorboard",
    "pretrained_model_path": "AI-ModelScope/Mistral-7B-Instruct-v0___2",
    "model_type": "mistral",
    "load_raw_dataset": true,
    "data_split": "98,2,0",
    "padding_mode": "padding",
    "use_dynamic_padding": true,
    "tokenize_mode": "sft",
    "tokenizer_type": "AutoTokenizer",
    "weighted_loss_mode": "case3",
    "attn_implementation": "flash_attention_2",
    "seq_length": 1024,
    "seed": 1234,
    "peft_type": "lora",
    "quantization": null,
    "lora_rank": 64,
    "lora_alpha": 128,
    "lora_dropout": 0.05,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "learning_rate": 5e-5,
    "min_lr": 5e-6,
    "weight_decay": 0.1,
    "gradient_accumulation_steps": 2,
    "lr_scheduler_type": "cosine",
    "num_warmup_steps": 300,
    "num_train_epochs": 2,
    "log_interval": 10,
    "checkpointing_steps": 100,
    "evaluation_steps": 100,
    "max_train_steps": null,
    "epoch_checkpointing": true,
    "shuffle_before_split": true,
    "early_stopping": true,
    "early_stopping_stall_num": 5,
    "saving_limit": null,
    "target_modules":["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "down_proj", "up_proj"]
}
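
For reference, here is a minimal Python sketch of how one might convert a SQuAD-style QA record into the single-round jsonl chat format that MFTCoder's raw-dataset loader is documented to consume (the chat_rounds layout from the MFTCoder README). The input field names (question, answers.text) and the file paths are assumptions about the exported data, not something taken from this issue; adjust them for your actual dumps of ai2_arc, squad, and commonsense_qa.

# Minimal sketch: wrap QA pairs into MFTCoder-style single-round chat samples.
# Field names follow the chat_rounds format from the MFTCoder README; the
# input schema (question / answers.text) and the paths are assumptions.
import json

def qa_to_sample(question, answer, idx, data_name):
    # One human turn (the question) and one bot turn (the answer).
    return {
        "id": idx,
        "data_name": data_name,
        "chat_rounds": [
            {"role": "human", "content": question, "chat_round_id": 1},
            {"role": "bot", "content": answer, "chat_round_id": 1},
        ],
    }

def convert_squad_jsonl(in_path, out_path):
    with open(in_path, encoding="utf-8") as fin, \
         open(out_path, "w", encoding="utf-8") as fout:
        for idx, line in enumerate(fin):
            rec = json.loads(line)
            question = rec["question"]
            answer = rec["answers"]["text"][0]  # SQuAD stores a list of answer spans
            sample = qa_to_sample(question, answer, idx, "squad")
            fout.write(json.dumps(sample, ensure_ascii=False) + "\n")

if __name__ == "__main__":
    convert_squad_jsonl("data/squad/train.jsonl", "data/squad/train_mft.jsonl")

Whether adding a system turn or explicit per-dataset weights helps here depends on the MFTCoder version in use, so those are left out of the sketch.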
@lxFang2022

Hello, I also want to fine-tune the mistral_v0.2 model, but I have run into some problems. Could we exchange contact information to discuss? My email is 791785837@qq.com

@XiaoMaGe-hero

I fine-tuned on datasets from my own scenario (1k to 20k samples per task), and across four tasks the convergence speed and final quality were also worse. Maybe more training epochs are needed?
