Skip to content
forked from InternLM/xtuner

A toolkit for efficiently fine-tuning LLM (InternLM, Llama, Baichuan, QWen, ChatGLM)

License

Notifications You must be signed in to change notification settings

xiaohangguo/xtuner

ย 
ย 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation



GitHub Repo stars license PyPI Downloads issue resolution open issues

๐Ÿ‘‹ join us on Static Badge Static Badge Static Badge

๐Ÿ” Explore our models on Static Badge Static Badge

English | ็ฎ€ไฝ“ไธญๆ–‡

๐ŸŽ‰ News

  • [2023/12] ๐Ÿ”ฅ Support multi-modal VLM pretraining and fine-tuning with LLaVA-v1.5 architecture! Click here for details!
  • [2023/12] ๐Ÿ”ฅ Support Mixtral 8x7b model! Click here for details!
  • [2023/11] Support ChatGLM3-6B model!
  • [2023/10] Support MSAgent-Bench dataset, and the fine-tuned LLMs can be applied by Lagent!
  • [2023/10] Optimize the data processing to accommodate system context. More information can be found on Docs!
  • [2023/09] Support InternLM-20B models!
  • [2023/09] Support Baichuan2 models!
  • [2023/08] XTuner is released, with multiple fine-tuned adapters on HuggingFace.

๐Ÿ“– Introduction

XTuner is a toolkit for efficiently fine-tuning LLM, developed by the MMRazor and MMDeploy teams.

  • Efficiency: Support LLM fine-tuning on consumer-grade GPUs. The minimum GPU memory required for 7B LLM fine-tuning is only 8GB, indicating that users can use nearly any GPU (even the free resource, e.g., Colab) to fine-tune custom LLMs.
  • Versatile: Support various LLMs (InternLM, Llama2, ChatGLM, Qwen, Baichuan2, ...), datasets (MOSS_003_SFT, Alpaca, WizardLM, oasst1, Open-Platypus, Code Alpaca, Colorist, ...) and algorithms (QLoRA, LoRA), allowing users to choose the most suitable solution for their requirements.
  • Compatibility: Compatible with DeepSpeed ๐Ÿš€ and HuggingFace ๐Ÿค— training pipeline, enabling effortless integration and utilization.

๐ŸŒŸ Demos

  • Ready-to-use models and datasets from XTuner API Open In Colab

  • QLoRA Fine-tune Open In Colab

  • Plugin-based Chat Open In Colab

    Examples of Plugin-based Chat ๐Ÿ”ฅ๐Ÿ”ฅ๐Ÿ”ฅ

๐Ÿ”ฅ Supports

Models SFT Datasets Data Pipelines Algorithms

๐Ÿ› ๏ธ Quick Start

Installation

  • It is recommended to build a Python-3.10 virtual environment using conda

    conda create --name xtuner-env python=3.10 -y
    conda activate xtuner-env
  • Install XTuner via pip

    pip install -U xtuner

    or with DeepSpeed integration

    pip install -U 'xtuner[deepspeed]'
  • Install XTuner from source

    git clone https://github.com/InternLM/xtuner.git
    cd xtuner
    pip install -e '.[all]'

Fine-tune Open In Colab

XTuner supports the efficient fine-tune (e.g., QLoRA) for LLMs. Dataset prepare guides can be found on dataset_prepare.md.

  • Step 0, prepare the config. XTuner provides many ready-to-use configs and we can view all configs by

    xtuner list-cfg

    Or, if the provided configs cannot meet the requirements, please copy the provided config to the specified directory and make specific modifications by

    xtuner copy-cfg ${CONFIG_NAME} ${SAVE_PATH}
  • Step 1, start fine-tuning.

    xtuner train ${CONFIG_NAME_OR_PATH}

    For example, we can start the QLoRA fine-tuning of InternLM-7B with oasst1 dataset by

    # On a single GPU
    xtuner train internlm_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
    # On multiple GPUs
    (DIST) NPROC_PER_NODE=${GPU_NUM} xtuner train internlm_7b_qlora_oasst1_e3 --deepspeed deepspeed_zero2
    (SLURM) srun ${SRUN_ARGS} xtuner train internlm_7b_qlora_oasst1_e3 --launcher slurm --deepspeed deepspeed_zero2
    • --deepspeed means using DeepSpeed ๐Ÿš€ to optimize the training. XTuner comes with several integrated strategies including ZeRO-1, ZeRO-2, and ZeRO-3. If you wish to disable this feature, simply remove this argument.

    • For more examples, please see finetune.md.

  • Step 2, convert the saved PTH model (if using DeepSpeed, it will be a directory) to HuggingFace model, by

    xtuner convert pth_to_hf ${CONFIG_NAME_OR_PATH} ${PTH} ${SAVE_PATH}

Chat Open In Colab

XTuner provides tools to chat with pretrained / fine-tuned LLMs.

xtuner chat ${NAME_OR_PATH_TO_LLM} --adapter {NAME_OR_PATH_TO_ADAPTER} [optional arguments]

For example, we can start the chat with

InternLM-7B with adapter trained from Alpaca-enzh:

xtuner chat internlm/internlm-7b --adapter xtuner/internlm-7b-qlora-alpaca-enzh --prompt-template internlm_chat --system-template alpaca

Llama2-7b with adapter trained from MOSS-003-SFT:

xtuner chat meta-llama/Llama-2-7b-hf --adapter xtuner/Llama-2-7b-qlora-moss-003-sft --bot-name Llama2 --prompt-template moss_sft --system-template moss_sft --with-plugins calculate solve search --command-stop-word "<eoc>" --answer-stop-word "<eom>" --no-streamer

For more examples, please see chat.md.

Deployment

  • Step 0, merge the HuggingFace adapter to pretrained LLM, by

    xtuner convert merge \
        ${NAME_OR_PATH_TO_LLM} \
        ${NAME_OR_PATH_TO_ADAPTER} \
        ${SAVE_PATH} \
        --max-shard-size 2GB
  • Step 1, deploy fine-tuned LLM with any other framework, such as LMDeploy ๐Ÿš€.

    pip install lmdeploy
    python -m lmdeploy.pytorch.chat ${NAME_OR_PATH_TO_LLM} \
        --max_new_tokens 256 \
        --temperture 0.8 \
        --top_p 0.95 \
        --seed 0

    ๐Ÿ”ฅ Seeking efficient inference with less GPU memory? Try 4-bit quantization from LMDeploy! For more details, see here.

Evaluation

  • We recommend using OpenCompass, a comprehensive and systematic LLM evaluation library, which currently supports 50+ datasets with about 300,000 questions.

๐Ÿค Contributing

We appreciate all contributions to XTuner. Please refer to CONTRIBUTING.md for the contributing guideline.

๐ŸŽ–๏ธ Acknowledgement

๐Ÿ–Š๏ธ Citation

@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}

License

This project is released under the Apache License 2.0. Please also adhere to the Licenses of models and datasets being used.

About

A toolkit for efficiently fine-tuning LLM (InternLM, Llama, Baichuan, QWen, ChatGLM)

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%