
Add memory-efficient merging script #608

Merged: 10 commits merged into main from low_mem_merge on Jun 16, 2023
Conversation

@airaria (Contributor) commented on Jun 16, 2023

Description

This PR adds a memory-efficient merging script (merge_llama_with_chinese_lora_low_mem.py) and related notebooks.
Compared to the original script, the new script significantly reduces memory usage when merging LoRA weights into the LLaMA model.
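
For context, the low-memory approach boils down to processing the checkpoint shard by shard: load one shard, fold the LoRA delta W' = W + scaling * (B @ A) into the affected weights, write the shard back out, and free it before loading the next. A minimal sketch of that idea (not the PR's actual code; the tensor naming and the `lora_scaling` argument are illustrative assumptions):

```python
import gc
import torch

def merge_shard_low_mem(shard_path, lora_a, lora_b, lora_scaling, out_path):
    """Fold LoRA weights into a single checkpoint shard on CPU,
    so the full model never has to fit in memory at once (sketch)."""
    shard = torch.load(shard_path, map_location="cpu")
    for name, weight in shard.items():
        if name in lora_a:  # only layers that have a LoRA adapter
            # W' = W + scaling * (B @ A), computed in fp32 for accuracy
            delta = (lora_b[name].float() @ lora_a[name].float()) * lora_scaling
            shard[name] = (weight.float() + delta).to(weight.dtype)
    torch.save(shard, out_path)
    del shard
    gc.collect()  # release the shard before the next one is loaded
```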

Estimated max memory usage when merging the Alpaca-LoRA weights into LLaMA with the new script:

| Model | Max Memory Usage |
|-------|------------------|
| 7B    | 15 GB            |
| 13B   | 18 GB            |
| 33B   | 22 GB            |

@ymcui requested a review from iMountTai on June 16, 2023 at 01:27

Review comment: the error message can easily cause confusion and could be removed.
@ymcui (Owner) commented on Jun 16, 2023

GPT-4 with Code Interpreter made some suggestions.
Please make any changes that seem valid and appropriate.


Here are some suggestions on how to improve the code:

  1. Avoid Hardcoding Values: The model parameters are hardcoded in `params_of_models`, which makes the script less flexible. Consider loading these parameters from a configuration file or letting the user pass them as arguments (see the config-loading sketch at the end of this comment).
  2. Modularize the Code: The code could be more modular. It's generally a good idea to separate logic into functions, which makes the code easier to read, understand, test, and maintain. For instance, you could move the loading of models and tokenizers into its own function.
  3. Error Handling: While the script contains some error-handling logic, more comprehensive handling would improve robustness. For example, what if the base model doesn't exist even after trying to download it? Or what if the LoRA models cannot be loaded for some reason?
  4. Documentation: Consider adding more comments to explain what different parts of the code are doing, especially for complex operations. This is particularly important in the `save_shards` and `merge_shards` functions, which perform non-trivial manipulations of the model weights.
  5. Naming Conventions: Some variable names, such as `tandl` and `ti`, are not clear or descriptive. Choose descriptive names to improve readability.
  6. Code Duplication: There are several places where the same operations are performed multiple times. For instance, the split operation in the `save_shards` function is called repeatedly with different parameters; a helper function would avoid this duplication.
  7. Use Context Managers for File Operations: When handling files, it's good practice to use a context manager (the `with` statement) to ensure resources are properly released. For example, in the last part of the code, where JSON files are loaded and saved, open them with `with` (see the sketch right after this list).
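
For suggestion 7, a minimal before/after sketch (the file name is illustrative):

```python
import json

# Fragile: the handle leaks if json.load raises before close() runs.
#   f = open("config.json"); config = json.load(f); f.close()

# Robust: the context manager closes the file even on error.
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```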

Overall, the code seems to be doing a complex task (merging LoRA models with a base model) and appears to be doing it correctly, but these improvements could make the code cleaner, more efficient, and easier to maintain.
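
As a concrete example of suggestion 1, the hardcoded `params_of_models` dict could be replaced by a JSON file selected on the command line. This is only a sketch; the `model_params.json` name and its field layout are assumptions, not the script's actual schema:

```python
import argparse
import json

parser = argparse.ArgumentParser()
parser.add_argument("--model_size", choices=["7B", "13B", "33B"], required=True)
parser.add_argument("--params_file", default="model_params.json",
                    help="JSON mapping model size to architecture parameters")
args = parser.parse_args()

# Example model_params.json:
# {"7B": {"dim": 4096, "n_layers": 32, "n_heads": 32}, ...}
with open(args.params_file, "r", encoding="utf-8") as f:
    params = json.load(f)[args.model_size]
```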

@ymcui requested a review from iMountTai on June 16, 2023 at 02:43
@ymcui merged commit 0c7c092 into main on Jun 16, 2023
@ymcui deleted the low_mem_merge branch on June 20, 2023 at 05:36