-
Notifications
You must be signed in to change notification settings - Fork 553
feat: Add unsloth Dockerfile and gptoss 20b finetuning notebook for strix halo #126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @0xrushi, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly expands the platform's capabilities by introducing dedicated support for fine-tuning large language models on AMD Strix Halo hardware. It provides a complete environment setup via a new Dockerfile and a practical, step-by-step guide in a Jupyter notebook, enabling users to leverage Unsloth for efficient LLM training on this specific AMD architecture. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a Dockerfile and a Jupyter notebook to support fine-tuning the gpt-oss-20b model on AMD Strix Halo hardware. The additions are a solid foundation for this new capability. My review provides feedback to improve the Dockerfile's security, efficiency, and image size, as well as to enhance the notebook's correctness, maintainability, and reliability by addressing issues like incorrect cell dependencies and redundant code. I've included specific code suggestions to help implement these improvements.
| RUN apt-get update && apt-get install -y sudo && \ | ||
| useradd -m -s /bin/bash user && \ | ||
| usermod -aG sudo user && \ | ||
| echo "user ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers && \ | ||
| apt-get clean && rm -rf /var/lib/apt/lists/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Granting the user passwordless sudo for all commands (NOPASSWD:ALL) poses a significant security risk, as it provides unrestricted root access without any authentication. While this may be convenient for a development environment, it is a dangerous practice. Consider running as the root user and only switching to a non-root user at the end of the Dockerfile, or grant more specific, limited sudo permissions if root access is required for certain operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should definitely prevent passwordless sudo for all commands.
| "source": [ | ||
| "## Dataset Preparation\n", | ||
| "\n", | ||
| "Loads a small quotes dataset, converts to chat format, and compiles Harmony-style text with the tokenizer's chat template." | ||
| ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'Quick Model Smoke Test' cell is marked as optional, but the 'Dataset Preparation' cell that follows depends on the tokenizer object created in the smoke test. If a user skips the optional cell, the notebook will fail with a NameError. The notebook's cell execution flow should be corrected to handle this dependency, for instance by moving the model and tokenizer loading to before the dataset preparation cell or by making the smoke test mandatory.
| WORKDIR /opt/src | ||
|
|
||
| # bitsandbytes (ROCm) | ||
| RUN git clone -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reduce the Docker image size, it's a good practice to use git clone --depth 1 to perform a shallow clone. This avoids downloading the entire commit history for repositories that are only used for building dependencies. This recommendation applies to all git clone operations in this file (lines 30, 38, 43).
RUN git clone --depth 1 -b rocm_enabled_multi_backend https://github.com/ROCm/bitsandbytes.git
| RUN git clone https://github.com/ROCm/flash-attention.git | ||
| RUN cd flash-attention && git checkout v2.7.4-cktile && python setup.py install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To optimize image size and reduce the number of layers, it's best to combine cloning, building, and cleanup into a single RUN layer. This prevents the cloned source code from being persisted in an intermediate layer. Using git clone --depth 1 is also recommended.
RUN git clone --depth 1 https://github.com/ROCm/flash-attention.git && \
cd flash-attention && git checkout v2.7.4-cktile && python setup.py install && \
cd .. && rm -rf flash-attention
| RUN python -m pip install --no-cache-dir . | ||
| RUN python -m pip install --no-cache-dir jupyterlab ipywidgets ipykernel tqdm | ||
| RUN python -m pip install --no-cache-dir 'unsloth_zoo>=2025.5.7' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each RUN command creates a new layer in the Docker image. To optimize image size and reduce the number of layers, you should combine these consecutive pip install commands into a single RUN instruction.
RUN python -m pip install --no-cache-dir . jupyterlab ipywidgets ipykernel tqdm 'unsloth_zoo>=2025.5.7'
| "Option B — Local Docker build from this repo:\n", | ||
| "\n", | ||
| "```bash\n", | ||
| "docker build -f Dockerfile -t unsloth-strix-halo .\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docker build command refers to Dockerfile, but the Dockerfile added in this pull request is named Dockerfile_Strix_Halo. The filename in the command should be updated to match the new file to avoid errors for users following these instructions.
| "docker build -f Dockerfile -t unsloth-strix-halo .\n", | |
| "docker build -f Dockerfile_Strix_Halo -t unsloth-strix-halo .\n", |
| "dtype = None\n", | ||
| "\n", | ||
| "model, tokenizer = FastLanguageModel.from_pretrained(\n", | ||
| " model_name = \"unsloth/gpt-oss-20b\",\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model name is hardcoded here. It's better to use the MODEL_NAME variable defined in the 'Configuration and Hyperparameters' cell to ensure consistency and make it easier to change the model for different runs.
| " model_name = \"unsloth/gpt-oss-20b\",\n", | |
| " model_name = MODEL_NAME,\n", |
| "LR = 2e-4\n", | ||
| "EPOCHS = 1\n", | ||
| "BATCH_SIZE = 1 # keep small for safety\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hyperparameters LR, EPOCHS, and BATCH_SIZE are redefined here, but they have already been defined in the 'Configuration and Hyperparameters' cell. To improve maintainability and avoid potential inconsistencies, you should remove these redundant definitions and use the variables set in the configuration cell.
|
Oh thank you! We'll review this |
No description provided.