# Fine-tune TinyLlama-1.1B for text-to-SQL generation

## Introduction

In this workshop module, you will learn how to fine-tune a Llama-based LLM ([TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)) using causal language modelling so that the model learns to generate SQL queries from text-based instructions. Your fine-tuning job will be launched using SageMaker Training, which provides a serverless training environment where you do not need to manage the underlying infrastructure. You will learn how to configure a PyTorch training job using [SageMaker's PyTorch estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html), and how to use the [Hugging Face Optimum Neuron](https://github.com/huggingface/optimum-neuron) package to run the PyTorch training job on AWS Trainium accelerators via an [Amazon EC2 trn1.2xlarge instance](https://aws.amazon.com/ec2/instance-types/trn1/).

For this module, you will be using the [b-mc2/sql-create-context](https://huggingface.co/datasets/b-mc2/sql-create-context) dataset which consists of thousands of examples of SQL schemas, questions about the schemas, and SQL queries intended to answer the questions.

*Dataset example 1:*
* *SQL schema/context:* `CREATE TABLE management (department_id VARCHAR); CREATE TABLE department (department_id VARCHAR)`
* *Question:* `How many departments are led by heads who are not mentioned?`
* *SQL query/answer:* `SELECT COUNT(*) FROM department WHERE NOT department_id IN (SELECT department_id FROM management)`

*Dataset example 2:*
* *SQL schema/context:* `CREATE TABLE courses (course_name VARCHAR, course_id VARCHAR); CREATE TABLE student_course_registrations (student_id VARCHAR, course_id VARCHAR)`
* *Question:* `What are the ids of all students for courses and what are the names of those courses?`
* *SQL query/answer:* `SELECT T1.student_id, T2.course_name FROM student_course_registrations AS T1 JOIN courses AS T2 ON T1.course_id = T2.course_id`

By fine-tuning the model over several thousand of these text-to-SQL examples, the model will then learn how to generate an appropriate SQL query when presented with a SQL context and a free-form question.
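
To make the structure of a training example concrete, here is a minimal sketch of how one dataset record might be turned into a single training string for causal language modelling. The `context`, `question`, and `answer` field names come from the dataset; the prompt template and the `format_example` helper are illustrative assumptions, and the template actually used by the training script may differ.

```python
# Illustrative only: this template is an assumption, not necessarily the one
# used by the workshop's training script.
PROMPT_TEMPLATE = (
    "### Context:\n{context}\n\n"
    "### Question:\n{question}\n\n"
    "### SQL:\n"
)


def format_example(record: dict, include_answer: bool = True) -> str:
    """Build a prompt from one record, optionally followed by the target SQL."""
    prompt = PROMPT_TEMPLATE.format(
        context=record["context"], question=record["question"]
    )
    return prompt + record["answer"] if include_answer else prompt


# Dataset example 1 from above, using the dataset's own field names
record = {
    "context": "CREATE TABLE management (department_id VARCHAR); "
               "CREATE TABLE department (department_id VARCHAR)",
    "question": "How many departments are led by heads who are not mentioned?",
    "answer": "SELECT COUNT(*) FROM department WHERE NOT department_id IN "
              "(SELECT department_id FROM management)",
}

training_text = format_example(record)                           # prompt + answer
inference_prompt = format_example(record, include_answer=False)  # prompt only
print(training_text)
```

During fine-tuning the model sees the full string (prompt plus answer); at inference time it is given only the prompt and is expected to continue with the SQL query.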

This text-to-SQL use case was selected so you can successfully fine-tune your model in a reasonably short amount of time (~20 minutes), which is appropriate for this 1-hour workshop. Although this is a relatively simple use case, please keep in mind that the same techniques and components used in this module can also be applied to fine-tune LLMs for more advanced use cases such as writing code, summarizing documents, or creating blog posts. The possibilities are endless!

## Prerequisites

This notebook uses the SageMaker Python SDK to prepare, launch, and monitor the progress of a PyTorch-based training job. Before we get started, it is important to upgrade the SageMaker SDK to ensure that you are using the latest version. Run the next two cells to upgrade the SageMaker SDK and set up your session.

In [None]:
# Upgrade SageMaker SDK to the latest version
%pip install -U sagemaker awscli -q 2>&1 | grep -v "warnings/venv"

In [None]:
import logging
sagemaker_config_logger = logging.getLogger("sagemaker.config")
sagemaker_config_logger.setLevel(logging.WARNING)

# Import the SageMaker SDK and set up our session
from sagemaker import get_execution_role, Session
from sagemaker.pytorch import PyTorch

sess = Session()
default_bucket = sess.default_bucket()

## Specify the Optimum Neuron deep learning container (DLC) image

The SageMaker Training service uses containers to execute your training script, allowing you to fully customize your training script environment and any required dependencies. For this workshop, you will use a recent AWS-maintained Neuron deep learning container (DLC) image, which contains the Neuron SDK and PyTorch. The Hugging Face Optimum Neuron library is installed on top of it at training time.

In [None]:
# Specify the Neuron DLC that we will use for training
#   This is the standard Neuron DLC; Optimum Neuron v0.0.25 is installed at training time.
#   TODO: switch to the dedicated Optimum Neuron DLC once it is available.

training_image = f"763104351884.dkr.ecr.{sess.boto_region_name}.amazonaws.com/pytorch-training-neuronx:2.1.2-neuronx-py310-sdk2.20.1-ubuntu20.04"
print(training_image)

## Configure the PyTorch Estimator

The SageMaker SDK includes a [PyTorch Estimator](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html) class which you can use to define a PyTorch training job that will be executed in the SageMaker managed environment. 

In the following cell, you will create a PyTorch Estimator which will run the attached `finetune_llama.py` training script on a trn1.2xlarge instance. The `finetune_llama.py` script is an Optimum Neuron training script that can be used for causal language modelling with AWS Trainium.

The PyTorch Estimator has many parameters that can be used to configure your training job. A few of the most important parameters include:

- *entry_point*: refers to the name of the training script that will be executed as part of this training job
- *source_dir*: the path to the local source code directory (relative to your notebook) that will be packaged up and included inside your training container
- *instance_count*: defines how many EC2 instances to use for this training job
- *instance_type*: determines which type of EC2 instance will be used for training
- *image_uri*: defines which training DLC will be used to run the training job (see Neuron DLC, above)
- *distribution*: determines which type of distribution to use for the training job - you will need 'torch_distributed' for this workshop
- *environment*: provides a dictionary of environment variables which will be applied to your training environment
- *hyperparameters*: provides a dictionary of command-line arguments to pass to your training script (here, `finetune_llama.py`)
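
The last bullet can be sketched as follows. This is a simplified illustration of the idea that each hyperparameter reaches the training script as a `--key value` command-line argument. The `to_cli_args` and `str_to_bool` helpers are hypothetical and only approximate what happens inside the training container; the real `finetune_llama.py` script may parse its arguments differently.

```python
import argparse


def to_cli_args(hyperparameters: dict) -> list[str]:
    """Approximate how a hyperparameters dict becomes `--key value` arguments."""
    args = []
    for key, value in hyperparameters.items():
        args += [f"--{key}", str(value)]
    return args


def str_to_bool(text: str) -> bool:
    """Booleans arrive as strings on the command line, e.g. 'True'."""
    return text.lower() in ("true", "1", "yes")


# A tiny stand-in for the argument parsing done by a training script
parser = argparse.ArgumentParser()
parser.add_argument("--model_id", type=str)
parser.add_argument("--max_steps", type=int)
parser.add_argument("--bf16", type=str_to_bool, default=False)

cli_args = to_cli_args(
    {
        "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "max_steps": 1000,
        "bf16": True,
    }
)
parsed = parser.parse_args(cli_args)

print(cli_args)
print(parsed.model_id, parsed.max_steps, parsed.bf16)
```

Note that every value is converted to a string on the way in, so the training script is responsible for converting it back to the intended type.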

In the `hyperparameters` section, you can see the specific command-line arguments that are used to control the behavior of the `finetune_llama.py` training script. Notably:
- *model_id*: specifies which model you will be fine-tuning, in this case a recent checkpoint from the TinyLlama-1.1B project
- *tokenizer_id*: specifies which tokenizer you will use to tokenize the dataset examples during training
- *output_dir*: directory in which the fine-tuned model will be saved. Here we use the SageMaker-specific `/opt/ml/model` directory. At the end of the training job, SageMaker automatically copies the contents of this directory to S3 on your behalf
- *tensor_parallel_size*: the tensor parallel degree to use for training. In this case we use '2' to shard the model across the 2 NeuronCores available in the trn1.2xlarge instance
- *bf16*: request BFloat16 training
- *per_device_train_batch_size*: the microbatch size to be used for fine-tuning
- *gradient_accumulation_steps*: the number of steps over which gradients are accumulated before each weight update
- *max_steps*: the maximum number of steps of fine-tuning that we want to perform
- *lora_r*, *lora_alpha*, *lora_dropout*: the LoRA rank, alpha, and dropout values to use during fine-tuning
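
To see how these settings interact, the following sketch works through a few derived quantities using the values from this notebook's estimator configuration. It assumes that the data-parallel degree is the number of NeuronCores divided by the tensor parallel degree (no pipeline parallelism), and that LoRA updates are scaled by `lora_alpha / lora_r` as in the standard LoRA formulation. Treat the results as rough estimates rather than exact figures reported by the training script.

```python
# Values taken from the estimator configuration in this notebook
neuron_cores = 2                  # NeuronCores in one trn1.2xlarge instance
tensor_parallel_size = 2
per_device_train_batch_size = 2
gradient_accumulation_steps = 1
max_steps = 1000
lora_r, lora_alpha = 16, 32

# Assumption: cores not used for tensor parallelism are used for data parallelism
data_parallel_size = neuron_cores // tensor_parallel_size

# Examples consumed per optimizer update
global_batch_size = (
    per_device_train_batch_size * data_parallel_size * gradient_accumulation_steps
)

# Approximate number of training examples seen during the whole job
examples_seen = global_batch_size * max_steps

# Assumption: standard LoRA scaling factor applied to the low-rank update
lora_scaling = lora_alpha / lora_r

print(f"data parallel degree: {data_parallel_size}")
print(f"global batch size:    {global_batch_size}")
print(f"examples seen:        {examples_seen}")
print(f"LoRA scaling factor:  {lora_scaling}")
```

Under these assumptions, both NeuronCores hold shards of a single model replica, so increasing `gradient_accumulation_steps` is the way to raise the effective batch size without using more memory per step.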

The estimator below has been pre-configured for you, so you do not need to make any changes.

In [None]:
# Set up the PyTorch estimator
# Note that the hyperparameters are just command-line args passed to the finetune_llama.py script to control its behavior

pt_estimator = PyTorch(
        entry_point="finetune_llama.py",
        source_dir="./assets",
        role=get_execution_role(),
        instance_count=1,
        instance_type="ml.trn1.2xlarge",
        disable_profiler=True,
        output_path=f"s3://{default_bucket}/neuron_events2024",
        base_job_name="trn1-tinyllama",
        sagemaker_session=sess,
        code_location=f"s3://{default_bucket}/neuron_events2024_code",
        checkpoint_s3_uri=f"s3://{default_bucket}/neuron_events_output",
        image_uri=training_image,
        distribution={"torch_distributed": {"enabled": True}},
        environment={"FI_EFA_FORK_SAFE": "1", "WANDB_DISABLED": "true"},
        disable_output_compression=True,
        hyperparameters={
            "model_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "tokenizer_id": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
            "output_dir": "/opt/ml/model",
            "tensor_parallel_size": 2,
            "bf16": True,
            "per_device_train_batch_size": 2,
            "gradient_accumulation_steps": 1,
            "gradient_checkpointing": True,
            "max_steps": 1000,
            "lora_r": 16,
            "lora_alpha": 32,
            "lora_dropout": 0.05,
            "logging_steps": 10,
            "learning_rate": 5e-5,
            "dataloader_drop_last": True,
            "disable_tqdm": True
        }
    )

## Launch the training job

Once the estimator has been created, you can then launch your training job by calling `.fit()` on the estimator:

In [None]:
# Call fit() on the estimator to initiate the training job
pt_estimator.fit(wait=False, logs=False)

## Monitor the training job

When the training job has been launched, the SageMaker Training service will then take care of:
- launching and configuring the requested EC2 infrastructure for your training job
- launching the requested container image on each of the EC2 instances
- copying your source code directory and running your training script within the container(s)
- storing your trained model artifacts in Amazon Simple Storage Service (S3)
- decommissioning the training infrastructure

While the training job is running, the following cell will periodically check and output the job status. When you see 'Completed', you know that your training job is finished and you can proceed to the remainder of the notebook. The training job typically takes about 20 minutes to complete.

To view the output logs from your training job, open the Amazon CloudWatch console, select `Logs -> Log Groups` in the left-hand menu, and look for your SageMaker training job in the list. **Note:** it will usually take 4-5 minutes before the infrastructure is running and the output logs begin to be populated in CloudWatch.

In [None]:
# Periodically check job status until it shows 'Completed' (ETA ~20 minutes)
#  You can also monitor job status in the SageMaker console, and view the
#  SageMaker Training job logs in the CloudWatch console
from time import sleep
from datetime import datetime

# Stop polling on any terminal status so a failed or stopped job does not loop forever
while (job_status := pt_estimator.jobs[-1].describe()['TrainingJobStatus']) not in ['Completed', 'Failed', 'Stopped']:
    print(f"{datetime.now().isoformat()} Training job status: {job_status}!")
    sleep(30)

print(f"\n{datetime.now().isoformat()} Training job status: {job_status}!")
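
The CloudWatch logs mentioned above can also be read programmatically. The following is a minimal sketch, assuming the default `/aws/sagemaker/TrainingJobs` log group and log streams whose names start with the training job name. The `tail_training_logs` helper is hypothetical and not part of the SageMaker SDK; it expects a boto3 CloudWatch Logs client to be passed in.

```python
def tail_training_logs(logs_client, job_name, limit=20):
    """Return recent log messages for a SageMaker training job.

    logs_client: a boto3 CloudWatch Logs client, i.e. boto3.client("logs")
    job_name:    the training job name, used as the log stream name prefix
    limit:       maximum number of events to fetch per log stream
    """
    log_group = "/aws/sagemaker/TrainingJobs"
    streams = logs_client.describe_log_streams(
        logGroupName=log_group, logStreamNamePrefix=job_name
    )["logStreams"]

    messages = []
    for stream in streams:
        events = logs_client.get_log_events(
            logGroupName=log_group,
            logStreamName=stream["logStreamName"],
            limit=limit,
            startFromHead=False,  # fetch the most recent events
        )["events"]
        messages.extend(event["message"] for event in events)
    return messages


# Example usage in this notebook (requires AWS credentials):
#   import boto3
#   logs_client = boto3.client("logs", region_name=sess.boto_region_name)
#   job_name = pt_estimator.latest_training_job.name
#   for line in tail_training_logs(logs_client, job_name):
#       print(line)
```

If the job has only just started, the function returns an empty list because no log streams exist yet.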

## Determine location of fine-tuned model artifacts

Once the training job has completed, SageMaker will copy your fine-tuned model artifacts to a specified location in S3.

In the following cell, you can see how to programmatically determine the location of your model artifacts:

In [None]:
# Show where the fine-tuned model is stored - previous job must be 'Completed' before running this cell
model_archive_path = pt_estimator.jobs[-1].describe()['ModelArtifacts']['S3ModelArtifacts']
print(f"Your fine-tuned model is available here:\n\n{model_archive_path}/merged_model/")

<br/>

**Note:** Please copy the above S3 path, as it will be required in the subsequent workshop module.


Lastly, run the following cell to list the model artifacts available under your S3 `model_archive_path`:

In [None]:
# View the contents of the fine-tuned model path in S3
!aws s3 ls {model_archive_path}/merged_model/

Congratulations on completing the LLM fine-tuning module!

In the next notebook, you will learn how to deploy your fine-tuned model in a SageMaker hosted endpoint, and leverage AWS Inferentia accelerators to perform model inference. Have fun!