<a href="https://colab.research.google.com/github/technologyhamed/Persian-Speech-Recognition/blob/main/Transformers_Training_and_Inference_on_Remote_Hardware.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers Training and Inference on Remote Hardware

This tutorial demonstrates how to run model training or inference on remote, self-hosted hardware, with automatic
dependency and environment setup, via 🏃‍♀️[Runhouse](https://github.com/run-house/runhouse)🏡. You can develop entirely
in a Colab notebook or a local Python script while executing on a remote cluster to take advantage of
accelerators like GPUs or TPUs.

If you have cloud credentials on AWS, GCP, Azure, or Lambda, this notebook shows how to automatically launch and tear down clusters on these clouds and send code to run on them remotely.

If you have SSH credentials that you use to log in to an existing cluster, you can run code remotely there as well.

## Install dependencies

In [None]:
!pip install runhouse

In [None]:
import runhouse as rh

If you already have a Runhouse account with your secrets saved, you can load them by calling `rh.login()`, then confirm your cloud credentials are set up properly with `sky check`.

In [None]:
rh.login()

In [None]:
!sky check

## Setting up the Cluster

### On-Demand Cluster (AWS, Azure, GCP, or LambdaLabs)

For instructions on setting up cloud access for on-demand clusters, please refer to
[Hardware Setup](https://runhouse-docs.readthedocs-hosted.com/en/main/rh_primitives/cluster.html#hardware-setup).

In [None]:
# For GCP, Azure, or Lambda:
# gpu = rh.cluster(name="rh-a10x", instance_type="A100:1").up_if_not()

# For AWS or Lambda:
gpu = rh.cluster(name="rh-a10x", instance_type="A10:1").up_if_not()

# Set GPU to autostop after 60 min of inactivity (default is 30 min)
gpu.keep_warm(60)  # or -1 to keep up indefinitely

### On-Premise Cluster

For an on-prem cluster, you can instantiate it as follows, filling in the IP address, SSH user, and private key path.

In [None]:
# For an existing cluster
# gpu = rh.cluster(ips=['<ip of the cluster>'],
#                  ssh_creds={'ssh_user': '...', 'ssh_private_key':'<path_to_key>'},
#                  name='rh-cluster')

## Loading and Inferencing a Transformer Model on Remote Hardware

Let's start by defining an inference function to run on the remote hardware. The function loads and initializes the gpt2 model and tokenizer, and then runs the inference on a sample input.

A function defined in a notebook can't access the notebook's global variables (including imported modules), so we need to move imports inside the function body. This is because the function code is written out to a module and imported on the cluster. Functions defined in regular Python modules don't need this and can be used normally.

In [None]:
def run_gpt2(prompt, **kwargs):
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    output = model.generate(input_ids, **kwargs).to("cpu")
    return tokenizer.decode(output[0], skip_special_tokens=True)
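To see why the imports live inside `run_gpt2`, here is a minimal local sketch of the "write the function out to a module and import it" step. It is not Runhouse's implementation: the helper `load_exported_module` and both toy functions are hypothetical and only mimic a fresh process that never saw the notebook's globals.

```python
import importlib.util
import tempfile
import textwrap
from pathlib import Path

# Source of two toy functions, as it might look once written out to a module.
# Both names are hypothetical and exist only for this illustration.
MODULE_SOURCE = textwrap.dedent('''
    def area_with_inner_import(radius):
        import math  # resolved when the function runs, wherever that is
        return math.pi * radius ** 2

    def area_with_notebook_global(radius):
        # Relies on an "import math" that only existed in the notebook's globals.
        return math.pi * radius ** 2
''')


def load_exported_module(source, name="exported_fn"):
    """Write source to a fresh file and import it, like a clean remote process would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{name}.py"
        path.write_text(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    return module


exported = load_exported_module(MODULE_SOURCE)

# The function that imports inside its body works in the new module.
inner_result = exported.area_with_inner_import(1.0)

# The function that leaned on a notebook-level import does not.
try:
    exported.area_with_notebook_global(1.0)
    global_failed = False
except NameError:
    global_failed = True
```

The second function only fails when it is called, which is why a missing import tends to surface on the cluster rather than in the notebook.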

Next, define the dependencies necessary to run the inference function on our remote hardware. `"./"` means to sync over our working directory, which contains the `run_gpt2` function.

**Note:** you may need to change the torch install version if you've changed the GPU hardware type. See the install matrix at [pytorch.org](https://pytorch.org/).

In [None]:
reqs = ['./', 'transformers', 'torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117']
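If you expect to switch hardware, one option is to build the requirements list from a CUDA tag instead of hard-coding it. The helpers `torch_requirement` and `build_reqs` below are hypothetical conveniences, not part of Runhouse, and `cu118` is only an example tag; check the install matrix on pytorch.org for the tag that matches your GPU and driver.

```python
def torch_requirement(cuda_tag="cu117"):
    """Build the torch requirement string for a given PyTorch CUDA wheel index."""
    if not (cuda_tag.startswith("cu") and cuda_tag[2:].isdigit()):
        raise ValueError(f"expected a tag like 'cu117', got {cuda_tag!r}")
    return (
        "torch --upgrade --extra-index-url "
        f"https://download.pytorch.org/whl/{cuda_tag}"
    )


def build_reqs(cuda_tag="cu117", extra=()):
    """Working directory, transformers, torch for the chosen CUDA tag, plus extras."""
    return ["./", "transformers", torch_requirement(cuda_tag), *extra]


# Same list as the one used in this tutorial.
reqs_cu117 = build_reqs()

# A variant for a different CUDA build with one additional package.
reqs_cu118 = build_reqs("cu118", extra=["datasets"])
```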

Now, we can send our function (with requirements) `to` the remote box, which returns a function with the exact same signature as `run_gpt2`, but running remotely over RPC!

In [None]:
gpt2_gpu = rh.function(fn=run_gpt2).to(system=gpu, reqs=reqs)

INFO | 2023-03-24 17:14:49,031 | Writing out function function to /content/run_gpt2_fn.py as functions serialized in notebooks are brittle. Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body).
INFO | 2023-03-24 17:14:49,035 | Setting up Function on cluster.
INFO | 2023-03-24 17:14:49,039 | Copying local package content to cluster <rh-a10x>
INFO | 2023-03-24 17:14:50,626 | Installing packages on cluster rh-a10x: ['./', 'transformers', 'torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117']
INFO | 2023-03-24 17:14:50,814 | Function setup complete.
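The idea of "a function with the same signature, but running somewhere else" can be sketched locally in a few lines. This is a toy stand-in, not how Runhouse works internally: `fake_server`, `send_to_fake_server`, and `shout` are hypothetical names, and the "network" is just a pickled payload passed between two functions in the same process.

```python
import functools
import inspect
import pickle

REGISTRY = {}  # functions the pretend server knows how to run


def fake_server(payload):
    """Stand-in for the remote side: unpickle the call, run it, pickle the result."""
    fn_name, args, kwargs = pickle.loads(payload)
    result = REGISTRY[fn_name](*args, **kwargs)
    return pickle.dumps(result)


def send_to_fake_server(fn):
    """Register fn with the pretend server and return a stub that calls it there."""
    REGISTRY[fn.__name__] = fn

    @functools.wraps(fn)  # keeps the name, docstring, and reported signature
    def stub(*args, **kwargs):
        payload = pickle.dumps((fn.__name__, args, kwargs))
        return pickle.loads(fake_server(payload))

    return stub


def shout(prompt, repeat=1):
    return (prompt.upper() + "!") * repeat


remote_shout = send_to_fake_server(shout)
result = remote_shout("hello", repeat=2)
same_signature = inspect.signature(remote_shout) == inspect.signature(shout)
```

The caller only ever sees `remote_shout(prompt, repeat=1)`; arguments and results are the only things that cross the boundary, which is also why they need to be serializable.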


Now let's call it! The first time we run the model it is being loaded from the Hugging Face Hub, so it may take a minute depending on the model size. After that it's just you and the GPU!

In [None]:
res = gpt2_gpu("Hello, I'm a language model, and I", max_length=100, do_sample=True)
print(res)

INFO | 2023-03-24 17:15:18,130 | Running run_gpt2 via gRPC
INFO | 2023-03-24 17:15:22,296 | Time to send message: 4.16 seconds
Hello, I'm a language model, and I want to write code that speaks all languages.

I'm also a programmer and a developer, so my main job is trying to bring language features to the masses.

For a few years, I've been thinking through and implementing programming in Clojure, Java, Scala, etc.

I've used it in my projects for years, and I've spent a lot of time working with these languages, which have been very useful in the


## Training

We can even train a model on our remote GPU from inside this notebook! We'll follow the same general steps as the [Transformers training tutorial](https://huggingface.co/docs/transformers/training), using the Transformers PyTorch Trainer.

First, we need to download and preprocess our dataset from the hub, but there's no reason to do that in our notebook and take time sending it up to the cluster. Instead, let's send this function to the cluster so it downloads there directly. Note that we're adding Hugging Face Datasets as a requirement here.

In [None]:
def load_and_preprocess():
    from datasets import load_dataset

    dataset = load_dataset("yelp_review_full")
    dataset["train"][100]

    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
    return [small_train_dataset, small_eval_dataset]

load_and_preprocess = rh.function(fn=load_and_preprocess).to(gpu, reqs=reqs+['datasets'])

INFO | 2023-03-24 18:22:25,097 | Writing out function function to /content/load_and_preprocess_fn.py as functions serialized in notebooks are brittle. Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body).
INFO | 2023-03-24 18:22:25,102 | Setting up Function on cluster.
INFO | 2023-03-24 18:22:25,106 | Copying local package content to cluster <rh-a10x>
INFO | 2023-03-24 18:22:26,693 | Installing packages on cluster rh-a10x: ['./', 'transformers', 'torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117', 'datasets']
INFO | 2023-03-24 18:22:31,104 | Function setup complete.
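As a rough picture of what `padding="max_length"` and `truncation=True` do in `tokenize_function`, here is a simplified sketch on plain lists of token ids. The helper `pad_or_truncate` is hypothetical; a real tokenizer also handles special tokens when truncating and uses the model's own maximum length, which this toy version ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Clip a sequence to max_length, then pad it out to exactly max_length."""
    clipped = token_ids[:max_length]
    n_pad = max_length - len(clipped)
    return {
        "input_ids": clipped + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(clipped) + [0] * n_pad,
    }


short_example = pad_or_truncate([101, 7592, 102], max_length=5)
long_example = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)
```

Every example ends up with the same length, which is what lets the Trainer stack them into fixed-size batches.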


We also don't need to return the dataset back to our notebook VM. We'll call the remote function with `.remote()` so it runs async and returns a reference string (`run_key`) to the dataset stored on the cluster, rather than returning the whole dataset.

In [None]:
datasets_ref = load_and_preprocess.remote()

INFO | 2023-03-24 18:22:31,115 | Running load_and_preprocess via gRPC
INFO | 2023-03-24 18:22:31,287 | Time to send message: 0.17 seconds
INFO | 2023-03-24 18:22:31,289 | Submitted remote call to cluster. Result or logs can be retrieved
 with run_key "load_and_preprocess_20230324_182231", e.g. 
`rh.cluster(name="/dongreenberg/rh-a10x").get("load_and_preprocess_20230324_182231", stream_logs=True)` in python 
`runhouse logs "rh-a10x" load_and_preprocess_20230324_182231` from the command line.
 or cancelled with 
`rh.cluster(name="/dongreenberg/rh-a10x").cancel("load_and_preprocess_20230324_182231")` in python or 
`runhouse cancel "rh-a10x" load_and_preprocess_20230324_182231` from the command line.
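The reference returned by the call above is just a small string that stands for a large object living on the cluster. A toy model of that pattern is sketched below; `OBJECT_STORE`, `run_remote`, and `call_with_refs` are hypothetical names, everything runs synchronously in one process, and none of it reflects Runhouse's actual internals.

```python
import itertools

OBJECT_STORE = {}  # stands in for storage that lives on the cluster
_counter = itertools.count(1)


def run_remote(fn, *args):
    """Run fn, keep its result in the store, and hand back only a small key."""
    run_key = f"{fn.__name__}_{next(_counter)}"
    OBJECT_STORE[run_key] = fn(*args)
    return run_key


def call_with_refs(fn, *args):
    """Swap any argument that is a known run key for the stored object, then call fn."""
    resolved = [
        OBJECT_STORE[a] if isinstance(a, str) and a in OBJECT_STORE else a
        for a in args
    ]
    return fn(*resolved)


def make_splits():
    # Pretend these are the train and eval datasets.
    return [list(range(1000)), list(range(1000, 2000))]


def count_rows(splits):
    train_split, eval_split = splits
    return len(train_split), len(eval_split)


ref = run_remote(make_splits)            # only a short string comes back
sizes = call_with_refs(count_rows, ref)  # the key is resolved where the data lives
```

The notebook never holds the 2,000 rows; it only passes the key along to the next function.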


Now let's send over our training code. We need to pass in our datasets ref, which Runhouse will automatically replace with the dataset objects when passing them into the function on the cluster.

In [None]:
def train(hf_datasets):
    [small_train_dataset, small_eval_dataset] = hf_datasets

    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)

    import numpy as np
    import evaluate

    metric = evaluate.load("accuracy")  # Requires scikit-learn

    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    from transformers import TrainingArguments, Trainer

    training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=small_train_dataset,
        eval_dataset=small_eval_dataset,
        compute_metrics=compute_metrics,
    )

    trainer.train()

train = rh.function(fn=train).to(gpu, reqs=reqs+['evaluate', 'scikit-learn'])

INFO | 2023-03-24 18:23:16,918 | Writing out function function to /content/train_fn.py as functions serialized in notebooks are brittle. Please make sure the function does not rely on any local variables, including imports (which should be moved inside the function body).
INFO | 2023-03-24 18:23:16,929 | Setting up Function on cluster.
INFO | 2023-03-24 18:23:16,936 | Copying local package content to cluster <rh-a10x>
INFO | 2023-03-24 18:23:18,466 | Installing packages on cluster rh-a10x: ['./', 'transformers', 'torch --upgrade --extra-index-url https://download.pytorch.org/whl/cu117', 'evaluate', 'scikit-learn']
INFO | 2023-03-24 18:23:23,959 | Function setup complete.
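The `compute_metrics` function inside `train` boils down to "pick the highest-scoring class per example, then count how often it matches the label". Here is a dependency-free sketch of that logic on made-up logits for the five star ratings; `argmax_last_axis` and `accuracy` are hypothetical helpers that mirror `np.argmax(logits, axis=-1)` and the accuracy metric, not replacements for them.

```python
def argmax_last_axis(logits):
    """Index of the largest score in each row (first maximum wins on ties)."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def accuracy(predictions, references):
    """Fraction of predictions that equal the reference labels."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return {"accuracy": correct / len(references)}


# Made-up scores for four reviews over five classes.
toy_logits = [
    [0.1, 2.0, 0.3, 0.0, -1.0],
    [1.5, 0.2, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.2, 3.0],
    [0.3, 0.2, 0.9, 0.1, 0.0],
]
toy_labels = [1, 0, 3, 2]

preds = argmax_last_axis(toy_logits)
metrics = accuracy(preds, toy_labels)
```

Three of the four toy predictions match their labels, so the reported accuracy is 0.75.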


Now let's start our training. We'll call with `stream_logs=True` to view the progress as the training runs.

In [None]:
train(datasets_ref, stream_logs=True)

INFO | 2023-03-24 18:23:23,972 | Running train via gRPC
INFO | 2023-03-24 18:23:24,140 | Time to send message: 0.16 seconds
INFO | 2023-03-24 18:23:24,143 | Submitted remote call to cluster. Result or logs can be retrieved
 with run_key "train_20230324_182324", e.g. 
`rh.cluster(name="/dongreenberg/rh-a10x").get("train_20230324_182324", stream_logs=True)` in python 
`runhouse logs "rh-a10x" train_20230324_182324` from the command line.
 or cancelled with 
`rh.cluster(name="/dongreenberg/rh-a10x").cancel("train_20230324_182324")` in python or 
`runhouse cancel "rh-a10x" train_20230324_182324` from the command line.
:task_name:train
:task_name:train
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.

And there we go! A fine-tuned model ready to classify some restaurant reviews!


## More fun with Stable Diffusion, Flan-T5, ControlNet, and more!

While you have your cluster up, you can play with more models and examples from Transformers, Diffusers, Accelerate, and Spaces in the [Runhouse tutorials](https://github.com/run-house/tutorials) and [Funhouse](https://github.com/run-house/funhouse)!

Don't forget to call `gpu.keep_warm()` if you don't want your cluster to auto-terminate after inactivity.

## Terminate Cluster

Once you are done using the cluster, you can terminate it as follows:

In [None]:
gpu.teardown()

Terminating 1 cluster: rh-a10x. Proceed? [Y/n]: y
Terminating cluster rh-a10x...done.