# Transfer learning with Huggingface using CodeFlare

In this notebook you will learn how to leverage the **[huggingface](https://huggingface.co/)** support in the Ray ecosystem to carry out a text classification task using transfer learning. We will be referencing the example **[here](https://huggingface.co/docs/transformers/tasks/sequence_classification)**.

The example carries out a text classification task on the **[imdb dataset](https://huggingface.co/datasets/imdb)** and tries to classify movie reviews as positive or negative. The Huggingface library provides an easy way to build the model and the dataset for this classification task. In this case we will be using the **distilbert-base-uncased** model, which is a **BERT**-based model.

Huggingface has **[built-in support for the Ray ecosystem](https://docs.ray.io/en/releases-1.13.0/_modules/ray/ml/train/integrations/huggingface/huggingface_trainer.html)**, which allows the Huggingface trainer to run on CodeFlare and distribute training across multiple GPUs, so the training scales out as we add GPUs.


### Getting all the requirements in place

In [None]:
# Import pieces from codeflare-sdk
from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration
from codeflare_sdk.cluster.auth import TokenAuthentication

In [None]:
# Create authentication object for oc user permissions and login
auth = TokenAuthentication(
    token = "sha256~wTEk7b6J0jRiIGCCCl8f_uVRimPYqMjDjthEsQE5i9s",
    server = "https://api.mini2.mydomain.com:6443",
    skip_tls = True
)
auth.login()

Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper).

In [None]:
# Create our cluster and submit appwrapper
cluster = Cluster(ClusterConfiguration(name='hfgputest', min_worker=1, max_worker=3, min_cpus=8, max_cpus=8, min_memory=16, max_memory=16, gpu=1, instascale=False))

Next, we want to bring our cluster up, so we call the `up()` function below to submit our cluster AppWrapper yaml onto the MCAD queue, and begin the process of obtaining our resource cluster.

In [None]:
cluster.up()

Now, we want to check on the initial status of our resource cluster, then wait until it is finally ready for use.

In [None]:
cluster.status()

In [None]:
cluster.wait_ready()

In [None]:
cluster.status()

Let's quickly verify that the specs of the cluster are as expected.

In [None]:
cluster.details()

In [None]:
ray_cluster_uri = cluster.cluster_uri()
print(ray_cluster_uri)

**NOTE**: Now we have our resource cluster with the desired GPUs, so we can interact with it to train the HuggingFace model.
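
Before connecting, it can help to sanity-check the URI printed above. The following is a minimal sketch, assuming the URI has the `ray://host:port` form that the Ray client expects; the helper name `check_ray_uri` and the example address are hypothetical and not part of the CodeFlare SDK.

```python
from urllib.parse import urlparse

def check_ray_uri(uri):
    """Return (host, port) if the URI looks like a Ray client address, else raise ValueError."""
    parsed = urlparse(uri)
    if parsed.scheme != "ray" or not parsed.hostname or parsed.port is None:
        raise ValueError(f"unexpected Ray cluster URI: {uri!r}")
    return parsed.hostname, parsed.port

# Hypothetical address, used only to illustrate the expected shape of the URI
example_host, example_port = check_ray_uri("ray://example-head-svc.example-namespace.svc:10001")
print(example_host, example_port)
```

In the notebook itself you would pass `ray_cluster_uri` instead of the example string.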

In [None]:
# Before proceeding, make sure the cluster exists and the URI is not empty
assert ray_cluster_uri, "Ray cluster needs to be started and set before proceeding"

import ray
from ray.air.config import ScalingConfig

# Reset the Ray context in case there is already one
ray.shutdown()

# Install additional libraries that will be required for this training
runtime_env = {"pip": ["transformers", "datasets", "evaluate", "pyarrow<7.0.0"]}

# Establish the connection to the Ray cluster
ray.init(address=f'{ray_cluster_uri}', runtime_env=runtime_env)

print("Ray cluster is up and running: ", ray.is_initialized())

**NOTE**: This task needs additional pip packages on the cluster. We install them by passing them to `ray.init` in the `runtime_env` variable.

### Transfer learning code from huggingface

We are using code based on the example **[here](https://huggingface.co/docs/transformers/tasks/sequence_classification)**.

In [None]:
@ray.remote
def train_fn():
    from datasets import load_dataset
    import transformers
    from transformers import AutoTokenizer, TrainingArguments
    from transformers import AutoModelForSequenceClassification
    import numpy as np
    from datasets import load_metric
    import ray
    from ray import tune
    from ray.train.huggingface import HuggingFaceTrainer

    dataset = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)

    #using a fraction of dataset but you can run with the full dataset
    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(100))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(100))

    print(f"len of train {len(small_train_dataset)} and test {len(small_eval_dataset)}")

    ray_train_ds = ray.data.from_huggingface(small_train_dataset)
    ray_evaluation_ds = ray.data.from_huggingface(small_eval_dataset)

    def compute_metrics(eval_pred):
        metric = load_metric("accuracy")
        logits, labels = eval_pred
        predictions = np.argmax(logits, axis=-1)
        return metric.compute(predictions=predictions, references=labels)

    def trainer_init_per_worker(train_dataset, eval_dataset, **config):
        model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

        training_args = TrainingArguments("/tmp/hf_imdb/test", eval_steps=1, disable_tqdm=True, 
                                          num_train_epochs=1, skip_memory_metrics=True,
                                          learning_rate=2e-5,
                                          per_device_train_batch_size=16,
                                          per_device_eval_batch_size=16,                                
                                          weight_decay=0.01,)
        return transformers.Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            compute_metrics=compute_metrics
        )

    # num_workers is the number of GPUs to train on
    scaling_config = ScalingConfig(num_workers=3, use_gpu=True)

    # We use the Ray-native HuggingFaceTrainer here, but you can swap in the non-Ray Huggingface Trainer; both have the same method signature.
    # The Ray-native HuggingFaceTrainer has built-in support for scaling to multiple GPUs.
    trainer = HuggingFaceTrainer(
        trainer_init_per_worker=trainer_init_per_worker,
        scaling_config=scaling_config,
        datasets={"train": ray_train_ds, "evaluation": ray_evaluation_ds},
    )
    result = trainer.fit()
    print(f"metrics: {result.metrics}")
    print(f"checkpoint: {result.checkpoint}")
    print(f"log_dir: {result.log_dir}")
    return result.checkpoint
    #return result.log_dir

**NOTE:** This code will produce a lot of output and will run for **approximately 2 minutes.** As part of its execution it downloads the `imdb` dataset and the `distilbert-base-uncased` model, and then starts the transfer learning task that trains the model on this dataset.
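
The `compute_metrics` function in the training code takes the argmax of each row of logits and compares it with the reference label. As a rough illustration of that computation, here is a pure-Python sketch; the helper name `accuracy_from_logits` and the numbers are made up for the example and do not come from the trained model.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose highest-scoring class matches the reference label."""
    predictions = [row.index(max(row)) for row in logits]  # argmax per row
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Made-up logits for four reviews: [NEGATIVE score, POSITIVE score]
example_eval_logits = [[0.2, 1.1], [1.4, -0.3], [0.1, 0.9], [0.8, 0.5]]
example_eval_labels = [1, 0, 0, 1]
example_accuracy = accuracy_from_logits(example_eval_logits, example_eval_labels)
print(example_accuracy)
```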

In [None]:
#call the above cell as a remote ray function
result=ray.get(train_fn.remote())

In [None]:
from ray.train.torch import TorchCheckpoint
checkpoint: TorchCheckpoint = result
path = checkpoint.to_directory()

In [None]:
print(path)
!ls {path}

In [None]:
#log_dir=result.log_dir
#print(f"log_dir: {log_dir}")

# Inference using the checkpoint

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
#DistilbertTokenizerFast
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
model = AutoModelForSequenceClassification.from_pretrained(path,num_labels=2, id2label=id2label, label2id=label2id)
text1 = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."
text2 = "This is a catastrophe. Each of the three movies had different actors that made it difficult to follow."
#inputs = tokenizer(text, return_tensors="pt")
batch=[text1,text2]
inputs = tokenizer(batch, padding=True, truncation=True, max_length=512, return_tensors="pt")

# For PyTorch, unpack the tokenizer output into keyword arguments
with torch.no_grad():
    logits = model(**inputs).logits

In [None]:
print(logits)
print(torch.nn.Softmax(dim=1)(logits)) #tf.math.softmax(logits, axis=-1)
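
The softmax call above turns each row of logits into class probabilities that sum to 1. The following pure-Python sketch shows the same computation on made-up logits (not taken from the model above); the names `softmax`, `example_logits`, and `example_id2label` are used only for this illustration.

```python
import math

def softmax(row):
    """Convert one row of logits into probabilities that sum to 1."""
    # Subtracting the row maximum keeps exp() numerically stable
    # and does not change the result.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for two reviews: [NEGATIVE score, POSITIVE score]
example_logits = [[-1.2, 1.5], [2.0, -0.5]]
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}

example_probs = [softmax(row) for row in example_logits]
example_predicted = [example_id2label[row.index(max(row))] for row in example_probs]
print(example_probs)
print(example_predicted)
```

Because softmax preserves the ordering of the scores, taking the argmax of the probabilities gives the same class as taking the argmax of the raw logits.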

In [None]:
import numpy as np
print(np.array(logits))
predicted_class_id = np.array(logits).argmax(axis=1)
print(predicted_class_id)
print([model.config.id2label[i] for i in predicted_class_id])

# Convert to ONNX

In [None]:
torch.onnx.export(
    model, 
    tuple(inputs.values()),
    f="torch-model.onnx",  
    input_names=['input_ids', 'attention_mask'], 
    output_names=['logits'], 
    dynamic_axes={'input_ids': {0: 'batch_size', 1: 'sequence'}, 
                  'attention_mask': {0: 'batch_size', 1: 'sequence'}, 
                  'logits': {0: 'batch_size'}},  # logits are (batch_size, num_labels); only the batch axis varies
    do_constant_folding=True, 
    opset_version=13, 
)

In [None]:
from datasets import load_dataset
dataset = load_dataset("imdb")

In [None]:
import onnx
import onnxruntime
import torch
import numpy as np

In [None]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

In [None]:
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
print(tokenizer)

session = onnxruntime.InferenceSession('torch-model.onnx', None)
text="This is a catastrophe."
inputs = tokenizer(text, return_tensors="np")
print(inputs)

result1 = session.run([i.name for i in session.get_outputs()], dict(inputs))
print(result1)

id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
predicted_class_id = np.array(result1).argmax().item()
print(id2label[predicted_class_id])

In [None]:
#import tensorflow as tf
#predictions = tf.math.softmax(result, axis=-1)
print(torch.nn.Softmax(dim=1)(torch.tensor(result1[0])))

In [None]:
text1 = "This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three."
text2 = "This is a catastrophe."
batch=[text1,text2]
inputs = tokenizer(batch, padding=True, truncation=True, max_length=512, return_tensors="np")
print(inputs)
result2 = session.run([i.name for i in session.get_outputs()], dict(inputs))
print(result2)
torch.nn.Softmax(dim=1)(torch.tensor(result2[0]))
print(np.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result2[0])),axis=1))
print([id2label[i.item()] for i in torch.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result2[0])),axis=1)])
labels=[id2label[labelid] for labelid in torch.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result2[0])),axis=1).tolist()]
print(labels)

# Upload the model to S3 Bucket

In [None]:
import os
import boto3

key_id = os.environ.get('AWS_ACCESS_KEY_ID')
secret_key = os.environ.get('AWS_SECRET_ACCESS_KEY')
endpoint_url = os.environ.get('AWS_S3_ENDPOINT')
# Note: verify=False disables TLS certificate verification for the S3 endpoint
s3_client = boto3.client('s3', aws_access_key_id=key_id, aws_secret_access_key=secret_key, endpoint_url=endpoint_url, verify=False)
buckets = s3_client.list_buckets()
for bucket in buckets['Buckets']:
    print(bucket['Name'])

In [None]:
# `bucket` is the last bucket listed by the loop in the previous cell
print(bucket['Name'])
modelfile = 'torch-model.onnx'
s3_client.upload_file(modelfile, bucket['Name'], 'hf_model.onnx')

In [None]:
[item.get("Key") for item in s3_client.list_objects_v2(Bucket=bucket['Name']).get("Contents")]

Now manually deploy the model from Data Science Projects.

---
# Submit inferencing request to Deployed model using HTTP

In [None]:
import requests
import json

URL = 'http://modelmesh-serving.huggingface.svc.cluster.local:8008/v2/models/hfmodel/infer' # underscore characters are removed
headers = {"content-type": "application/json"}
payload = {
    "inputs": [
        {"name": "input_ids", "shape": inputs.get('input_ids').shape, "datatype": "INT64", "data": inputs.get('input_ids').tolist()},
        {"name": "attention_mask", "shape": inputs.get('attention_mask').shape, "datatype": "INT64", "data": inputs.get('attention_mask').tolist()},
    ]
}
print(payload)
res = requests.post(URL, json=payload, headers=headers)
print(res)
print(res.text)
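
The request body above lists each input tensor with its `name`, `shape`, `datatype`, and nested `data`. As a sketch of how such a body can be assembled from plain Python lists, consider the following; the helper `build_infer_payload` is hypothetical and the token ids are made up for illustration.

```python
import json

def build_infer_payload(input_ids, attention_mask):
    """Build an inference request body from two equally shaped nested lists."""
    def tensor(name, rows):
        # Shape is [batch_size, sequence_length]; rows must be padded to equal length.
        shape = [len(rows), len(rows[0])]
        if any(len(r) != shape[1] for r in rows):
            raise ValueError("rows must be padded to the same length")
        return {"name": name, "shape": shape, "datatype": "INT64", "data": rows}

    return {"inputs": [tensor("input_ids", input_ids),
                       tensor("attention_mask", attention_mask)]}

# Made-up token ids for a batch of two short sequences
example_payload = build_infer_payload(
    input_ids=[[101, 2023, 102], [101, 2307, 102]],
    attention_mask=[[1, 1, 1], [1, 1, 1]],
)
print(json.dumps(example_payload))
```

The sketch uses distinct names (`example_payload`) so that running it does not overwrite the `payload` used for the real request.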

In [None]:
result=[np.array(res.json().get('outputs')[0].get('data')).reshape(res.json().get('outputs')[0].get('shape'))]

In [None]:
torch.nn.Softmax(dim=1)(torch.tensor(result[0]))
print(np.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result[0])),axis=1))
print('Using item',[id2label[i.item()] for i in torch.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result[0])),axis=1)])
labels=[id2label[labelid] for labelid in torch.argmax(torch.nn.Softmax(dim=1)(torch.tensor(result[0])),axis=1).tolist()]
print('Using to_list',labels)

# Submit inferencing request to Deployed model using gRPC

In [None]:
!pip install grpcio grpcio-tools==1.46.0

In [None]:
#!wget https://raw.githubusercontent.com/kserve/kserve/master/docs/predict-api/v2/grpc_predict_v2.proto
!wget https://raw.githubusercontent.com/kserve/modelmesh-serving/main/fvt/proto/kfs_inference_v2.proto
!python3 -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. ./kfs_inference_v2.proto

In [None]:
payload = { "model_name": "hfmodel",
        "inputs": [{ "name": "input_ids", "shape": inputs.get('input_ids').shape, "datatype": "INT64", 
                     "contents": {"int64_contents":[y for x in inputs.get('input_ids').tolist() for y in x]}},
                   { "name": "attention_mask", "shape": inputs.get('attention_mask').shape, "datatype": "INT64", 
                     "contents": {"int64_contents":[y for x in inputs.get('attention_mask').tolist() for y in x]}}]
    }
print(json.dumps(payload))
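
Unlike the HTTP payload, the gRPC payload above carries each tensor as a flat `int64_contents` list next to its `shape`, with the rows concatenated in row-major order. A small sketch of that flattening and of restoring the rows from the shape, using hypothetical helper names and made-up token ids:

```python
def flatten_row_major(rows):
    """Concatenate the rows of a 2-D nested list into one flat list."""
    return [value for row in rows for value in row]

def unflatten(flat, shape):
    """Rebuild a 2-D nested list from a flat list and a (batch, sequence) shape."""
    batch, seq = shape
    if len(flat) != batch * seq:
        raise ValueError("flat length does not match shape")
    return [flat[i * seq:(i + 1) * seq] for i in range(batch)]

# Made-up token ids for a batch of two sequences of length three
example_ids = [[101, 2023, 102], [101, 2307, 102]]
example_flat_ids = flatten_row_major(example_ids)
example_restored = unflatten(example_flat_ids, (2, 3))
print(example_flat_ids)
```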

In [None]:
import grpc
import kfs_inference_v2_pb2, kfs_inference_v2_pb2_grpc
grpc_url="modelmesh-serving.huggingface.svc.cluster.local:8033"
request=kfs_inference_v2_pb2.ModelInferRequest(model_name="hfmodel",inputs=payload["inputs"])
grpc_channel = grpc.insecure_channel(grpc_url)
grpc_stub = kfs_inference_v2_pb2_grpc.GRPCInferenceServiceStub(grpc_channel)
response = grpc_stub.ModelInfer(request)

In [None]:
print(type(response.outputs),type(response.raw_output_contents))
from google.protobuf.json_format import MessageToDict
d = MessageToDict(response.outputs[0])
print(d)
binary_data=bytes([x for x in response.raw_output_contents[0]])

In [None]:
import struct
import base64
FLOAT = 'f'
fmt = '<' + FLOAT * (len(binary_data) // struct.calcsize(FLOAT))
numbers = struct.unpack(fmt, binary_data)
print(numbers)

In [None]:
np.array(numbers).reshape(*[int(i) for i in d.get("shape")])
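
The last three cells decode the raw response bytes as little-endian 32-bit floats and reshape them using the shape reported in the response. The same steps can be sketched with the standard library alone; the bytes below are made up to stand in for `response.raw_output_contents[0]`, and `decode_fp32` is a hypothetical helper name.

```python
import struct

def decode_fp32(raw, shape):
    """Decode little-endian float32 bytes into nested lists with a 2-D shape."""
    rows, cols = shape
    count = len(raw) // struct.calcsize("f")
    if count != rows * cols:
        raise ValueError("byte length does not match shape")
    values = struct.unpack("<" + "f" * count, raw)
    return [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]

# Made-up bytes: four float32 values packed the way the cells above unpack them
example_raw = struct.pack("<4f", -1.5, 2.0, 0.5, -0.25)
example_decoded = decode_fp32(example_raw, (2, 2))
print(example_decoded)
```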

Finally, we bring our resource cluster down and release/terminate the associated resources, bringing everything back to the way it was before our cluster was brought up.

In [None]:
cluster.down()

In [None]:
auth.logout()

# Conclusion
As shown in the above example, you can run your Huggingface transfer learning tasks easily and natively on CodeFlare. You can scale them from 1 to n GPUs without making any significant code changes, while still leveraging the native Huggingface trainer.

Also refer to the additional notebooks that showcase other use cases. The next notebook, [./02_codeflare_workflows_encoding.ipynb], shows an sklearn example and how you can leverage workflows to run experiment pipelines and explore multiple pipelines in parallel on a CodeFlare cluster.