# Background

We have a dataset that has project information (id, title, description, etc.) and corresponding labels. Use only the information provided in the dataset to train a model that can use a project's title and description as inputs and predict the corresponding label.

There are 4 possible labels: `["computer-vision", "natural-language-processing", "mlops", "other"]`

- **dataset for training**: https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/dataset.csv
- **dataset for evaluation**: https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/holdout.csv

> Use only the training dataset for training (this includes splitting it into training and validation splits). The evaluation dataset is to be used only as a holdout dataset for evaluation *after* training.

Each bolded section in the notebook is a distinct CUJ with a few details about it. We'll also include hidden recommendations relevant to the CUJ that you can expand to view. To complete the CUJs, use the [Ray documentation](https://docs.ray.io/en/latest/), ask for support in our Slack channel (#train-cuj), or view parts of one possible solution [here](https://github.com/anyscale/Made-With-ML/blob/main/notebooks/madewithml.ipynb). Some CUJs are very open-ended, so feel free to approach the task however you wish; when we convene for the CUJ, we can compare our approaches.

# 🛠️&nbsp; Set up dev environment

We already did this through the instructions in the README. It's up to you whether you want to run this locally or on Anyscale Workspaces (highly recommended).

Goals by the end of this section:
- All dependencies installed on the cluster.
- Do any setup needed for the experiment tracking tool of your choice. Use [MLflow](https://docs.ray.io/en/latest/tune/examples/tune-mlflow.html) to save model artifacts locally or use [Weights and Biases](https://wandb.com/) for a managed solution.


> When you pip install a package, do it directly in the notebook (`pip install LIBRARY==VERSION -q`) so we can see which packages you're using. Alternatively, you can record the information in the `requirements.txt` file. Inside Anyscale Workspaces, use `pip install --user LIBRARY==VERSION -q` so that all the worker nodes also have access to that version of the library.

In [1]:
# Code here


# 🔢&nbsp; Data ingestion and preprocessing

Goals by the end of this section:
- Load the training and test datasets. Split the training dataset into training/validation subsets.
- Preprocess the dataset so that it's ready to be ingested by your training loop.

In [2]:
# Code here
TRAIN_DATA_URL = "https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/dataset.csv"
TEST_DATA_URL = "https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/holdout.csv"

from datasets import load_dataset
hf_ds = load_dataset("csv", data_files={"train": TRAIN_DATA_URL, "test": TEST_DATA_URL}, keep_in_memory=True)

Using custom data configuration default-e255c49ad510847c
Reusing dataset csv (/home/ray/.cache/huggingface/datasets/csv/default-e255c49ad510847c/0.0.0/433e0ccc46f9880962cc2b12065189766fbb2bee57a221866138fb9203c83519)



In [3]:
print(hf_ds)
print(hf_ds["train"][:3])

DatasetDict({
    train: Dataset({
        features: ['id', 'created_on', 'title', 'description', 'tag'],
        num_rows: 764
    })
    test: Dataset({
        features: ['id', 'created_on', 'title', 'description', 'tag'],
        num_rows: 191
    })
})
{'id': [6, 7, 9], 'created_on': ['2020-02-20 06:43:18', '2020-02-20 06:47:21', '2020-02-24 16:24:45'], 'title': ['Comparison between YOLO and RCNN on real world videos', 'Show, Infer & Tell: Contextual Inference for Creative Captioning', 'Awesome Graph Classification'], 'description': ['Bringing theory to experiment is cool. We can easily train models in colab and find the results in minutes.', 'The beauty of the work lies in the way it architects the fundamental idea that humans look at the overall image and then individual pieces of it.\r\n', 'A collection of important graph embedding, classification and representation learning papers with implementations.'], 'tag': ['computer-vision', 'computer-vision', 'other']}


In [4]:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-large-cased")

ALL_TAGS = ["computer-vision", "natural-language-processing", "mlops", "other"]

def preprocess(example):
    result = {}
    result["input_text"] = example["title"] + tokenizer.sep_token + example["description"]
    result["label"] = ALL_TAGS.index(example["tag"])
    return result

def tokenize(examples):
    return tokenizer(examples["input_text"])

processed_ds = hf_ds.map(preprocess, batched=False).map(tokenize, batched=True)




##### Recommendations

- The type of preprocessing you do really depends on the model you're trying to train. But in general, you'll want to represent all input features and labels as numerical values.
- For our task, split only the [training dataset](https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/dataset.csv) into training and validation splits; the test (holdout) split is the entire [evaluation dataset](https://raw.githubusercontent.com/GokuMohandas/Made-With-ML/main/datasets/madewithml/holdout.csv).
- Our dataset is quite small, so it should comfortably fit as one block on a single worker.
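The training/validation split described above can be sketched with the standard library alone. This is a minimal sketch, not the notebook's own method: the helper name `stratified_split_indices`, the 20% validation fraction, and the seed are all assumptions chosen for illustration, and the toy tag list stands in for `hf_ds["train"]["tag"]`.

```python
import random
from collections import defaultdict


def stratified_split_indices(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices), splitting each label in the same proportion.

    Hypothetical helper for illustration; val_fraction and seed are assumptions.
    """
    rng = random.Random(seed)

    # Group row indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_indices, val_indices = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Keep at least one validation example per label.
        n_val = max(1, round(len(indices) * val_fraction))
        val_indices.extend(indices[:n_val])
        train_indices.extend(indices[n_val:])
    return sorted(train_indices), sorted(val_indices)


# Toy labels standing in for hf_ds["train"]["tag"].
toy_tags = ["computer-vision"] * 10 + ["mlops"] * 5 + ["other"] * 5
train_idx, val_idx = stratified_split_indices(toy_tags, val_fraction=0.2)
print(len(train_idx), len(val_idx))
```

With a Hugging Face `Dataset`, the returned index lists could be passed to `Dataset.select` to build the two subsets; the library's built-in `Dataset.train_test_split` is another option. Either way, the holdout file stays untouched until final evaluation.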

# 🤖&nbsp; Model definition and configuration

Goals by the end of this section:

- Define your model and training loop.

In [5]:
from transformers.integrations import WandbCallback


In [6]:
import transformers
from transformers import TrainerCallback
from ray.train.huggingface.transformers import TransformersCheckpoint
from ray.air import session
from pathlib import Path

class AIRReportCallback(TrainerCallback):
    def __init__(self):
        self.delayed_report = {"metrics": {}, "checkpoint": None}
        super().__init__()
    
    def on_log(self, args, state, control, model=None, logs=None, **kwargs):
        # Buffer the latest metrics; they are reported on the next save.
        report = {**(logs or {}), "step": state.global_step, "epoch": state.epoch}
        self.delayed_report["metrics"].update(report)

    def on_save(self, args, state, control, **kwargs):
        # Save is called after evaluation.
        last_checkpoint = transformers.trainer.get_last_checkpoint(args.output_dir)
        # get_last_checkpoint returns None when no checkpoint exists, so check
        # before wrapping the result in Path.
        if last_checkpoint is not None:
            checkpoint_path = Path(last_checkpoint).absolute()
            self.delayed_report["checkpoint"] = TransformersCheckpoint.from_directory(
                str(checkpoint_path)
            )
        session.report(**self.delayed_report)
        self.delayed_report = {"metrics": {}, "checkpoint": None}


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)




In [7]:
# Code here
import os

import evaluate
import numpy as np
from transformers import AutoModelForSequenceClassification, DataCollatorWithPadding, TrainingArguments, Trainer

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

id2label = {idx: label for idx, label in enumerate(ALL_TAGS)}
label2id = {label: idx for idx, label in enumerate(ALL_TAGS)}

# Export WANDB_API_KEY in the environment before running this cell;
# never hard-code the key in the notebook.
assert "WANDB_API_KEY" in os.environ, "Set WANDB_API_KEY before running this cell."

def train_loop_per_worker(config):
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-large-cased", num_labels=4, id2label=id2label, label2id=label2id
    )

    accuracy = evaluate.load("accuracy")

    def compute_metrics(eval_pred):
        predictions, labels = eval_pred
        predictions = np.argmax(predictions, axis=1)
        return accuracy.compute(predictions=predictions, references=labels)
    
    os.environ["WANDB_LOG_MODEL"] = "checkpoint"
    
    import wandb
    wandb.login(key=config["wandb_api_key"])

    training_args = TrainingArguments(
        output_dir="my_awesome_model",
        learning_rate=2e-5,
        per_device_train_batch_size=16,
        per_device_eval_batch_size=16,
        num_train_epochs=4,
        weight_decay=0.01,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        logging_strategy="steps",
        load_best_model_at_end=True,
        push_to_hub=False,
        report_to="wandb",
        save_total_limit=2,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=processed_ds["train"],
        eval_dataset=processed_ds["test"],
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
        callbacks=[AIRReportCallback()]
    )

    trainer.train()
    

##### Recommendations

- If you're using deep learning models like CNNs, BERT-based LLMs, etc., feel free to use a combination of PyTorch, Hugging Face, PyTorch Lightning, etc.

# 📦&nbsp; Training configuration

Goals by the end of this section:

- Integrate with the experiment tracking tool of your choice.
- Configure checkpointing.
- Configure auto-recovery on worker failures.
- Configure distributed checkpointing to happen with N workers.

In [8]:
# Code here
from ray.air.config import ScalingConfig, RunConfig, CheckpointConfig
from ray.train.torch import TorchTrainer

trainer = TorchTrainer(
    train_loop_per_worker=train_loop_per_worker,
    train_loop_config={"wandb_api_key": os.environ["WANDB_API_KEY"]},
    scaling_config=ScalingConfig(
        num_workers=2,
        use_gpu=True
    ),
    run_config=RunConfig(
        "cuj-torch-transformer",
        storage_path="/mnt/cluster_storage/ray_results",
        checkpoint_config=CheckpointConfig(
            num_to_keep=2,
            checkpoint_score_attribute="eval_accuracy",
            checkpoint_score_order="max"
        )
    )
)
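The goals for this section also mention auto-recovery on worker failures. One way to sketch that is to add a `FailureConfig` to the `RunConfig`. The `max_failures=3` value is an assumption for illustration; the other values repeat the configuration used in this notebook, and the import path matches the `ray.air.config` module the notebook already imports from.

```python
from ray.air.config import CheckpointConfig, FailureConfig, RunConfig

# Sketch only: max_failures=3 is an assumed retry budget, not a recommended value.
run_config = RunConfig(
    name="cuj-torch-transformer",
    storage_path="/mnt/cluster_storage/ray_results",
    # Retry the run from its latest checkpoint up to 3 times after a failure.
    failure_config=FailureConfig(max_failures=3),
    checkpoint_config=CheckpointConfig(
        num_to_keep=2,
        checkpoint_score_attribute="eval_accuracy",
        checkpoint_score_order="max",
    ),
)
```

Passing this object as `run_config` to the `TorchTrainer` would replace the inline `RunConfig`. Recovery restarts from the most recent reported checkpoint, so the per-epoch checkpointing configured in the training loop determines how much work is repeated.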

# 🚀&nbsp; Model training

Goals by the end of this section:

- Launch your training job.
- Test that auto-recovery on failures actually works. See below.
- Test that you can resume a manually interrupted experiment.
- Test that you can resume training from a checkpoint that is stored as an artifact in your experiment tracking tool.
- Monitor plots on your experiment tracking tool UI.


### How to test auto-recovery on worker failures

Here's how to simulate a node failure (assuming you're running on an AWS cluster):

1. Restart your cluster and configure your worker nodes to have a special tag that allows them to terminate themselves. This setting is under `Resources and instance config (advanced)` -> `Instance config`.

```
{
  "TagSpecifications": [{
    "ResourceType": "instance",
    "Tags": [{"Key": "chaos-test-name", "Value": "tune-chaos-test"}]
  }]
}
```

2. Start your training job and wait for it to progress a bit.
3. `ray list nodes` and get the NODE_ID of one of the **worker nodes.**
4. `python kill.py <insert-node-id>`
5. Make sure that your training recovers and continues successfully after a new node is brought up.


In [9]:
# Code here
result = trainer.fit()

Snapshotting files: 100%|██████████| 3/3 [00:00<00:00, 72.02file/s]
2023-07-17 13:06:06,791	INFO worker.py:1452 -- Connecting to existing Ray cluster at address: 10.0.59.208:6379...
2023-07-17 13:06:06,798	INFO worker.py:1627 -- Connected to Ray cluster. View the dashboard at https://session-ni9bhp4mpadjuezqjeujyktdwe.i.anyscaleuserdata-staging.com
2023-07-17 13:06:06,801	INFO packaging.py:347 -- Pushing file package 'gcs://_ray_pkg_43d78dc5059154f4297f683a337a38b8.zip' (0.35MiB) to Ray cluster...
2023-07-17 13:06:06,803	INFO packaging.py:360 -- Successfully pushed file package 'gcs://_ray_pkg_43d78dc5059154f4297f683a337a38b8.zip'.
2023-07-17 13:06:06,870	INFO tune.py:226 -- Initializing Ray automatically. For cluster usage or custom Ray initialization, call `ray.init(...)` before `Trainer(...)`.

Current time: 2023-07-17 13:09:58
Running for: 00:03:51.12
Memory: 11.3/62.0 GiB

| Trial name | status | loc | iter | total time (s) | eval_loss | eval_accuracy | eval_runtime |
| --- | --- | --- | --- | --- | --- | --- | --- |
| TorchTrainer_5d770_00000 | TERMINATED | 10.0.59.208:159300 | 4 | 212.596 | 0.578191 | 0.774869 | 1.0552 |

(RayTrainWorker pid=65355, ip=10.0.48.171) {'eval_loss': 1.0849497318267822, 'eval_accuracy': 0.581151832460733, 'eval_runtime': 1.0229, 'eval_samples_per_second': 186.717, 'eval_steps_per_second': 5.865, 'epoch': 1.0}

| Trial name | date | done | epoch | eval_accuracy | eval_loss | eval_runtime | eval_samples_per_second | eval_steps_per_second | experiment_tag | hostname | iterations_since_restore | node_ip | pid | should_checkpoint | step | time_since_restore | time_this_iter_s | time_total_s | timestamp | training_iteration | trial_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TorchTrainer_5d770_00000 | 2023-07-17_13-09-55 | True | 4 | 0.774869 | 0.578191 | 1.0552 | 181.016 | 5.686 | 0 | ip-10-0-59-208 | 4 | 10.0.59.208 | 159300 | True | 96 | 212.596 | 43.9119 | 212.596 | 1689624595 | 4 | 5d770_00000 |


(RayTrainWorker pid=65355, ip=10.0.48.171) ***** Running Evaluation *****
(RayTrainWorker pid=65355, ip=10.0.48.171)   Num examples = 191
(RayTrainWorker pid=65355, ip=10.0.48.171)   Batch size = 16

(RayTrainWorker pid=65355, ip=10.0.48.171) {'eval_loss': 0.778007447719574, 'eval_accuracy': 0.7172774869109948, 'eval_runtime': 1.029, 'eval_samples_per_second': 185.612, 'eval_steps_per_second': 5.831, 'epoch': 2.0} [repeated 2x across cluster]

(RayTrainWorker pid=65355, ip=10.0.48.171) {'eval_loss': 0.6882938742637634, 'eval_accuracy': 0.7172774869109948, 'eval_runtime': 1.0592, 'eval_samples_per_second': 180.323, 'eval_steps_per_second': 5.665, 'epoch': 3.0} [repeated 2x across cluster]

(RayTrainWorker pid=65355, ip=10.0.48.171) {'eval_loss': 0.5781914591789246, 'eval_accuracy': 0.774869109947644, 'eval_runtime': 1.0549, 'eval_samples_per_second': 181.062, 'eval_steps_per_second': 5.688, 'epoch': 4.0} [repeated 2x across cluster]

(RayTrainWorker pid=65355, ip=10.0.48.171) {'train_runtime': 186.7178, 'train_samples_per_second': 16.367, 'train_steps_per_second': 0.514, 'train_loss': 0.8317810694376627, 'epoch': 4.0}
(RayTrainWorker pid=159756) {'train_runtime': 186.7178, 'train_samples_per_second': 16.367, 'train_steps_per_second': 0.514, 'train_loss': 0.8317810694376627, 'epoch': 4.0}


2023-07-17 13:09:58,112	INFO tune.py:1111 -- Total run time: 231.24 seconds (231.09 seconds for the tuning loop).


Result(
  metrics={'eval_loss': 0.5781914591789246, 'eval_accuracy': 0.774869109947644, 'eval_runtime': 1.0552, 'eval_samples_per_second': 181.016, 'eval_steps_per_second': 5.686, 'epoch': 4.0, 'step': 96, 'should_checkpoint': True, 'done': True, 'trial_id': '5d770_00000', 'experiment_tag': '0'},
  path='/mnt/cluster_storage/ray_results/cuj-torch-transformer/TorchTrainer_5d770_00000_0_2023-07-17_13-06-06',
  checkpoint=TransformersCheckpoint(local_path=/efs/workspaces/expwrk_6j8va8yrahtbn24ydlvvcjjgz3/cluster_storage/ray_results/cuj-torch-transformer/TorchTrainer_5d770_00000_0_2023-07-17_13-06-06/checkpoint_000003)
)

# ⚙️&nbsp; Hyperparameter tuning

Goals by the end of this section:

- Launch a tuning job searching over M training configurations.
- Get the best set of hyperparameters, the metrics, and the model checkpoint associated with it.

In [10]:
# Code here



# ⚖️&nbsp; Model evaluation and testing

Goals by the end of this section:

- Load the model from the best checkpoint of a past training run.
- Use this model to make predictions on the test set.
- Compute some model performance metrics and do manual testing of the model.


In [14]:
# Code here

# path = result.checkpoint.path
path = "/mnt/cluster_storage/ray_results/cuj-torch-transformer/TorchTrainer_5d770_00000_0_2023-07-17_13-06-06/checkpoint_000003"
model = AutoModelForSequenceClassification.from_pretrained(path)

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return accuracy.compute(predictions=predictions, references=labels)
    
training_args = TrainingArguments(
    output_dir="my_awesome_model",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=4,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="steps",
    load_best_model_at_end=True,
    push_to_hub=False,
    report_to="wandb",
    save_total_limit=2,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=processed_ds["train"],
    eval_dataset=processed_ds["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[AIRReportCallback()]
)

trainer.evaluate()

The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and have been ignored: title, tag, input_text, id, created_on, description. If title, tag, input_text, id, created_on, description are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
***** Running Evaluation *****
  Num examples = 191
  Batch size = 16




{'eval_loss': 0.5714403986930847,
 'eval_accuracy': 0.774869109947644,
 'eval_runtime': 2.3761,
 'eval_samples_per_second': 80.384,
 'eval_steps_per_second': 5.05}
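For manual testing, the model's raw logits need to be mapped back to a tag. The sketch below shows that decoding step with the standard library only; `decode_prediction` is a hypothetical helper, and the logits are made-up numbers rather than real model output.

```python
import math

ALL_TAGS = ["computer-vision", "natural-language-processing", "mlops", "other"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def decode_prediction(logits, tags=ALL_TAGS):
    """Return the most likely tag and its probability for one example."""
    probabilities = softmax(logits)
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return tags[best_index], probabilities[best_index]


# Made-up logits for a single example, in the order of ALL_TAGS.
label, confidence = decode_prediction([0.2, 3.1, -1.0, 0.5])
print(label, round(confidence, 3))
```

The tag order must match the `ALL_TAGS` list used to build the integer labels during preprocessing, otherwise the decoded tags will be wrong.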

##### Recommendations

- Evaluate across [many granularities](https://madewithml.com/courses/mlops/evaluation/) (overall, per-class, interesting slices, etc.)
- Check that [behavioral checks](https://madewithml.com/courses/mlops/evaluation/#behavioral-testing) pass, regardless of the type of model.
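Per-class evaluation, as recommended above, can be sketched without extra dependencies. The helper name `per_class_metrics` and the tiny label lists are assumptions for illustration; in the notebook the inputs would be the true and predicted tags for the holdout set.

```python
from collections import Counter

ALL_TAGS = ["computer-vision", "natural-language-processing", "mlops", "other"]


def per_class_metrics(y_true, y_pred, classes):
    """Compute precision, recall, and F1 for each class."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, prediction in zip(y_true, y_pred):
        if truth == prediction:
            true_pos[truth] += 1
        else:
            false_pos[prediction] += 1
            false_neg[truth] += 1

    metrics = {}
    for name in classes:
        predicted = true_pos[name] + false_pos[name]
        actual = true_pos[name] + false_neg[name]
        # Define a metric as 0.0 when its denominator is zero.
        precision = true_pos[name] / predicted if predicted else 0.0
        recall = true_pos[name] / actual if actual else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels for illustration only.
y_true = ["computer-vision", "mlops", "other", "mlops"]
y_pred = ["computer-vision", "other", "other", "mlops"]
report = per_class_metrics(y_true, y_pred, ALL_TAGS)
for tag, scores in report.items():
    print(tag, scores)
```

Looking at these numbers per tag can reveal classes the model rarely predicts, which a single overall accuracy figure hides.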