# Quick Get Started Notebook of Intel® Neural Compressor for PyTorch


This notebook is an easy-to-follow guide for getting started with the [Intel® Neural Compressor](https://github.com/intel/neural-compressor) (INC) library for the [PyTorch](https://github.com/pytorch/pytorch) framework.

In the following sections, we use a DistilBERT model fine-tuned on MRPC as an example to show how to apply post-training quantization to [transformers](https://github.com/huggingface/transformers) models with the INC library.


The main objectives of this notebook are:

1. Prerequisite: Prepare the necessary environment, model, and dataset.
2. Quantization with INC: Walk through the step-by-step process of applying post-training quantization.
3. Benchmark with INC: Evaluate the performance of the FP32 and INT8 models.


## 1. Prerequisite

### 1.1 Environment

If you have Jupyter Notebook, you may directly run this notebook. We will use pip to install or upgrade [neural-compressor](https://github.com/intel/neural-compressor), [pytorch](https://github.com/pytorch/pytorch) and other required packages.

Otherwise, you can set up a new environment. First, install [Anaconda](https://www.anaconda.com/distribution/). Then open an Anaconda prompt window and run the following commands:

```shell
conda create -n inc_notebook python==3.8
conda activate inc_notebook
pip install jupyter
jupyter notebook
```
The last command launches Jupyter Notebook; open this notebook in a browser to continue.

Then, let's install the necessary packages.

In [None]:
# install neural-compressor from source
import sys
!git clone https://github.com/intel/neural-compressor.git
%cd ./neural-compressor
!{sys.executable} -m pip install -r requirements.txt
!{sys.executable} setup.py install
%cd ..

# or install the stable release from PyPI instead (only one of the two installs is needed)
!{sys.executable} -m pip install neural-compressor


In [None]:
# install other packages used in this notebook.
!{sys.executable} -m pip install -r requirements.txt
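
Before moving on, it can help to confirm which package versions actually ended up in the environment. The following is a minimal sketch using only the standard library; the package list is an assumption about what `requirements.txt` installs, and `installed_versions` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names (as published on PyPI); adjust to your setup.
PACKAGES = ["neural-compressor", "torch", "transformers", "datasets"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If a package is reported as not installed, rerun the corresponding install cell before continuing.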


### 1.2 Load Dataset

The General Language Understanding Evaluation (GLUE) benchmark is a group of nine classification tasks on sentences or pairs of sentences which are:

- [CoLA](https://nyu-mll.github.io/CoLA/) (Corpus of Linguistic Acceptability) Determine if a sentence is grammatically correct or not.
- [MNLI](https://arxiv.org/abs/1704.05426) (Multi-Genre Natural Language Inference) Determine if a sentence entails, contradicts or is unrelated to a given hypothesis. This dataset has two versions, one with the validation and test set coming from the same distribution, another called mismatched where the validation and test use out-of-domain data.
- [MRPC](https://www.microsoft.com/en-us/download/details.aspx?id=52398) (Microsoft Research Paraphrase Corpus) Determine if two sentences are paraphrases of one another or not.
- [QNLI](https://rajpurkar.github.io/SQuAD-explorer/) (Question-answering Natural Language Inference) Determine if the answer to a question is in the second sentence or not. This dataset is built from the SQuAD dataset.
- [QQP](https://data.quora.com/First-Quora-Dataset-Release-Question-Pairs) (Quora Question Pairs) Determine if two questions are semantically equivalent or not.
- [RTE](https://aclweb.org/aclwiki/Recognizing_Textual_Entailment) (Recognizing Textual Entailment) Determine if a sentence entails a given hypothesis or not.
- [SST-2](https://nlp.stanford.edu/sentiment/index.html) (Stanford Sentiment Treebank) Determine if the sentence has a positive or negative sentiment.
- [STS-B](http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark) (Semantic Textual Similarity Benchmark) Determine the similarity of two sentences with a score from 1 to 5.
- [WNLI](https://cs.nyu.edu/faculty/davise/papers/WinogradSchemas/WS.html) (Winograd Natural Language Inference) Determine if a sentence with an anonymous pronoun and a sentence with this pronoun replaced are entailed or not. This dataset is built from the Winograd Schema Challenge dataset.

Here, we use the MRPC task. We download and load the required dataset from the Hugging Face Hub.

In [None]:
import datasets
import numpy as np
import transformers
from datasets import load_dataset, load_metric
from transformers import (
    AutoConfig,
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EvalPrediction,
    Trainer,
)

In [None]:
task_name = 'mrpc'
raw_datasets = load_dataset("glue", task_name)
label_list = raw_datasets["train"].features["label"].names
num_labels = len(label_list)
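
To make the dataset structure concrete, the sketch below mirrors the fields of a single MRPC example without downloading anything. The record itself is made up for illustration (it is not taken from the dataset), and `describe` is a hypothetical helper; the field names and label names follow the GLUE MRPC schema.

```python
# Label names for GLUE/MRPC, in label-id order.
label_list = ["not_equivalent", "equivalent"]

# A made-up record with the same fields as an MRPC example (not real data).
example = {
    "sentence1": "The company reported higher profits this quarter.",
    "sentence2": "Profits at the company rose this quarter.",
    "label": 1,
    "idx": 0,
}


def describe(record, names):
    """Return the human-readable label name for a record."""
    return names[record["label"]]


num_labels = len(label_list)
print(num_labels, describe(example, label_list))
```

With the real dataset, `raw_datasets["train"][0]` returns a dictionary of the same shape.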

### 1.3 Prepare Model
Download the pretrained model [textattack/distilbert-base-uncased-MRPC](https://huggingface.co/textattack/distilbert-base-uncased-MRPC) and load it as a PyTorch model.

In [None]:
model_name = 'textattack/distilbert-base-uncased-MRPC'

config = AutoConfig.from_pretrained(
    model_name,
    num_labels=num_labels,
    finetuning_task=task_name,
    use_auth_token=None,
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    use_auth_token=None,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    from_tf=False,
    config=config,
    use_auth_token=None,
)

### 1.4 Dataset Preprocessing
We need to preprocess the raw dataset: each sentence pair is tokenized, then padded or truncated to `max_seq_length` (128) tokens.

In [None]:
sentence1_key, sentence2_key = ("sentence1", "sentence2")
padding = "max_length"
label_to_id = None
max_seq_length = 128

def preprocess_function(examples):
    # Tokenize each sentence pair, padding or truncating to max_seq_length tokens.
    args = (examples[sentence1_key], examples[sentence2_key])
    result = tokenizer(*args, padding=padding, max_length=max_seq_length, truncation=True)
    return result

raw_datasets = raw_datasets.map(preprocess_function, batched=True)
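
The effect of `padding="max_length"` together with `truncation=True` can be illustrated on a single sequence of token ids. This is a simplified sketch in plain Python, not the tokenizer's implementation: the real tokenizer also handles special tokens and decides which sentence of a pair to shorten. `pad_or_truncate`, `PAD_ID`, and the token ids are assumptions made for this example.

```python
PAD_ID = 0  # hypothetical padding id for this toy example


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Cut a sequence to max_length, or right-pad it up to max_length."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore.
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


short = pad_or_truncate([101, 7592, 2088, 102], max_length=6)
long = pad_or_truncate(list(range(1, 11)), max_length=6)
print(short)
print(long)
```

Because every example ends up with the same length, batches can be stacked into fixed-shape tensors.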

## 2. Quantization with Intel® Neural Compressor

### 2.1 Define metric, evaluate function, and dataloader

In this part, we define a GLUE metric and use it to build an evaluation function for INC.

Refer to doc [metric.md](https://github.com/intel/neural-compressor/blob/master/docs/source/metric.md#build-custom-metric-with-python-api) for how to build your own metric.
Refer to doc [dataset.md](https://github.com/intel/neural-compressor/blob/master/docs/source/dataset.md#user-specific-dataset) and [dataloader.md](https://github.com/intel/neural-compressor/blob/master/docs/source/dataloader.md#build-custom-dataloader-with-python-apiapi) for how to build your own dataset and dataloader.

In [None]:
eval_dataset = raw_datasets["validation"]
metric = load_metric("glue", task_name)
data_collator = None

def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    result = metric.compute(predictions=preds, references=p.label_ids)
    if len(result) > 1:
        result["combined_score"] = np.mean(list(result.values())).item()
    return result

# Initialize our Trainer
trainer = Trainer(
    model=model,
    train_dataset=None,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=data_collator,
)

eval_dataloader = trainer.get_eval_dataloader()

# With transformers 4.31.0, the Trainer returns an accelerate dataloader whose
# batch_size is None. Wrap it so that a batch size is exposed and errors are avoided.
if eval_dataloader.batch_size is None:
    def _build_inc_dataloader(dataloader):
        class INCDataLoader:
            __iter__ = dataloader.__iter__
            def __init__(self) -> None:
                self.dataloader = dataloader
                self.batch_size = dataloader.total_batch_size
        return INCDataLoader()
    eval_dataloader = _build_inc_dataloader(eval_dataloader)
batch_size = eval_dataloader.batch_size

def take_eval_steps(model, trainer, save_metrics=False):
    """Evaluate `model` with the Trainer and return the first task metric found."""
    trainer.model = model
    metrics = trainer.evaluate()
    # Metric keys that GLUE tasks may report; the first one present is returned.
    bert_task_acc_keys = ['eval_f1', 'eval_accuracy', 'eval_matthews_correlation',
                          'eval_pearson', 'eval_mcc', 'eval_spearmanr']
    for key in bert_task_acc_keys:
        if key in metrics:
            throughput = metrics.get("eval_samples_per_second")
            print('Batch size = %d' % batch_size)
            print("Finally Eval {} Accuracy: {}".format(key, metrics[key]))
            # Average latency per sample, in milliseconds.
            print("Latency: %.3f ms" % (1000 / throughput))
            print("Throughput: {} samples/sec".format(throughput))
            return metrics[key]
    raise AssertionError("No metric returned. Please check the inference metric!")

def eval_func(model):
    return take_eval_steps(model, trainer)
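
For intuition about what `compute_metrics` produces for MRPC, the sketch below recomputes accuracy, F1, and their mean (the `combined_score`) in plain Python. It is an illustration under assumptions, not the GLUE metric implementation: the logits and labels are made-up toy values, and `argmax` and `accuracy_and_f1` are helper names introduced here.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_and_f1(logits, labels, positive=1):
    """Accuracy and F1 for the positive class, plus their mean."""
    preds = [argmax(row) for row in logits]
    pairs = list(zip(preds, labels))
    correct = sum(p == y for p, y in pairs)
    tp = sum(p == positive and y == positive for p, y in pairs)
    fp = sum(p == positive and y != positive for p, y in pairs)
    fn = sum(p != positive and y == positive for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    result = {"accuracy": correct / len(labels), "f1": f1}
    combined = sum(result.values()) / len(result)  # mean of accuracy and F1
    result["combined_score"] = combined
    return result


# Toy logits for four examples and their (made-up) reference labels.
toy_logits = [[0.2, 0.8], [0.9, 0.1], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 1]
print(accuracy_and_f1(toy_logits, toy_labels))
```

Note that `take_eval_steps` returns the first key it finds in its list, so for MRPC the value handed back to INC is the F1 score.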

### 2.2 Run Quantization

Now we can finally quantize the model.

First, we set the configuration for post-training quantization using the `PostTrainingQuantConfig` class. Once the configuration is set, we call the `quantization.fit()` function, which performs the quantization process on the model and returns the best quantized model.
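
For intuition, static post-training quantization maps floating-point values to 8-bit integers using a scale and a zero point derived from calibration data. The sketch below shows that arithmetic on a handful of numbers. It is a plain-Python illustration, not the code path that INC or PyTorch executes; the helper names (`quant_params`, `quantize`, `dequantize`), the signed 8-bit range, and the calibration values are all assumptions made for this example.

```python
def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point that map [x_min, x_max] onto [qmin, qmax]."""
    # The range must contain 0.0 so that zero is represented exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate case: all calibration values are zero
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map a float to an integer and clamp it to the int8 range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer back to an approximate float."""
    return (q - zero_point) * scale


# Made-up "calibration" activations used to pick the range.
calibration = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = quant_params(min(calibration), max(calibration))
for x in calibration:
    q = quantize(x, scale, zero_point)
    print(x, q, round(dequantize(q, scale, zero_point), 4))
```

The round trip is lossy: each value comes back within about one quantization step (`scale`) of the original, which is why accuracy has to be re-evaluated after quantization.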

In [None]:
from neural_compressor.quantization import fit
from neural_compressor.config import PostTrainingQuantConfig, TuningCriterion
# Stop tuning after at most 600 trials.
tuning_criterion = TuningCriterion(max_trials=600)
# Static post-training quantization, calibrated with calib_dataloader.
conf = PostTrainingQuantConfig(approach="static", tuning_criterion=tuning_criterion)
q_model = fit(model, conf=conf, calib_dataloader=eval_dataloader, eval_func=eval_func)
# Save the quantized model so that it can be benchmarked in section 3.
q_model.save("./saved_results")
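
The `max_trials` setting and the `eval_func` passed to `fit()` suggest an accuracy-aware search: candidate quantization configurations are evaluated until one is accurate enough or the trial budget runs out. The sketch below is a deliberately simplified picture of such a loop, not INC's tuning strategy. The `tune` function, the candidate dictionaries, and all accuracy numbers are hypothetical, and the 1% relative tolerance is an assumption about the default accuracy criterion that should be checked against the INC configuration docs.

```python
def tune(candidates, eval_func, baseline, max_trials, tolerable_loss=0.01):
    """Return (candidate, score, trial) for the first acceptable candidate.

    A candidate is acceptable when its relative drop from the FP32
    baseline is at most `tolerable_loss`. Returns (None, None, None)
    if no candidate qualifies within `max_trials`.
    """
    for trial, candidate in enumerate(candidates, start=1):
        if trial > max_trials:
            break
        score = eval_func(candidate)
        if (baseline - score) / baseline <= tolerable_loss:
            return candidate, score, trial
    return None, None, None


# Hypothetical candidates with made-up scores standing in for real evaluations.
toy_candidates = [
    {"name": "all ops in int8", "score": 0.800},
    {"name": "one sensitive op kept in fp32", "score": 0.855},
]
best, score, trial = tune(
    toy_candidates,
    eval_func=lambda c: c["score"],  # stands in for running the evaluation
    baseline=0.860,                  # made-up FP32 score
    max_trials=600,
)
print(best["name"], score, trial)
```

In this toy run the first candidate loses too much accuracy, so the second one is selected on trial 2.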

## 3. Benchmark with Intel® Neural Compressor

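INC provides a benchmark feature to measure model performance. Below, the `benchmark.py` script is run once for the FP32 model and once for the saved INT8 model, and the output of each run is written to a log file.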

In [None]:
# fp32 benchmark
!{sys.executable} benchmark.py 2>&1|tee fp32_benchmark.log

# int8 benchmark
!{sys.executable} benchmark.py --input_model saved_results 2>&1|tee int8_benchmark.log
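
The contents of `benchmark.py` are not shown in this notebook, so the following is only a sketch of what a throughput benchmark typically measures: a few warm-up runs, a timed loop, and the same latency formula that `take_eval_steps` prints (1000 / throughput). `measure_throughput` and `dummy_model` are hypothetical names, and the dummy workload stands in for real model inference.

```python
import time


def measure_throughput(fn, batch, batch_size, warmup=2, iters=10):
    """Return (samples per second, average latency per sample in ms)."""
    for _ in range(warmup):  # warm-up runs are excluded from timing
        fn(batch)
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    throughput = iters * batch_size / elapsed
    latency_ms = 1000.0 / throughput
    return throughput, latency_ms


def dummy_model(batch):
    """Placeholder workload standing in for model inference."""
    return [sum(row) for row in batch]


batch = [[float(i)] * 128 for i in range(8)]
throughput, latency_ms = measure_throughput(dummy_model, batch, batch_size=len(batch))
print("Throughput: {:.1f} samples/sec".format(throughput))
print("Latency: {:.6f} ms".format(latency_ms))
```

Running the same measurement on the FP32 and INT8 models, with identical batch size and iteration count, gives numbers that can be compared directly.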
