In [None]:
# =============================================================
# Copyright © 2024 Intel Corporation
# 
# SPDX-License-Identifier: MIT
# =============================================================

### NOTE: Before starting this notebook, select a Jupyter kernel from the dropdown: either `pytorch-cpu` or `pytorch-gpu`, depending on the hardware available to you. The `pytorch-gpu` kernel works only on Intel® Flex/Max GPUs.

# `Intel® Extension for PyTorch* Getting Started and Features` Sample



## About Intel® Extension for PyTorch

PyTorch* is a very popular deep learning framework, and also a compute-heavy package that demands performance optimizations. Intel and Facebook* have been collaborating to boost PyTorch* performance on Intel hardware, which includes [Intel GPUs](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu.html) and [Intel CPUs](https://www.intel.com/content/www/us/en/products/details/processors.html). The official PyTorch is optimized with oneAPI Deep Neural Network Library (oneDNN) primitives by default.

To deliver the latest optimizations, Intel® Optimizations for PyTorch offers accelerations beyond stock PyTorch via **Intel® Extension for PyTorch*** (IPEX). IPEX is a Python package that extends the official PyTorch with optimizations for an extra performance boost on Intel hardware. Most of these optimizations will eventually be included in stock PyTorch releases; the extension is intended to deliver up-to-date features and optimizations for PyTorch on Intel hardware. Examples include AVX-512 Vector Neural Network Instructions (AVX512 VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX).

More detailed tutorials are available in the [Intel® Extension for PyTorch* online documentation](https://intel.github.io/intel-extension-for-pytorch/).

## Purpose
This sample shows how to get started with Intel® Extension for PyTorch, how to use Auto Mixed Precision with it, and how to quantize a model with Intel® Neural Compressor.

## Sample Table of Contents
1. [Intel® Extension for PyTorch* Getting Started Sample](#sec-gs)
2. [Intel® Extension for PyTorch* Auto Mixed Precision Sample](#sec-amp)
3. [Intel® Neural Compressor for quantization](#sec-inc)

<a id="sec-gs"></a>
## Getting Started

To try Intel® Extension for PyTorch, you only need to convert the model and input tensors to the extension device; the extension is then enabled automatically. As a baseline, the first example below runs a model without the extension.


**Please run this sample in the Intel® PyTorch & Quantization Jupyter Kernel environment.**

In [None]:
import intel_extension_for_pytorch as ipex

# Use the Intel GPU ("xpu") when one is available; otherwise fall back to the CPU.
DEVICE = "xpu" if ipex.xpu.is_available() else "cpu"

### PyTorch Model without Intel® Extension for PyTorch

In [None]:
import torch
import torchvision.models as models

model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()
data = torch.rand(128, 3, 224, 224)

if DEVICE=='xpu':
    model = model.to(DEVICE)
    data = data.to(DEVICE)

with torch.no_grad():
    model(data)

### PyTorch Model with Intel® Extension for PyTorch

Transform the Python script above with **a couple of lines of code**, as follows, and the extension is enabled and accelerates the computation automatically:

In [None]:
import torch
import torchvision.models as models

import intel_extension_for_pytorch as ipex

model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()
data = torch.rand(128, 3, 224, 224)

####################IPEX code changes#############################
if DEVICE=='xpu':
    model = model.to(DEVICE, memory_format=torch.channels_last)
    data = data.to(DEVICE, memory_format=torch.channels_last)
else:
    model = model.to(memory_format=torch.channels_last)
    data = data.to(memory_format=torch.channels_last)

model = ipex.optimize(model, dtype=torch.float32)
##################################################################

with torch.no_grad():
    model(data)
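
The `memory_format=torch.channels_last` argument in the cell above keeps the logical shape of a 4-D tensor as (N, C, H, W) but stores the channel dimension innermost in memory. The following plain-Python sketch illustrates the idea by computing element strides for both layouts. The `strides` helper is hypothetical and written only for this illustration; it is not a PyTorch or IPEX function.

```python
def strides(shape, order):
    """Return per-dimension element strides for a tensor of `shape` whose
    dimensions are laid out in memory in `order` (outermost first).

    Hypothetical helper for illustration only.
    """
    result = [0] * len(shape)
    step = 1
    # Walk from the innermost stored dimension outwards.
    for dim in reversed(order):
        result[dim] = step
        step *= shape[dim]
    return tuple(result)


shape = (128, 3, 224, 224)  # N, C, H, W, as in the sample input

# Default contiguous layout: stored as N, C, H, W.
contiguous = strides(shape, order=(0, 1, 2, 3))
# channels_last layout: same logical shape, stored as N, H, W, C.
channels_last = strides(shape, order=(0, 2, 3, 1))

print(contiguous)     # (150528, 50176, 224, 1)
print(channels_last)  # (150528, 1, 672, 3)
```

In PyTorch itself, `data.stride()` reports these values for a tensor, so you can compare the result before and after the `.to(memory_format=torch.channels_last)` call.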

<a id="sec-amp"></a>
## Auto Mixed Precision

In addition, Intel® Extension for PyTorch supports mixed precision: some operators of a model run in Float32 while others run in BFloat16 or INT8 to accelerate the inference workload.
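
To give a feel for what running in BFloat16 trades away, the sketch below rounds a Float32 value to BFloat16 in plain Python. BFloat16 keeps the sign bit and the 8 exponent bits of Float32 but only 7 of its 23 mantissa bits, so the value range is preserved while precision drops. The `to_bfloat16` helper is hypothetical (it is not an IPEX or PyTorch API) and does not special-case NaN.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round `x` to the nearest BFloat16 value and return it as a Python float.

    Hypothetical helper for illustration only; NaN is not handled specially.
    """
    # Reinterpret the Float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that BFloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    # Keep only the upper 16 bits: sign, 8 exponent bits, 7 mantissa bits.
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(1.0))      # 1.0
print(to_bfloat16(3.14159))  # 3.140625
```

The second result shows the loss of precision: only about two to three significant decimal digits survive, which is why some operators are kept in Float32.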

Traditionally, to run a model with a low-precision type, you had to convert the parameters and input tensors to that type manually. If the model contained operators that do not support the low-precision type, you had to convert back to Float32 around them, repeating this round after round until the model ran correctly.

**IPEX simplifies this procedure. Enable auto mixed precision as follows to benefit from low precision. Currently, the extension supports BFloat16.**

### BFloat16 Example

In [None]:
import torch
import torchvision.models as models

import intel_extension_for_pytorch as ipex

model = models.resnet50(weights='ResNet50_Weights.DEFAULT')
model.eval()
data = torch.rand(128, 3, 224, 224, dtype=torch.bfloat16)

if DEVICE=='xpu':
    model = model.to(DEVICE, memory_format=torch.channels_last)
    data = data.to(DEVICE, memory_format=torch.channels_last)
else:
    model = model.to(memory_format=torch.channels_last)
    data = data.to(memory_format=torch.channels_last)

####################IPEX code changes#############################
model = ipex.optimize(model, dtype=torch.bfloat16)
##################################################################

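# Enable autocast for the device the model runs on (CPU or XPU), not only the CPU.
with torch.no_grad(), torch.autocast(device_type=DEVICE, dtype=torch.bfloat16):
    model(data)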

<a id="sec-inc"></a>
## Intel® Neural Compressor
In addition to Intel® Extension for PyTorch, this container also provides the Intel® Neural Compressor (INC) package. You can use INC to optimize a model, for example by compressing its size and improving its performance on CPUs and GPUs. INC provides quantization, pruning, and knowledge distillation for this purpose. More details about the Intel® Neural Compressor package can be found [in the documentation](https://intel.github.io/neural-compressor/latest/docs/source/get_started.html).
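
As background for the quantization step, the sketch below shows symmetric per-tensor INT8 quantization in plain Python: each float is mapped to an integer in [-127, 127] through a single scale factor and can be mapped back with a small rounding error. This is a generic illustration under simplifying assumptions, not the algorithm INC applies; the `quantize_int8` and `dequantize_int8` helpers are hypothetical names introduced here.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to integers in [-127, 127].

    Hypothetical helper for illustration only. Returns (quantized, scale).
    """
    max_abs = max(abs(v) for v in values)
    # One scale for the whole tensor; fall back to 1.0 if all values are zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map quantized integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.3, 0.0, 0.42, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Storing `q` takes one byte per value instead of four, which is where the reduction in model size comes from.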

##### NOTE: Intel® Neural Compressor does not yet support XPU with torchvision models.

In [None]:
from neural_compressor.data import DataLoader, Datasets
from neural_compressor.config import PostTrainingQuantConfig
from neural_compressor.quantization import fit

import torch
import torchvision.models as models
import intel_extension_for_pytorch as ipex

DEVICE = "cpu"  # INC does not work with GPUs as of now.

dataset = Datasets('pytorch')['dummy'](shape=(1, 3, 224, 224))
# Built-in calibration dataloader and evaluation dataloader for Quantization.
dataloader = DataLoader(framework='pytorch', dataset=dataset)
# Post Training Quantization Config
config = PostTrainingQuantConfig(backend='ipex', device=DEVICE)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

if DEVICE=='xpu':
    model = model.to(DEVICE, memory_format=torch.channels_last)
    
# Just call fit to do quantization.
q_model = fit(model=model,
              conf=config,
              calib_dataloader=dataloader)

In [None]:
print('[CODE_SAMPLE_COMPLETED_SUCCESFULLY]')