# Inference Sample for ProtT5nv

Copyright (c) 2022, NVIDIA CORPORATION. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


### Prerequisites

- Linux OS
- NVIDIA GPU based on the Pascal, Volta, Turing, or Ampere architecture
- NVIDIA Driver
- Docker

#### Import

Components for inferencing are part of the BioNeMo ProtT5nv source code. This notebook demonstrates the use of these components.

__`ProtT5nvInferenceWrapper`__ implements the __`seq_to_embedding`__ function, which returns encoder embeddings for input protein sequences given in text format.

Note that gRPC limits the request size to 4 MB.


In [1]:
from infer import ProtT5nvInferenceWrapper

import logging
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')

### Setup and Test Data

__`ProtT5nvInferenceWrapper`__ is an adaptor that allows interaction with the inference service.

<u>Please note</u>: The batch size, or the number of sequences submitted for embedding inference (in other words, inference throughput), may be limited by the compute capacity of the inference node hosting the model.

In [2]:
connection = ProtT5nvInferenceWrapper()

seqs = ['MSLKRKNIALIPAAGIGVRFGADKPKQYVEIGSKTVLEHVL', 'MIQSQINRNIRLDLADAILLSKAKKDLSFAEIADGTGLA']
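
Because both the request size and the batch size are bounded, a long list of sequences may need to be split before it is submitted. The following is a minimal sketch of one way to do that, not part of the BioNeMo API: `batch_sequences`, `max_batch_size`, and `max_bytes` are hypothetical names, and the UTF-8 length of the sequences is only a rough stand-in for the real request size, which also includes serialization overhead.

```python
from typing import Iterator, List


def batch_sequences(
    seqs: List[str],
    max_batch_size: int = 32,          # hypothetical default, tune for your node
    max_bytes: int = 4 * 1024 * 1024,  # rough budget mirroring the 4 MB limit
) -> Iterator[List[str]]:
    """Yield batches that respect a count limit and an approximate byte budget."""
    batch: List[str] = []
    batch_bytes = 0
    for seq in seqs:
        seq_bytes = len(seq.encode("utf-8"))
        if seq_bytes > max_bytes:
            raise ValueError("a single sequence exceeds the byte budget")
        # Start a new batch when adding this sequence would break either limit.
        too_many = len(batch) >= max_batch_size
        too_large = batch_bytes + seq_bytes > max_bytes
        if batch and (too_many or too_large):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(seq)
        batch_bytes += seq_bytes
    if batch:
        yield batch


example = ["MSLKRK", "MIQSQI", "MKV", "MA"]
by_count = list(batch_sequences(example, max_batch_size=3))
by_bytes = list(batch_sequences(example, max_batch_size=10, max_bytes=12))
```

Each yielded batch could then be passed to one of the wrapper methods shown in this notebook, for example `connection.seq_to_embedding(batch)`.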

### Sequence to Hidden States

__`seq_to_hiddens`__ queries the model to fetch the encoder hidden states for the input protein sequences. `enc_mask` is returned with `hidden_states` and contains the padding information.

In [3]:
hidden_states, enc_mask = connection.seq_to_hiddens(seqs)

In [4]:
hidden_states.shape

torch.Size([2, 41, 768])

In [5]:
enc_mask.shape

torch.Size([2, 41])
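
The two input sequences have 41 and 39 residues, so the shorter one is padded up to the length of the longer one, which matches the second dimension of the shapes above. The plain-Python sketch below illustrates what such a padding mask could look like. It assumes the common convention that 1 marks a real residue and 0 marks padding, and that each residue maps to exactly one position; the actual `enc_mask` is a tensor and its exact contents are not shown in this notebook.

```python
seqs = [
    "MSLKRKNIALIPAAGIGVRFGADKPKQYVEIGSKTVLEHVL",
    "MIQSQINRNIRLDLADAILLSKAKKDLSFAEIADGTGLA",
]

# Every sequence is padded to the length of the longest one in the batch.
max_len = max(len(s) for s in seqs)

# Assumed convention: 1 for a real residue, 0 for a padding position.
toy_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]

# Summing a mask row recovers the unpadded length of that sequence.
lengths = [sum(row) for row in toy_mask]
```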

### Hidden States to Embedding

__`hiddens_to_embedding`__ computes one embedding vector per sequence by averaging `hidden_states` over the sequence positions. It takes `enc_mask` as a second argument.

In [6]:
embeddings = connection.hiddens_to_embedding(hidden_states, enc_mask)

In [7]:
embeddings.shape

torch.Size([2, 768])
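
To make the averaging step concrete, here is a minimal plain-Python sketch of a masked mean on toy data. It assumes that padded positions are excluded from the average, which is a natural reading of why `enc_mask` is passed in but is not confirmed by this notebook. `masked_mean` and the toy values are hypothetical and use nested lists instead of tensors so the arithmetic stays visible.

```python
from typing import List


def masked_mean(
    hidden: List[List[List[float]]],  # [batch, positions, features]
    mask: List[List[int]],            # [batch, positions], 1 = real, 0 = padding
) -> List[List[float]]:
    """Average the feature vectors of the unmasked positions of each sequence."""
    embeddings = []
    for seq_hidden, seq_mask in zip(hidden, mask):
        n_real = max(sum(seq_mask), 1)  # guard against an all-padding row
        total = [0.0] * len(seq_hidden[0])
        for vec, keep in zip(seq_hidden, seq_mask):
            if keep:
                for j, value in enumerate(vec):
                    total[j] += value
        embeddings.append([t / n_real for t in total])
    return embeddings


toy_hidden = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 2.0], [4.0, 6.0], [9.0, 9.0]],  # last position is padding
]
toy_mask = [[1, 1, 1], [1, 1, 0]]
toy_embeddings = masked_mean(toy_hidden, toy_mask)
```

The padded vector `[9.0, 9.0]` does not influence the second result, so both toy sequences end up with the same embedding. The shapes mirror the notebook: a `[batch, positions, features]` input reduces to a `[batch, features]` output.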

### Sequence to Embedding

__`seq_to_embedding`__ queries the model to fetch the encoder hidden states and computes the embedding vectors in a single call.

In [8]:
embeddings = connection.seq_to_embedding(seqs)

In [9]:
embeddings.shape

torch.Size([2, 768])