## Loading the sample data

```python
from transformers import AutoTokenizer

data = ["LTIMindtree Q2FY24: Show of strength. Good revenue growth and resilient margin performance",
        "The company expects furloughs to be more pronounced in Q3 and it is guiding to a very weak quarter, with revenue decline between 1.5 percent and 3.5 percent",
        "Arkam Ventures is also an investor in Jai Kisan, one of India’s fastest-growing rural fintech platforms for farmers and retailers, and Jumbotail, India’s leading B2B food and grocery marketplace and retail platform",
       ]
model_name = "distilbert-base-uncased-finetuned-sst-2-english"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# padding=True pads every sentence to the longest one in the batch;
# truncation=True cuts anything longer than the model's maximum input length
input = tokenizer(data, padding=True, truncation=True, return_tensors="pt")
```

## AutoModel Class

The `AutoModel` class can instantiate a model from any checkpoint. To load a model for a specific task, however, the `transformers` library defines several task-specific variants of `AutoModel`. Some of them are:
* `AutoModelForCausalLM`
* `AutoModelForQuestionAnswering`
* `AutoModelForSequenceClassification`
* `AutoModelForTokenClassification`, and others.

`AutoModel` loads only the base model and returns its hidden states, whereas the task-specific variants add a head for a particular task on top of that base model.
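To see the difference without downloading anything, the sketch below builds both classes with `from_config()` from a deliberately tiny, randomly initialized `DistilBertConfig` (the sizes are arbitrary assumptions, not those of the checkpoint). The base model returns one hidden vector per token, while the sequence classification variant returns one row of logits per sequence.

```python
import torch
from transformers import AutoModel, AutoModelForSequenceClassification, DistilBertConfig

# Hypothetical tiny config: random weights, no download, fast to run
tiny_config = DistilBertConfig(dim=32, hidden_dim=64, n_layers=2, n_heads=2,
                               vocab_size=100, num_labels=2)

base = AutoModel.from_config(tiny_config).eval()
clf = AutoModelForSequenceClassification.from_config(tiny_config).eval()

# Hypothetical batch: 3 sequences of 10 random token ids
input_ids = torch.randint(0, tiny_config.vocab_size, (3, 10))

with torch.no_grad():
    hidden = base(input_ids=input_ids).last_hidden_state  # (batch, tokens, dim)
    logits = clf(input_ids=input_ids).logits              # (batch, num_labels)

print(type(base).__name__, hidden.shape)  # DistilBertModel torch.Size([3, 10, 32])
print(type(clf).__name__, logits.shape)   # DistilBertForSequenceClassification torch.Size([3, 2])
```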

```python
from transformers import AutoModel

# AutoModel loads only the base DistilBERT encoder; the checkpoint's classification head is not used
base_model = AutoModel.from_pretrained(model_name)
output = base_model(**input)
```

```python
output
```

```
BaseModelOutput(last_hidden_state=tensor([[[ 0.5171, -0.2222,  0.3759,  ...,  0.2316,  0.8865, -0.5954],
         [ 0.2151,  0.0263,  0.9891,  ...,  0.0113,  0.1795,  0.3608],
         [ 0.1851,  0.0462,  0.7588,  ..., -0.4426,  0.4743,  0.0501],
         ...,
         [ 0.6706, -0.2922,  0.0375,  ...,  0.3897,  0.7996, -0.5716],
         [ 0.3803, -0.2894,  0.2847,  ...,  0.7460,  0.8029, -0.3096],
         [ 0.5641, -0.6835,  0.2739,  ...,  0.7024,  0.5494, -0.4684]],

        [[-1.2077,  0.3169,  0.1434,  ..., -0.0836, -0.4359, -0.3140],
         [-1.1384,  0.4225, -0.1258,  ...,  0.0325, -0.0875, -0.3803],
         [-0.6221,  0.7449, -0.1054,  ..., -0.3525,  0.0835,  0.0054],
         ...,
         [-1.0930,  0.3044,  0.2513,  ..., -0.1011, -0.4593, -0.2472],
         [-1.0747,  0.3590,  0.2335,  ..., -0.0922, -0.4178, -0.2995],
         [-1.1595,  0.3961,  0.0261,  ..., -0.2430, -0.3581, -0.2027]],

        [[ 0.1207, -0.0831,  0.2845,  ..., -0.0777,  0.6180, -0.7339],
         ...
```

```python
output.last_hidden_state.shape
```

```
torch.Size([3, 50, 768])
```

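**AutoModel** is generally used for retrieving the hidden states, which can be used as features. The shape above is `(batch_size, sequence_length, hidden_size)`: 3 sentences, each padded to 50 tokens, with a 768-dimensional vector per token.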
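Since `last_hidden_state` holds one vector per token, a single feature vector per sentence has to be derived from it. One common choice (an assumption here; taking the first `[CLS]` vector is another) is mean pooling over the non-padding tokens, using the attention mask so that padding does not dilute the average. `mean_pool` is a hypothetical helper, not part of `transformers`:

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: average the token vectors, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                    # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                          # (batch, 1), avoids division by zero
    return summed / counts

# Toy check: the last two positions of sentence 0 are padding
hidden = torch.ones(2, 4, 3)
hidden[0, 2:] = 100.0  # large values at padded positions must not leak into the mean
mask = torch.tensor([[1, 1, 0, 0],
                     [1, 1, 1, 1]])
print(mean_pool(hidden, mask))  # every entry is 1.0
```

Applied to the notebook's output, `mean_pool(output.last_hidden_state, input["attention_mask"])` would give one 768-dimensional vector per sentence, i.e. a tensor of shape `(3, 768)`.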

For sentiment analysis, it is better to use **AutoModelForSequenceClassification**, which adds a sequence classification head on top of the base model.

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(model_name)
```

```python
output = model(**input)
output
```

```
SequenceClassifierOutput(loss=None, logits=tensor([[-3.7354,  3.9795],
        [ 4.2851, -3.5077],
        [-3.0048,  3.1662]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)
```

The returned logits come from this checkpoint's binary classification head: each example gets two values, one per class.
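Logits are unnormalized scores. A minimal sketch of turning them into probabilities and labels applies a softmax over the class dimension and takes the arg max. The values below are copied from the output above; the mapping `{0: "NEGATIVE", 1: "POSITIVE"}` is what this SST-2 checkpoint is expected to use, but in practice it should be read from `model.config.id2label` rather than hard-coded.

```python
import torch

# Logits copied from the SequenceClassifierOutput above (3 examples x 2 classes)
logits = torch.tensor([[-3.7354,  3.9795],
                       [ 4.2851, -3.5077],
                       [-3.0048,  3.1662]])

# Assumed label mapping; read model.config.id2label in real code
id2label = {0: "NEGATIVE", 1: "POSITIVE"}

probs = torch.softmax(logits, dim=-1)  # each row now sums to 1
preds = probs.argmax(dim=-1)           # index of the most likely class per example
labels = [id2label[i] for i in preds.tolist()]
print(labels)  # ['POSITIVE', 'NEGATIVE', 'POSITIVE']
```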

## Creating the Model

### Loading the model for Training

#### Loading the config

```python
from transformers import DistilBertConfig

config = DistilBertConfig()
config
```

```
DistilBertConfig {
  "activation": "gelu",
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "hidden_dim": 3072,
  "initializer_range": 0.02,
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "transformers_version": "4.34.1",
  "vocab_size": 30522
}
```

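The configuration contains the hyperparameters used to build the model architecture, such as the hidden size (`dim`), the number of layers and attention heads, and the vocabulary size.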
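Any of these values can be overridden when constructing the config, and a config can be written to a directory as `config.json` and read back. The sketch below uses a temporary directory and arbitrary override values chosen only for illustration:

```python
import tempfile
from transformers import DistilBertConfig

# Override two hyperparameters; everything else keeps the defaults shown above
custom_config = DistilBertConfig(n_layers=3, dropout=0.2)

with tempfile.TemporaryDirectory() as tmp_dir:
    custom_config.save_pretrained(tmp_dir)  # writes tmp_dir/config.json
    reloaded = DistilBertConfig.from_pretrained(tmp_dir)

print(reloaded.n_layers, reloaded.dropout, reloaded.dim)  # 3 0.2 768
```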

#### Loading the model using the config

Instantiating a model directly from a configuration builds the architecture with randomly initialized weights; no pretrained weights are loaded. Such a model can then be trained for any specific task.

```python
from transformers import DistilBertModel

model = DistilBertModel(config)
model
```

```
DistilBertModel(
  (embeddings): Embeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (transformer): Transformer(
    (layer): ModuleList(
      (0-5): 6 x TransformerBlock(
        (attention): MultiHeadSelfAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (q_lin): Linear(in_features=768, out_features=768, bias=True)
          (k_lin): Linear(in_features=768, out_features=768, bias=True)
          (v_lin): Linear(in_features=768, out_features=768, bias=True)
          (out_lin): Linear(in_features=768, out_features=768, bias=True)
        )
        (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        (ffn): FFN(
          (dropout): Dropout(p=0.1, inplace=False)
          (lin1): Linear(in_features=768, out_features=3072, bias=True)
          ...
```
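A quick way to confirm that nothing pretrained was loaded is to build the same architecture twice with different random seeds and compare the weights. This sketch uses a scaled-down configuration (hypothetical sizes, chosen only so it runs fast); the default `DistilBertConfig()` behaves the same way, just with the larger shapes printed above.

```python
import torch
from transformers import DistilBertConfig, DistilBertModel

# Hypothetical small config; layer structure matches DistilBERT, only sizes differ
tiny_config = DistilBertConfig(dim=64, hidden_dim=128, n_layers=2, n_heads=4, vocab_size=1000)

torch.manual_seed(0)
model_a = DistilBertModel(tiny_config)
torch.manual_seed(1)
model_b = DistilBertModel(tiny_config)

# Different seeds give different random weights: no checkpoint was involved
same_embeddings = torch.equal(model_a.embeddings.word_embeddings.weight,
                              model_b.embeddings.word_embeddings.weight)
num_blocks = len(model_a.transformer.layer)  # one TransformerBlock per n_layers
num_params = sum(p.numel() for p in model_a.parameters())

print(same_embeddings, num_blocks, num_params)
```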

### Loading the model for inference

For inference, the model needs trained weights (a checkpoint). One can either train the model on their own dataset and load it from the resulting checkpoint, or use a checkpoint available on the Hugging Face Hub. In both cases the weights are loaded with the `from_pretrained()` method.

```python
model = DistilBertModel.from_pretrained('distilbert-base-uncased')
```

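This downloads the model into the local cache directory (`~/.cache/huggingface/hub` in recent versions of `transformers`; older versions used `~/.cache/huggingface/transformers`). To download the model to a specific directory, pass the `cache_dir` parameter.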

```python
text = "Replace me by any text you'd like."
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```

```python
output
```

```
BaseModelOutput(last_hidden_state=tensor([[[ 4.4096e-04, -2.6241e-01, -1.0192e-01,  ..., -6.2764e-02,
           2.7584e-01,  3.7014e-01],
         [ 7.2233e-01,  1.6449e-01,  4.0025e-01,  ...,  1.9161e-01,
           4.0458e-01, -5.8094e-02],
         [ 2.8198e-01, -1.7430e-01,  3.9076e-02,  ...,  2.7681e-02,
           1.1886e-01,  9.1439e-01],
         ...,
         [ 6.8016e-01,  7.9712e-02,  8.3603e-01,  ..., -4.8959e-01,
          -2.5017e-01, -2.3519e-01],
         [ 3.8105e-02, -8.1751e-01, -3.4076e-01,  ...,  4.4815e-01,
           9.6725e-02, -2.0311e-01],
         [ 3.5750e-01,  1.9968e-01,  1.7437e-01,  ...,  1.5028e-01,
          -2.3665e-01,  5.4391e-02]]], grad_fn=<NativeLayerNormBackward0>), hidden_states=None, attentions=None)
```

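#### Using the model in a pipeline

The `fill-mask` pipeline needs a model with a masked language modeling head, so the checkpoint is loaded with `DistilBertForMaskedLM` instead of the bare `DistilBertModel`.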

```python
from transformers import pipeline, DistilBertForMaskedLM

model = DistilBertForMaskedLM.from_pretrained('distilbert-base-uncased')
unmasker = pipeline(task='fill-mask', model=model, tokenizer=tokenizer)
```

```python
unmasker("Replace me by any [MASK] you'd like.")
```

```
[{'score': 0.04683324694633484,
  'token': 2711,
  'token_str': 'person',
  'sequence': "replace me by any person you'd like."},
 {'score': 0.03321794793009758,
  'token': 2171,
  'token_str': 'name',
  'sequence': "replace me by any name you'd like."},
 {'score': 0.023977328091859818,
  'token': 2450,
  'token_str': 'woman',
  'sequence': "replace me by any woman you'd like."},
 {'score': 0.021561430767178535,
  'token': 8016,
  'token_str': 'excuse',
  'sequence': "replace me by any excuse you'd like."},
 {'score': 0.017540020868182182,
  'token': 2158,
  'token_str': 'man',
  'sequence': "replace me by any man you'd like."}]
```
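Under the hood, the `fill-mask` pipeline scores candidate tokens by applying a softmax over the vocabulary logits at the `[MASK]` position and keeping the top 5. A minimal sketch of that step, written as a hypothetical helper `top_k_at_mask` that only handles the first `[MASK]` of the first sequence in the batch:

```python
import torch

def top_k_at_mask(logits, input_ids, mask_token_id, k=5):
    """Hypothetical helper: k highest probabilities and token ids at the first [MASK] of sequence 0."""
    # Position of the first [MASK] token in the first sequence of the batch
    mask_pos = (input_ids[0] == mask_token_id).nonzero(as_tuple=True)[0][0]
    # Softmax over the vocabulary dimension turns the logits into probabilities
    probs = torch.softmax(logits[0, mask_pos], dim=-1)
    top = torch.topk(probs, k)
    return top.values, top.indices

# Toy check: vocabulary of 6 tokens, [MASK] has id 3 and sits at position 2
toy_ids = torch.tensor([[0, 2, 3, 4]])
toy_logits = torch.zeros(1, 4, 6)
toy_logits[0, 2, 5] = 5.0
toy_logits[0, 2, 1] = 3.0
scores, token_ids = top_k_at_mask(toy_logits, toy_ids, mask_token_id=3, k=2)
print(token_ids.tolist())  # [5, 1]
```

With the notebook's objects, it would be called on `model(**enc).logits` and `enc["input_ids"]` for `enc = tokenizer("Replace me by any [MASK] you'd like.", return_tensors="pt")`, with `tokenizer.mask_token_id` as the mask id; `tokenizer.convert_ids_to_tokens` then maps the returned ids back to tokens like the ones listed above.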

### Saving the model

Here, the checkpoint downloaded from the Hub is saved to a local directory. A model can be saved the same way after training it.

```python
model.save_pretrained('./../../../hf_models/')
```
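Loading a saved checkpoint back with `from_pretrained()` can be checked offline with a small randomly initialized model standing in for a trained one (the tiny sizes and the `trained_model` name are assumptions for illustration). For a checkpoint meant to be reused with `pipeline`, the tokenizer should be saved to the same directory as well, e.g. with `tokenizer.save_pretrained(...)`.

```python
import tempfile
import torch
from transformers import AutoModel, DistilBertConfig, DistilBertModel

# Hypothetical tiny model standing in for one that has been trained
tiny_config = DistilBertConfig(dim=64, hidden_dim=128, n_layers=2, n_heads=4, vocab_size=1000)
trained_model = DistilBertModel(tiny_config)

with tempfile.TemporaryDirectory() as save_dir:
    trained_model.save_pretrained(save_dir)         # writes config.json and the weights file
    restored = AutoModel.from_pretrained(save_dir)  # model class is resolved from config.json

    before, after = trained_model.state_dict(), restored.state_dict()
    weights_match = before.keys() == after.keys() and all(
        torch.equal(before[k], after[k]) for k in before
    )

print(type(restored).__name__, weights_match)  # DistilBertModel True
```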