# Packing a model as a HF model

A practical example of repacking [Vectara's hallucination model](https://huggingface.co/vectara/hallucination_evaluation_model).

Quick intro: a HF model is a folder containing all the code and checkpoint files needed to run the model:
- The weights of the model
- The tokenizer
- All the configuration files
- The Python files needed to run inference correctly (only required if the model does something non-standard)
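
As a rough illustration of that layout, the sketch below groups the files of a local model folder into those four categories. The file-name patterns (`config.json`, `*.safetensors`, `tokenizer*`, `spiece.model`, `*.py`) are assumptions based on common Hugging Face conventions, and `summarize_model_folder` is a hypothetical helper, not part of any library.

```python
from pathlib import Path
import tempfile


def summarize_model_folder(folder):
    """Group the files of a model folder into rough categories (hypothetical helper)."""
    groups = {"weights": [], "tokenizer": [], "config": [], "code": [], "other": []}
    for path in sorted(Path(folder).iterdir()):
        name = path.name
        if path.suffix in {".safetensors", ".bin"}:
            groups["weights"].append(name)
        elif name.startswith(("tokenizer", "special_tokens", "added_tokens")) or name == "spiece.model":
            groups["tokenizer"].append(name)
        elif path.suffix == ".json":
            groups["config"].append(name)
        elif path.suffix == ".py":
            groups["code"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Demo on a throwaway folder with empty placeholder files (assumed names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ["config.json", "model.safetensors", "tokenizer.json",
                 "spiece.model", "modeling_hhem_v2.py"]:
        (Path(tmp) / name).touch()
    summary = summarize_model_folder(tmp)

print(summary)
```

Running the same helper on a real saved folder is a quick way to spot a missing tokenizer or a stray custom Python file.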

In [1]:
from transformers import AutoModelForSequenceClassification

vectara_model = AutoModelForSequenceClassification.from_pretrained("vectara/hallucination_evaluation_model", trust_remote_code=True)

You are using a model of type HHEMv2Config to instantiate a model of type HHEMv2. This is not supported for all configurations of models and can yield errors.


First thing: the model config and model type don't match. This is probably because the model was trained with one config and then repackaged, and the `model_type` got renamed along the way. It's probably not an issue per se, but let's fix that.

In [2]:
vectara_model.config.model_type

'HHEMv2Config'

This is because the `config.json` file sets `"model_type": "HHEMv2Config"`.
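
A simplified sketch of the kind of comparison behind that warning: the `model_type` stored in `config.json` is compared with the `model_type` declared by the config class (assumed here to be `HHEMv2`, as the warning text suggests). The real check lives inside `transformers`; `find_model_type_mismatch` and the inline JSON strings are hypothetical stand-ins for illustration.

```python
import json


def find_model_type_mismatch(config_json, class_model_type):
    """Return a warning message if the stored and declared model types differ, else None."""
    stored = json.loads(config_json).get("model_type")
    if stored is not None and stored != class_model_type:
        return (f"You are using a model of type {stored} "
                f"to instantiate a model of type {class_model_type}.")
    return None


# Stand-in for the relevant part of the hub's config.json (assumed content).
hub_config = '{"model_type": "HHEMv2Config"}'
# Stand-in for a config whose stored type matches the class.
matching_config = '{"model_type": "HHEMv2"}'

before = find_model_type_mismatch(hub_config, "HHEMv2")
after = find_model_type_mismatch(matching_config, "HHEMv2")
print(before)
print(after)
```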

In [3]:
vectara_model.config

HHEMv2Config {
  "_attn_implementation_autoset": true,
  "_name_or_path": "vectara/hallucination_evaluation_model",
  "architectures": [
    "HHEMv2ForSequenceClassification"
  ],
  "auto_map": {
    "AutoConfig": "vectara/hallucination_evaluation_model--configuration_hhem_v2.HHEMv2Config",
    "AutoModelForSequenceClassification": "vectara/hallucination_evaluation_model--modeling_hhem_v2.HHEMv2ForSequenceClassification"
  },
  "id2label": {
    "0": "hallucinated",
    "1": "consistent"
  },
  "label2id": null,
  "model_type": "HHEMv2",
  "torch_dtype": "float32",
  "transformers_version": "4.48.1"
}

If we save the model again, the config will be updated.

In [4]:
vectara_model.save_pretrained("weave_models/vectara")

No more warnings:

In [5]:
vectara_model = AutoModelForSequenceClassification.from_pretrained("weave_models/vectara", trust_remote_code=True)
print(vectara_model.config.model_type)

HHEMv2


## Model details

Let's load the model again and check some of the details:

In [6]:
from transformers import AutoModelForSequenceClassification

vectara_model = AutoModelForSequenceClassification.from_pretrained("weave_models/vectara", trust_remote_code=True)

vectara_model

HHEMv2ForSequenceClassification(
  (t5): T5ForTokenClassification(
    (transformer): T5EncoderModel(
      (shared): Embedding(32128, 768)
      (encoder): T5Stack(
        (embed_tokens): Embedding(32128, 768)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=768, out_features=768, bias=False)
                  (k): Linear(in_features=768, out_features=768, bias=False)
                  (v): Linear(in_features=768, out_features=768, bias=False)
                  (o): Linear(in_features=768, out_features=768, bias=False)
                  (relative_attention_bias): Embedding(32, 12)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0

We can see that the model wraps a `T5ForTokenClassification` model whose classification head has `out_features` equal to 2, the number of classes in the model.

Another thing to note is that the tokenizer is loaded inside the model's init! This is not recommended: the tokenizer should be saved alongside the model rather than fetched as a side effect of building it.
> It also has a typo: `tokenzier` instead of `tokenizer`.

In [7]:
vectara_model.tokenzier

T5TokenizerFast(name_or_path='google/flan-t5-base', vocab_size=32100, model_max_length=512, is_fast=True, padding_side='right', truncation_side='right', special_tokens={'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>', 'additional_special_tokens': ['<extra_id_0>', '<extra_id_1>', '<extra_id_2>', '<extra_id_3>', '<extra_id_4>', '<extra_id_5>', '<extra_id_6>', '<extra_id_7>', '<extra_id_8>', '<extra_id_9>', '<extra_id_10>', '<extra_id_11>', '<extra_id_12>', '<extra_id_13>', '<extra_id_14>', '<extra_id_15>', '<extra_id_16>', '<extra_id_17>', '<extra_id_18>', '<extra_id_19>', '<extra_id_20>', '<extra_id_21>', '<extra_id_22>', '<extra_id_23>', '<extra_id_24>', '<extra_id_25>', '<extra_id_26>', '<extra_id_27>', '<extra_id_28>', '<extra_id_29>', '<extra_id_30>', '<extra_id_31>', '<extra_id_32>', '<extra_id_33>', '<extra_id_34>', '<extra_id_35>', '<extra_id_36>', '<extra_id_37>', '<extra_id_38>', '<extra_id_39>', '<extra_id_40>', '<extra_id_41>', '<extra_id_42>', '<extra_id_43>'

It is also a standard `T5` tokenizer. One thing we can do is save this tokenizer with the model as well.

In [8]:
vectara_model.tokenzier.save_pretrained("weave_models/vectara")

('weave_models/vectara/tokenizer_config.json',
 'weave_models/vectara/special_tokens_map.json',
 'weave_models/vectara/spiece.model',
 'weave_models/vectara/added_tokens.json',
 'weave_models/vectara/tokenizer.json')

The model's README recommends running inference through the custom `predict` function that ships with the model.

In [9]:
vectara_model.predict??

Signature: vectara_model.predict(text_pairs)
Docstring: <no docstring>
Source:
    def predict(self, text_pairs):
        tokenizer = self.tokenzier
        pair_dict = [{'text1': pair[0], 'text2': pair[1]} for pair in text_pairs]
        inputs = tokenizer(
            [self.prompt.format(**pair)

As you can see, `predict` is a custom classifier for text pairs: it applies a prompt template to each pair and then uses the logits of the first token (the CLS-style token) for classification.
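
The prompt-formatting step can be sketched without loading the model. The template below is copied from the pipeline defined later in this notebook; `format_pairs` is a hypothetical helper name used only for illustration.

```python
PROMPT = (
    "<pad> Determine if the hypothesis is true given the premise?"
    "\n\nPremise: {text1}\n\nHypothesis: {text2}"
)


def format_pairs(text_pairs):
    """Turn (premise, hypothesis) tuples into prompt strings ready for tokenization."""
    return [PROMPT.format(text1=premise, text2=hypothesis)
            for premise, hypothesis in text_pairs]


prompts = format_pairs([("I am in California", "I am in United States.")])
print(prompts[0])
```

The order of each tuple matters: the first element fills the premise slot and the second fills the hypothesis slot.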

We are going to repack this as a transformers pipeline. But first, let's extract the underlying `T5` model and save it on its own. This overwrites the `HHEMv2` wrapper checkpoint that we saved earlier to the same folder.

In [10]:
vectara_model.t5.save_pretrained("weave_models/vectara")

Now we can load the model as a `T5ForTokenClassification` model. No more custom model!

In [11]:
from transformers import T5ForTokenClassification, AutoTokenizer

t5_model = T5ForTokenClassification.from_pretrained("weave_models/vectara")
tokenizer = AutoTokenizer.from_pretrained("weave_models/vectara")

## Writing a custom pipeline

Let's create a custom pipeline that handles pairs of text. There are multiple ways of doing this, but let's keep it simple and compatible with the `T5ForTokenClassification` model: subclass `Pipeline` and predict directly on tuples of text.

In [12]:
from transformers import AutoTokenizer, Pipeline
import torch

class PairTextClassificationPipeline(Pipeline):
    def __init__(self, model, tokenizer=None, **kwargs):
        # Initialize tokenizer first
        if tokenizer is None:
            tokenizer = AutoTokenizer.from_pretrained(model.config._name_or_path)
        # Make sure we store the tokenizer before calling super().__init__
        self.tokenizer = tokenizer
        super().__init__(model=model, tokenizer=tokenizer, **kwargs)
        self.prompt = "<pad> Determine if the hypothesis is true given the premise?\n\nPremise: {text1}\n\nHypothesis: {text2}"
        
    def _sanitize_parameters(self, **kwargs):
        preprocess_kwargs = {}
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs):
        # Each call receives a single (premise, hypothesis) tuple
        pair_dict = {'text1': inputs[0], 'text2': inputs[1]}
        formatted_prompt = self.prompt.format(**pair_dict)
        model_inputs = self.tokenizer(
            formatted_prompt,
            return_tensors='pt', 
            padding=True
        )
        return model_inputs

    def _forward(self, model_inputs):
        model_outputs = self.model(**model_inputs)
        return model_outputs

    def postprocess(self, model_outputs):
        logits = model_outputs.logits
        logits = logits[:, 0, :]  # keep only the first token's logits
        transformed_probs = torch.softmax(logits, dim=-1)
        raw_scores = transformed_probs[:, 1] # probability of class 1
        return raw_scores.item()
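
To make the `postprocess` arithmetic concrete, here is a plain-Python sketch of the same computation for a single example: softmax over the two logits of the first token, then the probability of class 1 (`consistent`). The logit values are made up for illustration, and `prob_consistent` is a hypothetical helper.

```python
import math


def prob_consistent(first_token_logits):
    """Softmax over [logit_hallucinated, logit_consistent]; return P(class 1)."""
    shift = max(first_token_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in first_token_logits]
    return exps[1] / sum(exps)


# Made-up logits, not real model outputs.
print(prob_consistent([0.0, 0.0]))   # equal logits give 0.5
print(prob_consistent([2.0, -1.0]))  # class 0 favoured, so the score is below 0.5
```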


We can manually test the pipeline:

In [13]:
pipe = PairTextClassificationPipeline(t5_model, tokenizer=tokenizer)

Device set to use mps:0


In [14]:
pairs = [ # Test data, List[Tuple[str, str]]
    ("The capital of France is Berlin.", "The capital of France is Paris."),
    ('I am in California', 'I am in United States.'),
    ('I am in United States', 'I am in California.'),
    ("A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse."),
    ("A boy is jumping on skateboard in the middle of a red bridge.", "The boy skates down the sidewalk on a red bridge"),
    ("A man with blond-hair, and a brown shirt drinking out of a public water fountain.", "A blond man wearing a brown shirt is reading a book."),
    ("Mark Wahlberg was a fan of Manny.", "Manny was a fan of Mark Wahlberg.")
]

Let's compare the pipeline's scores with the ground-truth (GT) values:

In [15]:
scores = pipe(pairs)

# the ground truth scores
gt = [0.011061512865126133, 0.6473632454872131, 0.1290171593427658, 0.8969419002532959, 0.18462494015693665, 0.005031010136008263, 0.05432349815964699]

assert all(abs(s - g) < 1e-5 for s, g in zip(scores, gt))
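
Each score is the probability of the `consistent` class, so a label can be derived by thresholding. The sketch below applies the `id2label` mapping from the config to the ground-truth scores above; the 0.5 cutoff and the `score_to_label` helper are assumptions for illustration, not something the model prescribes.

```python
ID2LABEL = {0: "hallucinated", 1: "consistent"}  # from the model config


def score_to_label(score, threshold=0.5):
    """Map P(consistent) to a label; the 0.5 threshold is an assumed default."""
    return ID2LABEL[int(score >= threshold)]


gt = [0.011061512865126133, 0.6473632454872131, 0.1290171593427658,
      0.8969419002532959, 0.18462494015693665, 0.005031010136008263,
      0.05432349815964699]
labels = [score_to_label(s) for s in gt]
print(labels)
```

A different threshold may suit a given application better, depending on how costly missed hallucinations are compared with false alarms.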

OK, but this pipeline is not yet saved with the model. Let's save the `PairTextClassificationPipeline` class to a file next to this notebook, `custom_pipeline.py`. Once the pipeline is registered and saved, `config.json` will contain this entry:
```json
  "custom_pipelines": {
    "pair-classification": {
      "impl": "custom_pipeline.PairTextClassificationPipeline",
      "pt": [
        "AutoModelForTokenClassification"
      ],
      "tf": []
    }
  },
```
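
The `impl` value is a dotted path: the module (file) name followed by the class name. Below is a small sketch of reading that entry, using an inline copy of the fragment above rather than a real `config.json`; `describe_custom_pipelines` is a hypothetical helper.

```python
import json

# Inline copy of the fragment shown above, wrapped in braces to make it valid JSON.
config_text = """
{
  "custom_pipelines": {
    "pair-classification": {
      "impl": "custom_pipeline.PairTextClassificationPipeline",
      "pt": ["AutoModelForTokenClassification"],
      "tf": []
    }
  }
}
"""


def describe_custom_pipelines(config):
    """Return {task: (module_name, class_name)} for each custom pipeline entry."""
    described = {}
    for task, spec in config.get("custom_pipelines", {}).items():
        module_name, _, class_name = spec["impl"].rpartition(".")
        described[task] = (module_name, class_name)
    return described


pipelines = describe_custom_pipelines(json.loads(config_text))
print(pipelines)
```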

In [16]:
from custom_pipeline import PairTextClassificationPipeline
from transformers.pipelines import PIPELINE_REGISTRY
from transformers import AutoModelForTokenClassification


# Register the task so the pipeline gets saved with the model =)
PIPELINE_REGISTRY.register_pipeline(
    "pair-classification",
    pipeline_class=PairTextClassificationPipeline,
    pt_model=AutoModelForTokenClassification,
)

Now when you save the pipeline, `custom_pipeline.py` is saved alongside the model!

In [17]:
pipe.save_pretrained("weave_models/hallu_scorer")
pipe.push_to_hub("tcapelle/hallu_scorer") # optional

No files have been modified since last commit. Skipping to prevent empty commit.


CommitInfo(commit_url='https://huggingface.co/tcapelle/hallu_scorer/commit/7da223761c65862d2c5222c19b97b66a1a50824a', commit_message='Upload PairTextClassificationPipeline', commit_description='', oid='7da223761c65862d2c5222c19b97b66a1a50824a', pr_url=None, repo_url=RepoUrl('https://huggingface.co/tcapelle/hallu_scorer', endpoint='https://huggingface.co', repo_type='model', repo_id='tcapelle/hallu_scorer'), pr_revision=None, pr_num=None)

## Loading back and using the model

In [19]:
from transformers import pipeline

pipe = pipeline("pair-classification", model="tcapelle/hallu_scorer", trust_remote_code=True)

pipe(("The capital of France is Berlin.", "The capital of France is Paris."))

Device set to use mps:0


0.01106148399412632