# LLM Component
Trying out a Hugging Face LLM to use as an open-ended task decoder.

In [1]:
from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

BART is mainly used to generate text from given tokens or text. For the LLM task decoder of my Vision LLM, it is better to use a model that can generate a description or an answer from an instruction. The candidates are T5 and BERT. We will use T5 because it casts every task as text-to-text, which suits descriptions and open-ended answers.

In [2]:
tokenizer = T5Tokenizer.from_pretrained('t5-base')

In [3]:
bart_model = T5ForConditionalGeneration.from_pretrained('t5-base')

The tokenizer converts text into token IDs, the format the model can consume.
`bart_model` (which, despite its name, holds the T5 model) is the actual neural network that receives the tokenized input and generates an output.

Let's try a basic query:

In [4]:
text_input = "What is the most visited city in the world"
input_ids = tokenizer(text_input, return_tensors="pt").input_ids
input_ids

tensor([[ 363,   19,    8,  167, 5251,  690,   16,    8,  296,    1]])

In [5]:
output_ids = bart_model.generate(input_ids, max_length=50, num_beams=4, early_stopping=True)
output_ids

tensor([[    0, 32099,     8,   167,  5251,   690,    16,     8,   296,    58,
           363,    19,     8,   167,  5251,   690,    16,     8,   296,    58,
             1]])

In [8]:
text_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
text_output

'the most visited city in the world? What is the most visited city in the world?'

This is obviously not the answer we wanted: the model mostly echoes the question. We probably need to fine-tune the model, or at least phrase the input the way T5 expects, to get the desired output.
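
To see why the output looks like this, it helps to decode the IDs from `In [5]` by hand. The sketch below does not call the real tokenizer: `pieces` is a hypothetical lookup table whose word entries are inferred from the input and output tensors above, and the three special-token entries (`<pad>`, `</s>`, `<extra_id_0>`) are what I understand the `t5-base` vocabulary to assign to IDs 0, 1 and 32099. Treat it as an illustration of what `skip_special_tokens=True` hides, not as the tokenizer's implementation.

```python
# Hypothetical lookup table, inferred from the tensors printed above.
# "▁" is the SentencePiece marker for "a new word starts here".
pieces = {
    0: "<pad>",             # assumed: decoder start / padding token
    1: "</s>",              # assumed: end-of-sequence token
    32099: "<extra_id_0>",  # assumed: first sentinel token
    363: "▁What", 19: "▁is", 8: "▁the", 167: "▁most", 5251: "▁visited",
    690: "▁city", 16: "▁in", 296: "▁world", 58: "?",
}
special_ids = {0, 1, 32099}

# The IDs returned by generate() in In [5].
output_ids = [0, 32099, 8, 167, 5251, 690, 16, 8, 296, 58,
              363, 19, 8, 167, 5251, 690, 16, 8, 296, 58, 1]

def toy_decode(ids, skip_special_tokens=True):
    """Join pieces and turn the word-start marker back into spaces."""
    kept = [pieces[i] for i in ids
            if not (skip_special_tokens and i in special_ids)]
    return "".join(kept).replace("▁", " ").strip()

print(toy_decode(output_ids))
print(toy_decode(output_ids, skip_special_tokens=False))
```

The second print keeps the `<extra_id_0>` sentinel at the start of the sequence, which suggests the model is treating the bare question as a fill-in-the-blank input rather than as something to answer.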

In [25]:
def generate_answer(text_input):
    input_ids = tokenizer(text_input, return_tensors="pt").input_ids
    output_ids = bart_model.generate(input_ids, max_length=50, num_beams=4, early_stopping=True)
    text_output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return text_output

In [44]:
question = "question: What is the capital of Egypt?"
generate_answer(question)

'Cairo'
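
The only change from the first query is the `question:` prefix. A small helper can keep that prefix consistent across calls. This is a minimal sketch: `with_task_prefix` is a hypothetical name, and the assumption that `question:` is the right prefix for every query rests only on the single example above.

```python
def with_task_prefix(text, prefix="question"):
    """Prepend a T5-style task prefix such as 'question: ' unless already present."""
    text = text.strip()
    marker = f"{prefix}: "
    return text if text.startswith(marker) else marker + text

prompt = with_task_prefix("What is the capital of Egypt?")
print(prompt)
```

The result can be passed straight to `generate_answer(prompt)`.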

Given the complexity of the tasks my model will eventually have to perform, it is better to use a GPT-4 class model (`gpt-4o` below) to generate the answers.

In [76]:
import openai
import os

In [77]:
# load environment variables (including the API key) from the .env file
from dotenv import load_dotenv
load_dotenv()

True

In [78]:
api = os.getenv('OPENAI_API_KEY')
# Do not echo the key here: a cell output would store the secret in the notebook.
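
A key printed in a cell output is visible to anyone who sees the notebook. When a visual check that the key loaded is wanted, a masked form is enough. The helpers below are a sketch under that assumption; `require_env` and `mask_secret` are hypothetical names, not part of `os` or `python-dotenv`.

```python
import os

def require_env(name):
    """Return the environment variable or fail early with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file")
    return value

def mask_secret(secret, visible=4):
    """Show only the last few characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]

# Dummy value for illustration only; a real key would come from require_env.
print(mask_secret("sk-example-not-a-real-key"))
```

In the notebook this would be used as `mask_secret(require_env('OPENAI_API_KEY'))`, so a missing key raises immediately and a present key is never shown in full.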

In [None]:
from openai import OpenAI
openai.api_key = api

model = "gpt-4o"

In [83]:
question = "What are the top 3 visited cities in Europe?"  # Example question

client = OpenAI(api_key=api)


response = client.chat.completions.create(
  model=model,
  messages=[
    {"role": "system", "content": "You are a helpful assistant that helps me with my questions."},
    {"role": "user", "content": question}
  ]
)

# Print the response
print("Assistant: " + response.choices[0].message.content)

Assistant: The top three most visited cities in Europe often include Paris, London, and Rome. These cities are renowned for their rich history, culture, cuisine, and numerous iconic landmarks that attract millions of tourists each year. However, exact rankings can fluctuate slightly depending on the specific year and the criteria used to measure tourism (such as international arrivals, overnight stays, or other metrics).


This is a much better answer than the one we got from the T5 model: it addresses the question directly and qualifies its ranking. On this example GPT-4o handles the open-ended question far better, so it is the model we will use for the LLM task decoder.
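
For the task decoder it is convenient to wrap the chat call in a function, as `generate_answer` did for T5. The sketch below assumes a client object with the same `chat.completions.create` interface used above. `ask_llm` and the stub client are hypothetical and exist only so the example runs without a network call or an API key.

```python
from types import SimpleNamespace

SYSTEM_PROMPT = "You are a helpful assistant that helps me with my questions."

def ask_llm(client, question, model="gpt-4o", system_prompt=SYSTEM_PROMPT):
    """Send one question through a chat-completions client and return the text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

def _stub_create(model, messages):
    """Offline stand-in that echoes the user message in the response shape."""
    text = f"[{model}] {messages[-1]['content']}"
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

# Stub with the same attribute path as the real client: chat.completions.create
stub_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_stub_create))
)

print(ask_llm(stub_client, "What are the top 3 visited cities in Europe?"))
```

With the real client from `In [83]`, the call would be `ask_llm(client, question)`.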