# Base LLM vs Fine-tuned LLM

<div style="background-color:#D9EEFF;color:black;padding:2%;">
<h2>Case study statement</h2>

In this case study, the student is asked to implement a pre-trained base model (T5 is recommended) and to compare it with the same model after fine-tuning (Flan-T5 is recommended).

</div>

# Solution to the case study

## 0. Installing external libraries

In [1]:
!pip install transformers
!pip install sentencepiece
!pip install accelerate


## 1. Selecting a pre-trained base LLM

As seen in earlier sections, there is a wide variety of base LLMs to choose from: https://huggingface.co/models

In this case study we will use the base model T5 (https://huggingface.co/t5-base).

This LLM has 220 million parameters and was pre-trained on a large number of datasets: https://huggingface.co/t5-base#training-details
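
To get a feel for what a parameter count means in practice, the sketch below estimates how much memory the weights occupy from the number of parameters and the bytes used per parameter. The helper name `estimate_weight_size_gb` and the 4-byte (float32) default are assumptions made for illustration; real checkpoint files differ slightly because of the exact parameter count and file metadata.

```python
def estimate_weight_size_gb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# T5-base: about 220 million parameters, as stated above.
t5_base_params = 220_000_000

print(f"float32: {estimate_weight_size_gb(t5_base_params):.2f} GB")
print(f"float16: {estimate_weight_size_gb(t5_base_params, bytes_per_param=2):.2f} GB")
```

The same arithmetic applies to any other model size, which helps when deciding whether a model fits on the available GPU.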

### Loading the model and the tokenizer

In [2]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer
tokenizer_T5 = T5Tokenizer.from_pretrained("t5-base")

# Load the pre-trained model
model_T5 = T5ForConditionalGeneration.from_pretrained("t5-base", device_map="auto")


### Text generation

In [3]:
prompt = "My name"

In [4]:
prompt = "Today is"

In [5]:
prompt = "Me llamo"

In [6]:
text = """The Second World War (also written World War II) was a global military \
conflict that took place between 1939 and 1945. It involved most of the world's \
nations - including all the major powers, as well as virtually all European nations \
- grouped into two opposing military alliances: the Allies on the one hand, and the \
Axis Powers on the other. It was the greatest war in history, with more than 100 \
million military personnel mobilized and a state of total war in which the major \
contenders devoted all their economic, military and scientific capabilities to the \
service of the war effort, blurring the distinction between civilian and military \
resources."""

prompt = f"Summarize: {text}"

In [7]:
prompt = "What do you think of Mars?"

In [8]:
prompt = "Translate to Spanish: 'How are you?'"

In [9]:
review = """Love these plugs, have a few now. We use them to plug in lights and \
set timers to turn them on and off via a phone app. Easy to use and linked to \
the internet and apps. Good value for money."""

prompt = f"Sentiment? Review: {review}"

In [10]:
review1 = """Love these plugs, have a few now. We use them to plug in lights and \
set timers to turn them on and off via a phone app. Easy to use and linked to \
the internet and apps. Good value for money."""

review2 = """Tried and tried but could never get them to work right. Too bad \
I'm past my return date or they would have gone back."""

review3 = """A well-sized, reliable smart plug. The app is easy to use and set \
up, and works well. I used them to make several lamps. Everything works fine - \
no problems."""

review4 = """Great little product. Super easy to set up. Didn't even need to use \
the Alexa app to do so. Did it with my echo. Now I use it almost daily to turn on \
a light that was a pain to get to."""

review5 = """If I could give this zero stars I would. Plug wouldn’t connect. I \
had to keep connecting it and finally just gave up and returned it. Customer service \
was a complete waste of time."""

prompt = f"""
Review: {review1}
Sentiment: Positive

Review: {review2}
Sentiment: Negative

Review: {review3}
Sentiment: Positive

Review: {review5}
Sentiment:"""
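
The few-shot prompt above follows a fixed pattern: labeled examples first, then the review to classify with its label left empty. A minimal sketch of a helper that builds prompts of this shape is shown below; the function name `build_few_shot_prompt` and the shortened example reviews are hypothetical and are not part of the original notebook.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format labeled (review, sentiment) pairs followed by an unlabeled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The final block leaves the label empty so the model has to complete it.
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


# Shortened, illustrative reviews (not the full texts used in the notebook).
examples = [
    ("Easy to use. Good value for money.", "Positive"),
    ("Could never get them to work right.", "Negative"),
]
few_shot_prompt = build_few_shot_prompt(examples, "Plug wouldn't connect.")
print(few_shot_prompt)
```

Keeping the examples in a list makes it easy to vary how many demonstrations the model sees, or to swap which review is left unlabeled.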

In [11]:
# Tokenize the prompt and move the input IDs to the device the model was loaded on
# (with device_map="auto" this may be the CPU when no GPU is available)
prompt_tokens = tokenizer_T5(prompt, return_tensors="pt").input_ids.to(model_T5.device)

# Generate the next tokens
outputs = model_T5.generate(prompt_tokens, max_length=100)

# Convert the generated tokens back into text
print(tokenizer_T5.decode(outputs[0]))

<pad> <extra_id_0> easy to set up and use. <extra_id_1>. <extra_id_2> Positive Review: I'm past my return date or they would have gone back. <extra_id_3> : If I could give this zero stars I would. <extra_id_4> Positive Review: <extra_id_5> negative review: <extra_id_6> Positive Review: Would recommend. <extra_id_7> Positive Review: <extra_id_8> I <extra_id_9> Positive Review: Great product. Great value for money.</s>
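
The `<extra_id_N>` markers in this output are T5's sentinel tokens. T5 was pre-trained with a span-corruption objective, where the model fills in masked spans delimited by these sentinels, so the base model tends to emit them instead of following the instruction. `decode` also accepts `skip_special_tokens=True`, which should drop such markers. The sketch below shows an equivalent post-processing step in plain Python; the helper name `strip_special_tokens` and the regular expression are assumptions for illustration.

```python
import re

# Matches <pad>, </s> and the sentinel tokens <extra_id_0>, <extra_id_1>, ...
SPECIAL_TOKEN_PATTERN = re.compile(r"<pad>|</s>|<extra_id_\d+>")


def strip_special_tokens(decoded: str) -> str:
    """Remove special-token markers and collapse the leftover whitespace."""
    without_tokens = SPECIAL_TOKEN_PATTERN.sub(" ", decoded)
    return " ".join(without_tokens.split())


# Beginning of the T5 output shown above.
raw = (
    "<pad> <extra_id_0> easy to set up and use. <extra_id_1>. <extra_id_2> "
    "Positive Review: I'm past my return date or they would have gone back."
)
print(strip_special_tokens(raw))
```

Removing the markers only makes the text easier to read; it does not turn the output into a usable sentiment label.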


## 2. Selecting a fine-tuned LLM

In this case study we will use the fine-tuned model Flan-T5 (google/flan-t5-base).

These models build on the pre-trained T5 (Raffel et al., 2020) and have been fine-tuned on more than 1,000 additional tasks, which improves performance and extends coverage to several languages: https://huggingface.co/google/flan-t5-base#training-details

### Loading the model and the tokenizer

In [12]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer
tokenizer_FT5 = T5Tokenizer.from_pretrained("google/flan-t5-base")

# Load the fine-tuned model
model_FT5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base", device_map="auto")


### Text generation

In [13]:
# Tokenize the prompt and move the input IDs to the device the model was loaded on
prompt_tokens = tokenizer_FT5(prompt, return_tensors="pt").input_ids.to(model_FT5.device)

# Generate the next tokens
outputs = model_FT5.generate(prompt_tokens, max_length=50)

# Convert the generated tokens back into text
print(tokenizer_FT5.decode(outputs[0]))

<pad> Negative</s>


## 3. Selecting a larger fine-tuned LLM

In this last section we will use Flan-T5-Large, which has roughly 780 million parameters: https://huggingface.co/google/flan-t5-large

### Loading the model and the tokenizer

In [14]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the tokenizer
tokenizer_FT5 = T5Tokenizer.from_pretrained("google/flan-t5-large")

# Load the fine-tuned model
model_FT5 = T5ForConditionalGeneration.from_pretrained("google/flan-t5-large", device_map="auto")


### Text generation

In [15]:
# Tokenize the prompt and move the input IDs to the device the model was loaded on
prompt_tokens = tokenizer_FT5(prompt, return_tensors="pt").input_ids.to(model_FT5.device)

# Generate the next tokens
outputs = model_FT5.generate(prompt_tokens, max_length=100)

# Convert the generated tokens back into text
print(tokenizer_FT5.decode(outputs[0]))

<pad> Negative</s>
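
Each prompt cell in section 1 overwrites `prompt`, so the runs above only compare the models on the last prompt assigned. One possible way to compare them on several prompts at once is sketched below. The function name `compare_models` and the stand-in generators are hypothetical; in the notebook, each generator would wrap one tokenizer and model pair (tokenize, call `generate`, decode the first sequence).

```python
from typing import Callable, Dict, List


def compare_models(
    prompts: List[str],
    generators: Dict[str, Callable[[str], str]],
) -> List[Dict[str, str]]:
    """Run every prompt through every generator and collect the outputs per prompt."""
    rows = []
    for prompt_text in prompts:
        row = {"prompt": prompt_text}
        for name, generate in generators.items():
            row[name] = generate(prompt_text)
        rows.append(row)
    return rows


# Stand-in generators so the sketch runs without downloading any model.
# They are placeholders, not real model outputs.
stub_generators = {
    "model_a": lambda p: p.upper(),
    "model_b": lambda p: p[::-1],
}

for row in compare_models(["Today is", "My name"], stub_generators):
    print(row)
```

Collecting the results as one row per prompt makes it straightforward to inspect where the base and fine-tuned models diverge.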
