<a href="https://colab.research.google.com/github/tabba98/projects/blob/main/Tutorial_Code_Llama_2_7B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Code generation with Code Llama

In this short tutorial you will learn:
- how to run Code Llama on the free tier of Colab
- how to generate code with Code Llama

*This notebook is based on the official Hugging Face blog post, [Code Llama: Llama 2 learns to code](https://huggingface.co/blog/codellama).*

Other useful links:
- [Code Llama docs](https://huggingface.co/docs/transformers/main/model_doc/code_llama)
- [Code Llama model on HF](https://huggingface.co/codellama/CodeLlama-7b-hf)

*Note: make sure to run this notebook with the GPU enabled. `Runtime` -> `Change runtime type` -> `T4`*

Let's dive in!
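Before installing anything, it can help to confirm that the GPU runtime is actually active. The snippet below is a minimal sketch: the helper name `cuda_status` is hypothetical, and it assumes PyTorch is preinstalled in the runtime (it falls back to "no GPU" if it is not).

```python
def cuda_status():
    """Return (available, device_name); (False, None) if there is no GPU or no torch."""
    try:
        import torch  # assumed to be preinstalled in the Colab runtime
    except ImportError:
        return False, None
    if not torch.cuda.is_available():
        return False, None
    # Name of the first visible CUDA device
    return True, torch.cuda.get_device_name(0)


available, name = cuda_status()
if available:
    print(f"GPU detected: {name}")
else:
    print("No GPU detected: check the runtime type before loading the model.")
```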

### Installing the development version of Hugging Face Transformers

In [1]:
!pip install git+https://github.com/huggingface/transformers.git@main accelerate

Collecting git+https://github.com/huggingface/transformers.git@main
  Cloning https://github.com/huggingface/transformers.git (to revision main) to /tmp/pip-req-build-f79pwc_0
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-f79pwc_0
  Resolved https://github.com/huggingface/transformers.git to commit fa6107c97edf7cf725305a34735a57875b67d85e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting accelerate
  Downloading accelerate-0.22.0-py3-none-any.whl (251 kB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers==4.34.0.dev0)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
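After the install finishes, the versions that ended up in the environment can be checked from Python. This is a small sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the versions you see may differ from the `4.34.0.dev0` and `0.22.0` shown in the log above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(["transformers", "accelerate", "huggingface-hub"]).items():
    print(f"{name}: {ver or 'not installed'}")
```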

### Loading the model and tokenizer

In [2]:
from transformers import AutoTokenizer
import transformers
import torch


model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)


Downloading (…)okenizer_config.json:   0%|          | 0.00/749 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

### Preparing the pipeline

In [3]:
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

Downloading (…)lve/main/config.json:   0%|          | 0.00/637 [00:00<?, ?B/s]

Downloading (…)fetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

Downloading (…)of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]
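The two checkpoint shards in the output above (9.98 GB and 3.50 GB) give a rough idea of why half precision matters here. The arithmetic below is a back-of-the-envelope sketch: the parameter count of about 6.74 billion is an assumption, not something stated in this notebook, and the estimate covers the weights only (activations and the generation cache need extra room).

```python
def weight_memory_gb(n_params, bytes_per_param):
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 6.74e9  # assumption: approximate parameter count of the 7B model

fp16_gb = weight_memory_gb(N_PARAMS, 2)  # float16: 2 bytes per parameter
fp32_gb = weight_memory_gb(N_PARAMS, 4)  # float32: 4 bytes per parameter
shards_gb = 9.98 + 3.50                  # shard sizes from the download output

print(f"float16 weights: ~{fp16_gb:.2f} GB")
print(f"float32 weights: ~{fp32_gb:.2f} GB")
print(f"downloaded shards: ~{shards_gb:.2f} GB")
```

Under that assumption the float16 estimate lines up with the downloaded shards, and it is the only one of the two precisions that would fit within a T4's nominal 16 GB.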

### Function to generate code

In [4]:
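def generate_code(prompt):
    """Generate one completion for `prompt` and print it."""
    sequences = pipeline(
        prompt,
        do_sample=True,  # sample instead of decoding greedily
        temperature=0.1,  # low temperature keeps the output close to deterministic
        top_p=0.9,  # nucleus sampling over the top 90% of probability mass
        num_return_sequences=1,
        eos_token_id=tokenizer.eos_token_id,  # stop early on end-of-sequence
        max_length=128,  # cap on total tokens, prompt included
    )
    for seq in sequences:
        print(f"Result: {seq['generated_text']}")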
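`generate_code` prints its result, and `max_length` counts the prompt tokens as well as the generated ones. If you want to reuse the text, a variant that returns it and budgets only the new tokens can be handy. The sketch below is one possible shape, not part of the original notebook: `generate_code_text` and `fake_pipe` are hypothetical names, and `fake_pipe` is a stand-in so the example runs without downloading the model. In the notebook you would pass the real `pipeline` object instead.

```python
def generate_code_text(pipe, prompt, max_new_tokens=128, **generate_kwargs):
    """Return the generated strings instead of printing them.

    `pipe` is any callable with the text-generation pipeline's call shape:
    it takes a prompt plus keyword arguments and returns a list of dicts
    that each contain a "generated_text" key.
    """
    sequences = pipe(prompt, max_new_tokens=max_new_tokens, **generate_kwargs)
    return [seq["generated_text"] for seq in sequences]


def fake_pipe(prompt, **kwargs):
    """Stand-in for the real pipeline (assumption, for illustration only)."""
    return [{"generated_text": prompt + "n):\n    return 1"}]


results = generate_code_text(fake_pipe, "def factorial(")
print(results[0])
```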

### Testing on different prompts

In [5]:
generate_code("def factorial(")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)


def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)


def main():
    print(factorial(5))
    print(fibonacci(5))


if
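The completion above does not stop at `factorial`: the model keeps writing further functions until it reaches the token limit and is cut off mid-statement. One way to tidy this up is to trim the text after generation. The helper below is a rough sketch with a hypothetical name (`keep_first_function`); it only looks for top-level `def` or `class` lines, so it will not trim indented methods or trailing top-level statements.

```python
import re


def keep_first_function(generated):
    """Keep the text up to (not including) the second top-level def/class."""
    starts = [m.start() for m in re.finditer(r"^(?:def|class)\s", generated, flags=re.MULTILINE)]
    if len(starts) < 2:
        # Nothing to cut: zero or one top-level definition
        return generated.rstrip()
    return generated[:starts[1]].rstrip()


sample = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        return n * factorial(n - 1)\n"
    "\n\n"
    "def fibonacci(n):\n"
    "    if n == 0:\n"
    "        return 0\n"
)
trimmed = keep_first_function(sample)
print(trimmed)
```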


In [None]:
generate_code("def remove_non_ascii(s: str) -> str:")

In [6]:
generate_code("def remove_last_word(")

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def remove_last_word(self, text):
        """
        Remove the last word from the text.

        :param text: The text to remove the last word from.
        :return: The text without the last word.
        """
        return text.rsplit(' ', 1)[0]

    def remove_first_word(self, text):
        """
        Remove the first word from the text.

        :param text: The text to remove the first word from.
        :return: The text without the first word.
        """



In [None]:
generate_code("def fibonacci(")

In [8]:
generate_code('def chess_board')

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Result: def chess_board_to_move(self, chess_board):
        """
        Convert a chess board to a move.

        :param chess_board: The chess board to convert.
        :return: The move.
        """
        return self.chess_board_to_move_list(chess_board)[0]

    def chess_board_to_move_list(self, chess_board):
        """
        Convert a chess board to a list of moves.

        :param chess_
