<a href="https://colab.research.google.com/github/talatiqbal2/GenAIEngineering-Cohort3/blob/main/week1_session_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
pip install -U huggingface_hub



## Remote Inference via Inference Providers
Make sure a valid **HF_TOKEN** is set in your environment. Running this example may bill your account once you exceed the free tier.
The following Python example runs the model remotely on HF Inference Providers using the **auto** provider setting, which automatically selects an available inference provider.

The model you are trying to use is gated. Make sure you have access to it by visiting the model page. To run inference, either set HF_TOKEN in your environment variables or Colab Secrets, or run the login cell below. 🤗
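
One way to fail fast before making a potentially billable call is to check for the token up front. The sketch below is a minimal illustration: `resolve_hf_token` is a hypothetical helper, not part of `huggingface_hub`, and it only assumes that the token lives under the `HF_TOKEN` key of a mapping such as `os.environ`.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the HF_TOKEN value from `env`, or raise a descriptive error.

    Hypothetical helper for illustration; not part of huggingface_hub.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Create a token at "
            "https://huggingface.co/settings/tokens and export it first."
        )
    return token
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.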

## Local Inference on GPU + multiple messages

Model page: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

⚠️ If the generated code snippets do not work, please open an issue on the [model repo](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and/or on [huggingface.js](https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/model-libraries-snippets.ts) 🙏

In [4]:
from huggingface_hub import login
login(new_session=False)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [5]:
import os
from huggingface_hub import InferenceClient

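client = InferenceClient(
    provider="auto",
    # Read the token from the environment; when it is None the client
    # falls back to the token cached by `login()` / `hf auth login`.
    api_key=os.environ.get("HF_TOKEN"),
)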

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        },
        {
            "role": "assistant",
            "content": " The capital of France is Paris. It is one of the most famous cities in the world, known for its rich history, art, culture, and landmarks such as the Eiffel Tower, Louvre Museum, and Notre-Dame Cathedral. Paris is also the political, economic, and cultural center of France."
        },
        {
            "role": "user",
            "content": "How confident are you?"
        },
        {
            "role": "assistant",
            "content": "  I am a model and my responses are based on the data I have been trained on. I strive to provide accurate and helpful information, but I don't have personal feelings or emotions. I don't have the ability to be confident or uncertain. I simply provide the information I have been programmed to know."
        },
    ],
)

print(completion.choices[0].message)
print('---')
print(completion.choices[0].message.content)

ValueError: You must provide an api_key to work with novita API or log in with `hf auth login`.
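
The multi-message call above works because the full conversation is sent in `messages` on every request; the client does not remember earlier turns. A minimal sketch of keeping that history in a list follows. `add_turn` and `VALID_ROLES` are hypothetical names introduced here for illustration, and the set of roles is an assumption limited to the ones commonly used in chat requests.

```python
from typing import Dict, List

Message = Dict[str, str]

# Assumed role names for this sketch; some backends accept additional roles.
VALID_ROLES = {"system", "user", "assistant"}


def add_turn(history: List[Message], role: str, content: str) -> List[Message]:
    """Return a new history list with one more message appended."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    # Strip stray whitespace that often surrounds pasted model output.
    return history + [{"role": role, "content": content.strip()}]


history: List[Message] = []
history = add_turn(history, "user", "What is the capital of France?")
history = add_turn(history, "assistant", " The capital of France is Paris.")
history = add_turn(history, "user", "How confident are you?")
# `history` can now be passed as `messages=history` to
# client.chat.completions.create(...)
```

Returning a new list rather than mutating in place keeps earlier snapshots of the conversation intact, which can help when comparing responses to different follow-up questions.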

## Basic completion with max_tokens and temperature

In [None]:
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    max_tokens=1,  # Cap the reply at a single generated token
    temperature=1,  # Sampling temperature: lower is more deterministic, higher is more random
)

completion.choices[0].message.content
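
To build intuition for the `temperature` parameter, the sketch below applies the standard temperature-scaled softmax to a few made-up logits. It is a toy illustration only: how a given provider implements sampling internally is not shown in this notebook, and `softmax_with_temperature` and `toy_logits` are names invented for this example.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, while higher temperatures flatten the distribution so that less likely tokens are sampled more often.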

### Inspecting logprobs


In [None]:
import math

import pandas as pd

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ],
    logprobs=True,
    top_logprobs=5,
)

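# Chat completions return one entry per generated token under `logprobs.content`;
# each entry has `.token`, `.logprob`, and a list of `.top_logprobs` alternatives.
token_entries = completion.choices[0].logprobs.content

records = []
for idx, entry in enumerate(token_entries):
    # Convert the log-probability back to a probability for display
    chosen = f"{entry.token} ({math.exp(entry.logprob):.3f})"
    alts = sorted(entry.top_logprobs, key=lambda alt: alt.logprob, reverse=True)[:5]
    alts_fmt = [f"{alt.token} ({math.exp(alt.logprob):.3f})" for alt in alts]
    alts_fmt += [''] * (5 - len(alts_fmt))          # pad to 5

    records.append([idx, chosen, *alts_fmt])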

df = pd.DataFrame(records,
                  columns=["Idx", "Chosen (p)", "Alt-1", "Alt-2",
                           "Alt-3", "Alt-4", "Alt-5"])

print(df.to_string(index=False))
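
Per-token log-probabilities can also be summarized for the whole response. The sketch below assumes you have already collected the chosen tokens' log-probabilities into a plain list of floats; `sequence_stats` is a hypothetical helper and the example values are made up rather than taken from a real completion.

```python
import math
from typing import Dict, List


def sequence_stats(token_logprobs: List[float]) -> Dict[str, float]:
    """Summarize a list of natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    total = sum(token_logprobs)
    mean = total / len(token_logprobs)
    return {
        # Probability of the whole sequence: product of token probabilities
        "sequence_probability": math.exp(total),
        # Geometric mean of the token probabilities
        "mean_token_probability": math.exp(mean),
        # Perplexity: exp of the negative mean log-probability
        "perplexity": math.exp(-mean),
    }


example_logprobs = [math.log(0.9), math.log(0.5), math.log(0.8)]  # made-up values
stats = sequence_stats(example_logprobs)
print(stats)
```

A perplexity close to 1 means the model assigned high probability to nearly every token it produced; larger values indicate more uncertainty along the way.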



In [None]:
# prompt: Write code that can render the response from LLM, colour code each token/word based on probability. Higher to lower colours: [ #FFFFFF, #dbdbdb, #adadac, #696868, #333331, #000000]   Prompt, role=user "Write 3 paragraphs about Paris"

import html

from IPython.display import display, HTML

def render_colored_text(completion):
    """
    Renders the text response from an LLM, coloring each token based on its probability.

    Args:
        completion: The completion object from the LLM client with logprobs enabled.
    """
    logprobs = completion.choices[0].logprobs
    # Chat completions expose per-token entries (token, logprob) under `logprobs.content`
    entries = logprobs.content if logprobs else None
    if not entries:
        print("Log probabilities not available in the completion.")
        print(completion.choices[0].message.content)
        return

    # Colors ordered from lowest to highest probability (white is most confident).
    # Each log_prob is min-max normalized to a 0-1 range within this response.
    colors = ["#000000", "#333331", "#696868", "#adadac", "#dbdbdb", "#FFFFFF"]
    token_logprobs = [entry.logprob for entry in entries]
    min_log_prob = min(token_logprobs)
    max_log_prob = max(token_logprobs)
    log_prob_range = max_log_prob - min_log_prob

    html_output = ""
    for entry in entries:
        if log_prob_range > 0:
            normalized_prob = (entry.logprob - min_log_prob) / log_prob_range
        else:
            # All tokens share the same log_prob, so use the middle of the scale
            normalized_prob = 0.5

        color_index = int(normalized_prob * (len(colors) - 1))
        color = colors[color_index]

        # Escape HTML special characters in the token
        escaped_token = html.escape(entry.token)

        html_output += f'<span style="color: {color};">{escaped_token}</span>'

    display(HTML(html_output))


print("Rendering response with color coding based on token probability:")
completion_paris = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    messages=[
        {
            "role": "user",
            "content": "Write 3 paragraphs about Paris"
        }
    ],
    logprobs=True,
    top_logprobs=5,
)

render_colored_text(completion_paris)
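
The renderer above normalizes log-probabilities relative to the minimum and maximum within a single response, so the same color can mean different probabilities in different responses. An alternative sketch, assuming you would rather bucket by absolute probability, is shown below. `color_for_logprob` is a hypothetical helper; it reuses the notebook's six-color palette and splits the 0 to 1 probability range into equal-width buckets.

```python
import math
from typing import List

# Same palette as above, ordered from lowest to highest probability.
COLORS: List[str] = ["#000000", "#333331", "#696868", "#adadac", "#dbdbdb", "#FFFFFF"]


def color_for_logprob(log_prob: float, colors: List[str] = COLORS) -> str:
    """Map a natural-log probability to a color using equal-width probability buckets."""
    # Clamp to [0, 1] in case of small numerical overshoot.
    prob = min(max(math.exp(log_prob), 0.0), 1.0)
    # A probability of exactly 1.0 would index past the end, so cap the index.
    index = min(int(prob * len(colors)), len(colors) - 1)
    return colors[index]


print(color_for_logprob(0.0))             # probability 1.0
print(color_for_logprob(math.log(0.01)))  # probability 0.01
```

With absolute buckets, a response in which every token is highly probable renders almost entirely in the lightest colors, whereas min-max normalization always spreads tokens across the full palette.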

