# LLM Evaluation and Tracing with W&B
## Using Tables for Evaluation
In this section, we'll use an LLM to generate names for game assets. We'll use W&B Tables to evaluate the generations.

Note: the DLAI tutorial uses OpenAI's API, which we set up below. Afterwards, we should try switching to our local Ollama API.

In [1]:
import os
import random
import time
import datetime

import openai

from tenacity import (
    retry,
    stop_after_attempt,
    wait_random_exponential,
)
import wandb
from wandb.sdk.data_types.trace_tree import Trace

In [5]:
import sys

sys.path.append("../..")

from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

openai.api_key = os.environ['OPENAI_API_KEY']

In [2]:
PROJECT = "dlai_llm"
MODEL_NAME = "gpt-3.5-turbo"

In [3]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mthatgardnerone[0m. Use [1m`wandb login --relogin`[0m to force relogin


True

In [4]:
run = wandb.init(project=PROJECT, job_type="generation")

### Simple Generations
Let's start by generating names for our game assets using the LLM, saving the resulting generations in W&B Tables.

In [12]:
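from openai import OpenAI

# Use the synchronous client: AsyncOpenAI's methods return coroutines that must be
# awaited, so calling them from this plain function would return an un-awaited coroutine.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

@retry(wait=wait_random_exponential(min=1, max=60), stop=stop_after_attempt(6))
def completion_with_backoff(**kwargs):
    """Call the Chat Completions API, retrying with random exponential backoff."""
    return client.chat.completions.create(**kwargs)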
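The `@retry` decorator re-calls the function whenever it raises. Between attempts it sleeps a randomized, exponentially growing interval (capped at `max=60` seconds), and it gives up after six attempts. The pure-Python sketch below approximates that behaviour to make it concrete. It is not tenacity's exact implementation: tenacity raises a `RetryError` wrapping the last failure unless `reraise=True`, while this sketch re-raises directly. `retry_with_backoff`, `flaky` and the injectable `sleep` argument are hypothetical.

```python
import random
import time


def retry_with_backoff(fn, max_attempts=6, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fn() until it succeeds, sleeping a random, exponentially growing delay between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt (1, 2, 4, ...) and is capped at `cap`;
            # the actual delay is drawn uniformly below it ("full jitter").
            upper = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, upper))


# Usage: a hypothetical call that fails twice (e.g. rate-limited) before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays = []
result = retry_with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, calls["n"], len(delays))  # ok 3 2
```

Randomizing the delay keeps many clients that hit a rate limit at the same moment from retrying in lockstep.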

In [17]:
def generate_and_print(system_prompt, user_prompt, table, n=5):
    """Generate n completions for one prompt pair, print them, and add a row to `table`."""
    messages = [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt}
    ]

    # Time the API call (including any retries) for the elapsed_time column.
    start_time = time.time()
    responses = completion_with_backoff(
        model=MODEL_NAME,
        messages=messages,
        n=n
    )
    elapsed_time = time.time() - start_time

    # With n > 1 the API returns one choice per generation.
    generations = [choice.message.content for choice in responses.choices]
    for generation in generations:
        print(generation)

    # One row per call, in the same order as the table's `columns`.
    table.add_data(
        system_prompt,
        user_prompt,
        generations,
        elapsed_time,
        datetime.datetime.fromtimestamp(responses.created),  # Unix seconds -> datetime
        responses.model,
        responses.usage.prompt_tokens,
        responses.usage.completion_tokens,
        responses.usage.total_tokens,
    )

In [14]:
system_prompt = """You are a creative copywriter.
You're given a category of game asset, \
and your goal is to design a name of that asset.
The game is set in a fantasy world, where everyone \
laughs and respects each other, whilst celebrating \
diversity and inclusion."""

In [15]:
columns = [
    "system_prompt",
    "user_prompt",
    "generations",
    "elapsed_time",
    "timestamp",
    "model",
    "prompt_tokens",
    "completion_tokens",
    "total_tokens"
]
table = wandb.Table(columns=columns)

In [18]:
user_prompt = 'hero'
generate_and_print(system_prompt, user_prompt, table)

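<coroutine object AsyncCompletions.create at 0x128b698c0>

AttributeError: 'coroutine' object has no attribute 'choices'

This error comes from the `AsyncOpenAI` client. Its `chat.completions.create` returns a coroutine that must be awaited, so `completion_with_backoff` hands back the coroutine itself rather than a response, and accessing `responses.choices` fails. Using the synchronous `OpenAI` client (or awaiting the call inside an `async` function) avoids it.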
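The failure is not specific to OpenAI: calling any `async def` function without `await` gives back a coroutine object instead of its result. The standard-library sketch below reproduces the failure with a hypothetical stand-in for `AsyncCompletions.create` (`fake_create`). It then shows the async fix: await the call inside an async function, driven by `asyncio.run` in a script or by a top-level `await` in Jupyter, where an event loop is already running.

```python
import asyncio
from types import SimpleNamespace


async def fake_create(**kwargs):
    """Hypothetical stand-in for AsyncCompletions.create: returns an object with .choices."""
    await asyncio.sleep(0)  # pretend to wait on the network
    return SimpleNamespace(choices=[SimpleNamespace(text=f"reply to {kwargs['prompt']}")])


def broken_call():
    # Missing `await`: this returns a coroutine object, not the response.
    return fake_create(prompt="hero")


async def fixed_call():
    # Awaiting the coroutine yields the actual response object.
    return await fake_create(prompt="hero")


coro = broken_call()
print(type(coro).__name__, hasattr(coro, "choices"))  # coroutine False
coro.close()  # discard the never-awaited coroutine to avoid a RuntimeWarning

response = asyncio.run(fixed_call())  # in Jupyter, use `await fixed_call()` instead
print(response.choices[0].text)  # reply to hero
```

If you prefer to keep `AsyncOpenAI`, one option is to make `completion_with_backoff` an `async def` that awaits `client.chat.completions.create(**kwargs)`. Tenacity's `@retry` can decorate coroutine functions. `generate_and_print` would then also need to be async and await it.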