<img src="http://wandb.me/logo-im-png" width="400" alt="Weights & Biases" />
<!--- @wandbcode{intro-colab} -->

# 🏃‍♀️ Quickstart

Get started using Weave to:
- Log and debug language model inputs, outputs, and traces
- Build rigorous, apples-to-apples evaluations for language model use cases
- Organize all the information generated across the LLM workflow, from experimentation to evaluations to production

See the full Weave documentation [here](https://wandb.me/weave).


## 🪄 Install the `weave` library and log in


Start by installing the library and logging in to your account.

In this example, we're using OpenAI, so you should [add an OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-setup-your-api-key).


In [5]:
%%capture
!pip install weave openai set-env-colab-kaggle-dotenv

In [6]:
# Import Weave; you'll be asked to log in to your W&B account when `weave.init` first runs
import weave

In [None]:
# Set your OpenAI API key

import os
import openai
from set_env import set_env
from google.colab import userdata

# Put your OPENAI_API_KEY in the secrets panel to the left 🗝️
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
set_env("OPENAI_API_KEY")
# os.environ["OPENAI_API_KEY"] = "sk-..." # alternatively, put your key here
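
If the key is missing, the first OpenAI call fails with an authentication error that can be hard to connect back to setup. A minimal sketch of a fail-fast check, using only the standard library, is shown here; `require_env` is a hypothetical helper name, not part of `weave`, `openai`, or `set_env`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for this notebook; not part of any library used here.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to the Colab secrets panel "
            f"or assign os.environ[{name!r}] before continuing."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after the cell above would surface a missing key before any model call is made.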

## Track inputs & outputs of functions

- Import weave
- Call `weave.init('project-name')` to start logging

Here, all calls to `openai` are tracked automatically. Weave tracks many LLM libraries out of the box, and it's easy to add support for whatever LLM you're using, as you'll see below.

In [3]:
import weave
from openai import OpenAI

PROJECT = "weave-intro-notebook"

weave.init(PROJECT)

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {
            "role": "system",
            "content": "You are a grammar checker, correct the following user input."
        },
        {
            "role": "user",
            "content": "That was so easy, it was a piece of pie!"
        }
    ],
    temperature=0,
)
generation = response.choices[0].message.content
print(generation)

Logged in as Weights & Biases user: _scott.
View Weave data at https://wandb.ai/_scott/weave-intro-notebook/weave
That was so easy, it was a piece of cake!


You can find your interactive dashboard by clicking any of the 👆 wandb links above.

## Track custom functions

Add the `@weave.op` decorator to the functions you want to track.

In [7]:
import weave
from openai import OpenAI

weave.init(PROJECT)

@weave.op
def strip_user_input(user_input):
    return user_input.strip()

result = strip_user_input("    hello    ")
print(result)

🍩 https://wandb.ai/_scott/weave-intro-notebook/r/call/bcf89485-bf48-423d-b887-157c1f3a3c23
hello


After adding `weave.op` and calling the function, visit the link to see the call tracked within your project.

💡 Weave automatically tracks your code. Have a look at the code tab!

## Track nested functions

Now that you've seen the basics, let's combine all of the above and track some deeply nested functions alongside LLM calls.



In [9]:
import weave
from openai import OpenAI

weave.init(PROJECT)

@weave.op
def strip_user_input(user_input):
    return user_input.strip()

@weave.op
def correct_grammar(user_input):
    client = OpenAI()

    stripped = strip_user_input(user_input)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {
                "role": "system",
                "content": "You are a grammar checker, correct the following user input."
            },
            {
                "role": "user",
                "content": stripped
            }
        ],
        temperature=0,
    )
    return response.choices[0].message.content

result = correct_grammar("   That was so easy, it was a piece of pie!    ")
print(result)

🍩 https://wandb.ai/_scott/weave-intro-notebook/r/call/ee0bd52c-9549-4011-83e9-4876f8189087
That was so easy, it was a piece of cake!


## Track Errors

Whenever your code crashes, Weave highlights what caused the issue. This is especially useful for catching problems such as JSON parsing errors, which can occasionally occur when parsing data from LLM responses.

In [10]:
import weave
from openai import OpenAI
import json

weave.init(PROJECT)

@weave.op()
def strip_user_input(user_input):
    return user_input.strip()

@weave.op()
def correct_grammar(user_input):
    client = OpenAI()

    stripped = strip_user_input(user_input)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=[
            {
                "role": "system",
                "content": "You are a grammar checker, correct the following user input."
            },
            {
                "role": "user",
                "content": stripped
            }
        ],
        temperature=0,
        response_format={ "type": "json_object" }
    )
    return json.loads(response.choices[0].message.content)

result = correct_grammar("   That was so easy, it was a piece of pie!    ")
print(result)

🍩 https://wandb.ai/_scott/weave-intro-notebook/r/call/dace0d0f-1b06-4ff1-a669-3fba37ecb681


BadRequestError: Error code: 400 - {'error': {'message': "'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': 'messages', 'code': None}}
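
The request above fails before any parsing happens: as the error message says, the messages must mention the word "json" when `response_format` is `json_object`. Once the request succeeds, parsing can still fail, so it can help to wrap `json.loads` with a more descriptive error. The following is a minimal sketch using only the standard library; `parse_json_response` and the `"corrected"` key are hypothetical, chosen for illustration.

```python
import json


def parse_json_response(text: str) -> dict:
    """Parse a model response as a JSON object, raising a descriptive error on failure."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        preview = text[:80]  # keep the error short even for long responses
        raise ValueError(f"Response is not valid JSON ({err.msg}): {preview!r}") from err
    if not isinstance(parsed, dict):
        raise ValueError(f"Expected a JSON object, got {type(parsed).__name__}")
    return parsed


# Hypothetical response shape, for illustration only.
print(parse_json_response('{"corrected": "That was so easy, it was a piece of cake!"}'))
```

If a helper like this is called inside a function decorated with `weave.op`, any `ValueError` it raises should appear on the failing call in the trace, alongside the inputs that triggered it.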

## Tracking Objects

Organizing experimentation is difficult when there are many moving pieces. You can capture the experimental details of your app, such as your system prompt or the model you're using, in subclasses of `weave.Object`. This helps you organize and compare different iterations of your app.

In [13]:
import weave
from openai import OpenAI

weave.init(PROJECT)

class GrammarCorrector(weave.Object):
    model: str
    system_message: str

    @weave.op
    def correct(self, user_input):
        client = OpenAI()
        response = client.chat.completions.create(
            model=self.model,
            messages=[
                {
                  "role": "system",
                  "content": self.system_message
                },
                {
                  "role": "user",
                  "content": user_input
                }
                ],
            temperature=0,
        )
        return response.choices[0].message.content


corrector = GrammarCorrector(
    model="gpt-3.5-turbo-1106",
    system_message="You are a grammar checker, correct the following user input.",
)
result = corrector.correct("That was so easy, it was a piece of pie!")
print(result)

🍩 https://wandb.ai/_scott/weave-intro-notebook/r/call/890edc65-bf9d-4e6f-a213-dc1fa78a977c
That was so easy, it was a piece of cake!


Notice that we saved a versioned `GrammarCorrector` object that captures the configurations you're experimenting with.
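
One way to think about versioning by configuration is content hashing: identical settings map to the same identifier, and any change produces a new one. The sketch below illustrates that idea with the standard library only. It is an assumption made for illustration, not a description of how Weave computes its digests, and `config_digest` is a hypothetical helper.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Hash a configuration dict so identical settings map to the same identifier."""
    # Sorting keys makes the digest independent of the order the fields were set in.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


base = {
    "model": "gpt-3.5-turbo-1106",
    "system_message": "You are a grammar checker, correct the following user input.",
}
reordered = {"system_message": base["system_message"], "model": base["model"]}
changed = {**base, "system_message": "You are a grammar checker, please correct the following user input."}

print(config_digest(base) == config_digest(reordered))  # same settings, same identifier
print(config_digest(base) == config_digest(changed))    # edited prompt, new identifier
```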

## Explicitly publish & retrieve objects

You can publish objects and then retrieve them in your code. You can even call functions from your retrieved objects!

In [21]:
import weave

weave.init(PROJECT)

corrector = GrammarCorrector(
    model="gpt-3.5-turbo-1106",
    system_message="You are a grammar checker, please correct the following user input.",
)
ref = weave.publish(corrector)
ref

📦 Published to https://wandb.ai/_scott/weave-intro-notebook/weave/objects/GrammarCorrector/versions/XMabXd0CY2encG2ifU2lBa8TbG6zOSfwvTxFLwgXYMU


ObjectRef(entity='_scott', project='weave-intro-notebook', name='GrammarCorrector', digest='XMabXd0CY2encG2ifU2lBa8TbG6zOSfwvTxFLwgXYMU', extra=[])

In [22]:
import weave

weave.init(PROJECT)

fetched_corrector = weave.ref(f"weave:///{ref.entity}/{ref.project}/object/{ref.name}:{ref.digest}").get()

# Notice: this object was loaded from a remote location!
result = fetched_corrector.correct("That was so easy, it was a piece of pie!")

print(result)

🍩 https://wandb.ai/_scott/weave-intro-notebook/r/call/909efbbe-9554-4e21-bf5d-d6509114c90d
That was so easy, it was a piece of cake!
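
The reference string passed to `weave.ref` in the cell above follows the pattern `weave:///<entity>/<project>/object/<name>:<digest>`. As a small sketch, the pieces can be assembled with a helper; `build_object_uri` is a hypothetical name, and the values below are copied from the `ObjectRef` output shown earlier.

```python
def build_object_uri(entity: str, project: str, name: str, digest: str) -> str:
    """Assemble an object reference string in the format used in this notebook."""
    return f"weave:///{entity}/{project}/object/{name}:{digest}"


uri = build_object_uri(
    entity="_scott",
    project="weave-intro-notebook",
    name="GrammarCorrector",
    digest="XMabXd0CY2encG2ifU2lBa8TbG6zOSfwvTxFLwgXYMU",
)
print(uri)
```

Storing the resulting string (for example in a config file) lets another notebook or script fetch the same version with `weave.ref(uri).get()` after calling `weave.init`.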


# Evaluation

Evaluation-driven development helps you reliably iterate on an application. The `Evaluation` class is designed to assess the performance of a `Model` on a given `Dataset` or set of examples using scoring functions.

See a preview of the API below:

In [24]:
import weave
from weave import Evaluation
import asyncio

# Collect your examples
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"question": "What is the square root of 64?", "expected": "8"},
]

# Define any custom scoring function
@weave.op
def match_score1(expected: str, model_output: dict) -> dict:
    # Here is where you'd define the logic to score the model output
    return {'match': expected == model_output['generated_text']}

@weave.op
def function_to_evaluate(question: str):
    # Here's where you would add your LLM call and return the output
    return {'generated_text': 'Paris'}

# Score your examples using scoring functions
evaluation = Evaluation(
    dataset=examples, scorers=[match_score1]
)

# Start tracking the evaluation
weave.init('intro-example')
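# Run the evaluation. Top-level `await` works in notebooks; in a script,
# use asyncio.run(evaluation.evaluate(function_to_evaluate)) instead.
await evaluation.evaluate(function_to_evaluate)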

🍩 https://wandb.ai/_scott/intro-example/r/call/30c97763-40d6-44df-add0-39b43929b6a4


{'match_score1': {'match': {'true_count': 1,
   'true_fraction': 0.3333333333333333}},
 'model_latency': {'mean': 0.005586067835489909}}
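
To see where `true_count` and `true_fraction` come from, the following offline sketch recomputes them by hand for the three examples above. It mirrors the shape of the summary but is not Weave's implementation; `summarize` is a hypothetical helper, and the scorer and stub model are plain functions without `weave.op`.

```python
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"question": "What is the square root of 64?", "expected": "8"},
]


def match_score(expected: str, model_output: dict) -> dict:
    return {"match": expected == model_output["generated_text"]}


def function_to_evaluate(question: str) -> dict:
    # Same stub as above: always answers "Paris".
    return {"generated_text": "Paris"}


def summarize(rows: list) -> dict:
    """Aggregate per-example boolean scores into a count and a fraction."""
    matches = [row["match"] for row in rows]
    true_count = sum(matches)
    return {"match": {"true_count": true_count, "true_fraction": true_count / len(matches)}}


rows = [match_score(ex["expected"], function_to_evaluate(ex["question"])) for ex in examples]
summary = summarize(rows)
print(summary)
```

Only the first example expects "Paris", so one of three scores is true, which matches the `true_count` of 1 and `true_fraction` of about 0.33 reported above.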

## What's next?

Follow the [Build an Evaluation pipeline](http://wandb.me/weave_eval_tut) tutorial to learn more about Evaluation and begin iteratively improving your applications.