<!-- docusaurus_head_meta::start
---
title: Scorers as Guardrails
---
docusaurus_head_meta::end -->

<!--- @wandbcode{prompt-optim-notebook} -->

# Scorers as Guardrails

Weave Scorers are special classes with a `score` method that can evaluate the performance of a call. They can range from quite simple rules to complex LLMs as judges. 

In this notebook, we will explore how to use Scorers as guardrails to prevent your LLM from generating harmful or inappropriate content.


In [None]:
%pip install weave --quiet

In [1]:
import weave

weave.init("scorers-as-guardrails")


# First, we define a very simple scorer that checks if the model output contains any specific words.
class WordMatcher(weave.Scorer):
    words: list[str]
    case_sensitive: bool = False

    @weave.op
    def score(self, output: str) -> float:
        print(output)
        for word in self.words:
            if self.case_sensitive:
                if word in output:
                    return True
            else:
                if word.lower() in output.lower():
                    return True
        return False


# Next we define a function that makes a prediction
@weave.op
def make_prediction(input: str) -> str:
    """Dummy function that makes a prediction"""
    if "test" in input:
        return "I'm sorry, I can't do that."
    else:
        return "Certainly!"


# Next we call the op using the `call` method in order to return
# a Call object.
prediction, weave_call = make_prediction.call("Please make a prediction")
print(f"The prediction for 'Please make a prediction' is: {prediction}")

# Next, let's construct a scorer that checks if the prediction contains the word "sorry"
# We will name the scorer "Apology Checker" which will show up as the name of the score
# associated with the call.
scorer = WordMatcher(name="Apology Checker", words=["sorry"])

# Now we can apply the scorer to the prediction
score_results = await weave_call.apply_scorer(scorer)
print(f"The results of the score are: {score_results}")

# In a real-world scenario, we would use the score results to determine if the prediction is safe
# and possibly modify the control flow of the program based on the score.
for example_input in [
    "Please make a prediction",
    "Please make a prediction with a test",
]:
    prediction, weave_call = make_prediction.call(example_input)
    score_results = await weave_call.apply_scorer(scorer)
    if score_results.result:
        print(f"The prediction for '{example_input}' ({prediction}) is NOT safe")
    else:
        print(f"The prediction for '{example_input}' ({prediction}) is safe")

Logged in as Weights & Biases user: timssweeney.
View Weave data at https://wandb.ai/timssweeney/scorers-as-guardrails/weave
🍩 https://wandb.ai/timssweeney/scorers-as-guardrails/r/call/019441d9-b322-7fe1-befc-03f10623326c
The prediction for 'Please make a prediction' is: Certainly!
Call(_op_name=<Future at 0x332d6bd30 state=running>, trace_id='019441d9-b321-7d62-bbad-faab27aca2b5', project_id='timssweeney/scorers-as-guardrails', parent_id=None, inputs={'input': 'Please make a prediction'}, id='019441d9-b322-7fe1-befc-03f10623326c', output='Certainly!', exception=None, summary={}, _display_name=None, attributes=AttributesDict({'weave': {'client_version': '0.51.28-dev0', 'source': 'python-sdk', 'os_name': 'Darwin', 'os_version': 'Darwin Kernel Version 23.6.0: Fri Nov 15 15:13:15 PST 2024; root:xnu-10063.141.1.702.7~1/RELEASE_ARM64_T6000', 'os_release': '23.6.0', 'sys_version': '3.10.8 (main, Dec  5 2022, 18:10:41) [Clang 14.0.0 (clang-1400.0.29.202)]'}}), started_at=None, ended_at=dateti