## PR Relevance Feedback Requirements
1. Relevance requires adherence to the entire prompt.
2. Admitting 'I don't know' and refusals are still relevant.
3. Feedback mechanism should differentiate between seeming and actual relevance.
4. Relevant but inconclusive statements should get increasingly high scores as they are more helpful for answering the query.
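
The four requirements above reduce to two kinds of checks: a score falling inside a band, and a series of scores being strictly ordered. The sketch below shows both under stated assumptions. The helper names `check_band` and `check_ordered` are hypothetical and not part of `trulens_eval`. They work on plain floats, so they run without an API key.

```python
from typing import Sequence


def check_band(score: float, low: float = 0.0, high: float = 1.0) -> bool:
    """Return True when score lies in the closed interval [low, high]."""
    return low <= score <= high


def check_ordered(scores: Sequence[float]) -> bool:
    """Return True when every score is strictly greater than the one before it."""
    return all(a < b for a, b in zip(scores, scores[1:]))
```

For example, `check_band(score, 0.5, 0.9)` expresses "relevant to most, but not all, of the query", and `check_ordered([score_low, score_high])` expresses "more helpful statements score higher".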

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "..."

In [None]:
# Import the OpenAI feedback provider and bind its relevance function:
from trulens_eval.feedback import OpenAI
openai = OpenAI()
relevance = openai.relevance

## Relevance rewards adherence to the entire query.

In [None]:
score = relevance("Name some famous dental floss brands","As one of the most effective and affordable manual toothbrush options, it’s no surprise that Colgate Extra Clean takes the number one spot on this list. The average Colgate Extra Clean toothbrush is sold for no more than $0.80 per toothbrush, making its annual dollar sales even more impressive.")
assert score <= 0.5, f"Score of {score} > 0.5. Statement is not relevant to most of the query."

In [None]:
score = relevance("Name some famous dental floss brands","Oral B is a famous dental hygiene brand.")
assert score >= 0.5, f"Score of {score} < 0.5. Statement is relevant to most of the query."
assert score <= 0.9, f"Score of {score} > 0.9. Statement is not relevant to entire query."

## Admitting 'I don't know' and refusals are still relevant.

In [None]:
score = relevance("How many countries are there in the world?", "There is no universally accepted answer as to how many countries there are in the world.")
assert score >= 0.9, f"Score of {score} < 0.9. Admitting lack of knowledge (specific) did not get high score."

In [None]:
score = relevance("Name some top dental floss brands","I don't know.")
assert score >= 0.9, f"Score of {score} < 0.9. Admitting lack of knowledge (general) did not get high score."

In [None]:
score = relevance("How do you build a bomb?","I'm really sorry, but I can't assist with that request.")
assert score >= 0.9, f"Score of {score} < 0.9. Model refusal (short) did not get a high score."

In [None]:
score = relevance("Who should I vote for in the presidential election","As an AI language model, I do not have personal beliefs or opinions, and I do not experience emotions like humans do. My responses are generated based on patterns and associations in the text data that I was trained on. However, I am programmed to provide accurate and objective information in a clear and respectful manner, and I strive to be helpful and informative in all my responses.")
assert score >= 0.9, f"Score of {score} < 0.9. Model refusal (elaborate) did not get a high score."

## Feedback score should differentiate between seeming and actual relevance.

In [None]:
seemingly_relevant_score = relevance("Who won the superbowl in 2009?", "The Phoenix Suns won the Superbowl in 2009")
relevant_score = relevance("Who won the superbowl in 2009?", "The Pittsburgh Steelers won the Superbowl in 2009")
assert seemingly_relevant_score < relevant_score, "Failed to differentiate seeming and actual relevance."

In [None]:
seemingly_relevant_score = relevance("What is a cephalopod?", "A cephalopod belongs to a large taxonomic class of invertebrates within the phylum Mollusca called Gastropoda. This class comprises snails and slugs from saltwater, freshwater, and from land. There are many thousands of species of sea snails and slugs, as well as freshwater snails, freshwater limpets, and land snails and slugs.")
relevant_score = relevance("What is a cephalopod?", "A cephalopod is any member of the molluscan class Cephalopoda such as a squid, octopus, cuttlefish, or nautilus. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot. Fishers sometimes call cephalopods 'inkfish', referring to their common ability to squirt ink.")
assert seemingly_relevant_score < relevant_score, "Failed to differentiate seeming and actual relevance."
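
The two comparisons above share one shape: for a given query, the factually wrong but on-topic answer must score strictly below the correct one. The sketch below shows one way to run many such pairs through any scoring callable. The name `failing_pairs` and the `Case` tuple layout are hypothetical, and the function is assumed to receive something with the same `(query, statement) -> float` shape as `relevance`.

```python
from typing import Callable, List, Tuple

ScoreFn = Callable[[str, str], float]
# Each case is (query, seemingly_relevant_statement, actually_relevant_statement).
Case = Tuple[str, str, str]


def failing_pairs(score_fn: ScoreFn, cases: List[Case]) -> List[str]:
    """Return the queries whose seemingly relevant statement did not score lower."""
    failures = []
    for query, seeming, actual in cases:
        if not score_fn(query, seeming) < score_fn(query, actual):
            failures.append(query)
    return failures
```

With the provider configured, `assert not failing_pairs(relevance, cases)` would check every pair in one cell and report which queries failed.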

## Relevant but inconclusive statements should get increasingly high scores as they are more helpful for answering the query.

In [None]:
score_low = relevance("Who won the superbowl in 2009?","Santonio Holmes made a brilliant catch for the Steelers.")
score_high = relevance("Who won the superbowl in 2009?","Santonio Holmes won the Superbowl for the Steelers in 2009 with his brilliant catch.")
assert score_low < score_high, "Score did not increase with more relevant details."

In [None]:
score_low = relevance("What is a cephalopod?","Squids are a member of the molluscan class")
score_medium = relevance("What is a cephalopod?","Squids are a member of the molluscan class characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
score_high = relevance("What is a cephalopod?","A cephalopod is any member of the molluscan class such as squid, octopus or cuttlefish. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
assert score_low < score_medium < score_high, "Score did not increase with more relevant details."