## QS Relevance Feedback Requirements
1. Relevance requires adherence to the entire query.
2. Context that provides no answer can still be relevant.
3. Feedback mechanism should differentiate between seeming and actual relevance.
4. Relevant but inconclusive statements should get increasingly high scores as they are more helpful for answering the query.

In [None]:
import os
os.environ["OPENAI_API_KEY"] = "..."

In [None]:
# Imports main tools:
from trulens_eval.feedback import OpenAI
openai = OpenAI()
relevance = openai.qs_relevance

## Relevance requires adherence to the entire query.

In [None]:
score = relevance("Name some famous dental floss brands?", "Oral-B is an American brand of oral hygiene products, including toothpastes, toothbrushes, electric toothbrushes, and mouthwashes. The brand has been in business since the invention of the Hutson toothbrush in 1950 and in Redwood City, California.")
assert score >= 0.2, f"Score of {score} < 0.2. Statement is relevant to at least some of query."

In [None]:
score = relevance("Name some famous dental floss brands?","Some key companies operating in the dental floss market include Procter & Gamble; Colgate-Palmolive Company; Johnson & Johnson Services, Inc.; Prestige Consumer Healthcare, Inc.; Dr. Fresh, LLC; Lion Corporation; Church & Dwight Co., Inc.; Shantou Oral Health Co. Ltd; Water Pik, Inc.; and The Humble Co.")
assert score >= 0.5, f"Score of {score} < 0.5. Statement is relevant to most of query."
assert score <= 0.8, f"Score of {score} > 0.8. Statement is not relevant to all of query."

In [None]:
score = relevance("How does the social structure of a lion pride impact the genetic diversity and long-term survival of the species?","A typical pride of lions consists of about six related females, their dependent offspring, and a “coalition” of 2–3 resident males that joined the pride from elsewhere. The pride is a “fission-fusion” society and pridemates are seldom found together, except for mothers that have pooled their offspring into a “crèche.”")
assert score >= 0.5, f"Score of {score} < 0.5. Statement is relevant to most of query."
assert score <= 0.8, f"Score of {score} > 0.8. Statement is not relevant to all of query."
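
The paired lower-bound and upper-bound assertions in the cells above could be folded into a single helper. The following is a minimal sketch, not part of trulens_eval: `assert_score_between` is a hypothetical name, and it takes a plain float so it can be tried without calling the OpenAI-backed `relevance` function.

```python
def assert_score_between(score: float, low: float, high: float) -> None:
    """Hypothetical helper: fail with a descriptive message if score is outside [low, high]."""
    assert low <= score <= high, (
        f"Score of {score} is outside the expected range [{low}, {high}]."
    )


# Example with a hard-coded score standing in for a real relevance() result.
assert_score_between(0.6, 0.5, 0.8)
```

In the notebook, the hard-coded value would be replaced by the `score` returned from `relevance(...)`.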

## Context that provides no answer can still be relevant.

In [None]:
score = relevance("How many countries are there in the world?", "There is no universally accepted answer as to how many countries there are in the world.")
assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without definitive answer did not get a score of >= 0.5"

In [None]:
score = relevance("What is the meaning of life?", "No one can tell the actual definition of the meaning of life. For some, it is all about happiness, building a family, and leading life as it is. For some, it is about accumulating wealth, whereas, for some, it is all about love.")
assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without definitive answer did not get a score of >= 0.5"

In [None]:
score = relevance("What came first, the chicken or the egg?", "Some scientists say that eggs are much older than chickens, and that eggs were laid by dinosaurs. Others say that the formulation of egg shells relies on a protein found only in a chicken's ovaries.")
assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without definitive answer did not get a score of >= 0.5"

## Feedback mechanism should differentiate between seeming and actual relevance.

In [None]:
seemingly_relevant_score = relevance("Who won the superbowl in 2009?", "The Phoenix Suns won the Superbowl in 2009")
relevant_score = relevance("Who won the superbowl in 2009?", "The Pittsburgh Steelers won the Superbowl in 2009")
assert seemingly_relevant_score < relevant_score, f"Failed to differentiate seeming ({seemingly_relevant_score}) and actual ({relevant_score}) relevance."

In [None]:
seemingly_relevant_score = relevance("What is a cephalopod?", "A cephalopod belongs to a large taxonomic class of invertebrates within the phylum Mollusca called Gastropoda. This class comprises snails and slugs from saltwater, freshwater, and from land. There are many thousands of species of sea snails and slugs, as well as freshwater snails, freshwater limpets, and land snails and slugs.")
relevant_score = relevance("What is a cephalopod?", "A cephalopod is any member of the molluscan class Cephalopoda such as a squid, octopus, cuttlefish, or nautilus. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot. Fishers sometimes call cephalopods 'inkfish', referring to their common ability to squirt ink.")
assert seemingly_relevant_score < relevant_score, f"Failed to differentiate seeming ({seemingly_relevant_score}) and actual ({relevant_score}) relevance."
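
The two comparisons above share one pattern: score a misleading statement and an accurate statement against the same query, then compare the results. The sketch below illustrates that pattern with the scorer passed in as an argument, so it can be exercised without an API key. Both `check_distinguishes` and `toy_scorer` are hypothetical names introduced for illustration; `toy_scorer` is a trivial stand-in and says nothing about how the real feedback function behaves.

```python
from typing import Callable


def check_distinguishes(
    scorer: Callable[[str, str], float],
    query: str,
    misleading: str,
    accurate: str,
) -> bool:
    """Return True if the scorer rates the accurate statement above the misleading one."""
    return scorer(query, misleading) < scorer(query, accurate)


def toy_scorer(query: str, statement: str) -> float:
    """Stand-in scorer for illustration only: 1.0 if the statement mentions 'Steelers', else 0.0."""
    return 1.0 if "Steelers" in statement else 0.0


result = check_distinguishes(
    toy_scorer,
    "Who won the superbowl in 2009?",
    "The Phoenix Suns won the Superbowl in 2009",
    "The Pittsburgh Steelers won the Superbowl in 2009",
)
```

In the notebook, `relevance` could be passed in place of `toy_scorer`, since it is called with the same two positional arguments in the cells above.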

## Relevant but inconclusive statements should get increasingly high scores as they are more helpful for answering the query.

In [None]:
score_low = relevance("Who won the superbowl in 2009?","Santonio Holmes made a brilliant catch for the Steelers.")
score_medium = relevance("Who won the superbowl in 2009?","Santonio Holmes made a brilliant catch for the Steelers in the superbowl.")
score_high = relevance("Who won the superbowl in 2009?","Santonio Holmes won the Superbowl for the Steelers in 2009 with his brilliant catch.")
assert score_low < score_medium < score_high, "Score did not increase with more relevant details."

In [None]:
score_low = relevance("What is a cephalopod?","Squids are a member of the molluscan class")
score_medium = relevance("What is a cephalopod?","Squids are a member of the molluscan class characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
score_high = relevance("What is a cephalopod?","A cephalopod is any member of the molluscan class such as squid, octopus or cuttlefish. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
assert score_low < score_medium < score_high, "Score did not increase with more relevant details."
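
The ordering checks above extend naturally to any number of graded statements. A minimal sketch follows; `assert_strictly_increasing` is a hypothetical helper, not a trulens_eval function, and the scores shown are placeholder values rather than real model output.

```python
from typing import Sequence


def assert_strictly_increasing(scores: Sequence[float]) -> None:
    """Hypothetical helper: fail unless each score is strictly greater than the previous one."""
    for earlier, later in zip(scores, scores[1:]):
        assert earlier < later, (
            f"Score did not increase with more relevant details: {earlier} >= {later}."
        )


# Placeholder scores standing in for score_low, score_medium, score_high.
assert_strictly_increasing([0.2, 0.5, 0.9])
```

With real results, the call would be `assert_strictly_increasing([score_low, score_medium, score_high])`.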