## QS Relevance Feedback Requirements
1. Relevance requires adherence to the entire query.
2. Context that provides no answer can still be relevant.
3. The feedback mechanism should differentiate between seeming and actual relevance.
4. Relevant but inconclusive statements should score higher the more helpful they are for answering the query.

In [1]:
import os
os.environ["OPENAI_API_KEY"] = "..."

In [2]:
# Import the OpenAI feedback provider and take its question/statement relevance function:
from trulens_eval.feedback import OpenAI
openai = OpenAI()
relevance = openai.qs_relevance
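To dry-run the test harness without API calls (or an `OPENAI_API_KEY`), one option is a deterministic stand-in with the same call shape. The sketch below assumes, as the tests in this notebook use it, that `qs_relevance` takes the question first and the statement second and returns a score in [0, 1]. `fake_relevance`, `_content_words` and `STOPWORDS` are hypothetical names, and word overlap is only a crude lexical proxy, not a relevance judge.

```python
from __future__ import annotations

import re

# HYPOTHETICAL offline stand-in for openai.qs_relevance; not part of trulens_eval.
# It measures word overlap only, so it cannot judge semantic relevance.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "is", "are", "does", "do",
             "how", "what", "who", "many", "some", "name", "have", "has"}


def _content_words(text: str) -> set[str]:
    """Lowercased alphabetic tokens, minus a small stopword list (digits are dropped)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def fake_relevance(question: str, statement: str) -> float:
    """Fraction of the question's content words that the statement mentions, in [0, 1]."""
    q_words = _content_words(question)
    if not q_words:
        return 0.0
    return len(q_words & _content_words(statement)) / len(q_words)
```

Rebinding `relevance = fake_relevance` before running the test cells exercises the assertion logic offline. Expect many failures: the stub scores "The Phoenix Suns won the Superbowl in 2009" and "The Pittsburgh Steelers won the Superbowl in 2009" identically (1.0), which is exactly the seeming-versus-actual distinction requirement 3 asks the real feedback function to make.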

## Relevance requires adherence to the entire query.

In [3]:
def test_low_adherence_short():
    score = relevance("How many stomachs does a cow have?","Cows' diet relies primarily on grazing.")
    assert score >= 0.2, f"Score of {score} < 0.2. Statement is relevant to at least some of the query."

In [4]:
def test_low_adherence_medium():
    # The question and the context must be two separate arguments.
    score = relevance("Name some famous dental floss brands", """Oral-B is an American brand of oral hygiene products,
        including toothpastes, toothbrushes, electric toothbrushes, and mouthwashes.
        The brand has been in business since the invention of the Hutson toothbrush in 1950 and in Redwood City, California.""")
    assert score >= 0.2, f"Score of {score} < 0.2. Statement is relevant to at least some of the query."

In [None]:
def test_low_adherence_long():
    score = relevance("Name some famous dental floss brands", """Types of floss and alternative options. Dental floss is regarded as the gold standard — it’s been around the longest compared to other plaque-removing products, Hewlett said. 
    Moursi also added that most flossing research studies have been conducted with dental floss, so there’s a lot of data showing its effectiveness. But floss is not one-size-fits-all, 
    he noted. Since using dental floss is difficult for some, there are other effective tools like interdental cleaners. Below, we broke down 
    the differences among several different options. Dental floss When people think of dental floss, it’s usually the threaded variety that comes on a spool. 
    But there’s also dental tape, which Hewlett described as a wider and flatter type of floss. He said it's particularly useful for 
    people with larger spaces between their teeth since it covers more surface area. Both forms of floss come in unflavored or flavored varieties, 
    but choosing a flavored option has no impact on how well it cleans your teeth, Hewlett said. Flosses also come waxed and unwaxed — 
    while a wax coating can make floss pass between teeth more easily, Hewlett said, both waxed and unwaxed are equally effective when used properly. 
    Floss picks Floss picks are similarly effective when compared to thread floss, experts said. The picks look like a wand and have a small piece 
    of floss at the forked end, so you can grip the handle while using the tool. Experts said floss picks are generally easy to use, especially if
    you’re flossing a child’s teeth. Water flossers Water flossers are powered devices that shoot pressurized water at the spaces between teeth, 
    targeting debris to disrupt and flush out plaque. While there is evidence to support their ability to remove plaque from teeth, Moursi
    said for water flossers to do their job, “you have to hold it in just the right place, at just the right angle and for just the right
    amount of time,” which can be challenging. Anyone can use water flossers, but experts said they’re the most beneficial for people who
    have difficulty using thread floss or floss threaders, as well as those with certain dental work like braces, bridges and crowns. 
    Interdental brushes Dental work like braces, bridges and crowns can block floss from slipping between teeth, making flossing challenging.
    Interdental brushes — which look like little spoolie brushes — can pass through the spaces between teeth and under any dental work,
    allowing you to remove plaque. The brushes have bristles on one end and a handle to grip on the other. To use, you point the brush at
    the gum line between teeth and push it through, moving the bristles around the space to remove plaque, said Hewlett. 
    The brushes come in various shapes and sizes to fit the spaces between your teeth.""")
    assert score >= 0.2, f"Score of {score} < 0.2. Statement is relevant to at least some of the query."

In [5]:
def test_majority_adherence_short():
    score = relevance("Name some famous dental floss brands", "Some key companies operating in the dental floss market include Colgate and Water Pik.")
    assert score >= 0.5, f"Score of {score} < 0.5. Statement is relevant to most of the query."
    assert score <= 0.8, f"Score of {score} > 0.8. Statement is not relevant to all of the query."

In [6]:
def test_majority_adherence_medium():
    score = relevance("How does the social structure of a lion pride impact the genetic diversity and long-term survival of the species?","""A typical pride of lions consists of about six related females, their dependent offspring, and a “coalition” 
    of 2–3 resident males that joined the pride from elsewhere. The pride is a “fission-fusion” society and
    pridemates are seldom found together, except for mothers that have pooled their offspring into a “crèche.”""")
    assert score >= 0.5, f"Score of {score} < 0.5. Statement is relevant to most of the query."
    assert score <= 0.8, f"Score of {score} > 0.8. Statement is not relevant to all of the query."
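The low- and majority-adherence tests repeat the same lower/upper bound checks by hand. A small helper could factor that pattern out; `assert_score_in_band` is a hypothetical name, and the sketch only reproduces the comparisons already used above.

```python
def assert_score_in_band(score: float, low: float = 0.0, high: float = 1.0,
                         what: str = "Statement") -> None:
    """Hypothetical helper: fail unless low <= score <= high."""
    # Same two comparisons the adherence tests write out, with the bounds echoed in the message.
    assert score >= low, f"Score of {score} < {low}. {what} should score at least {low}."
    assert score <= high, f"Score of {score} > {high}. {what} should score at most {high}."
```

With it, the paired asserts in a majority-adherence test become `assert_score_in_band(score, 0.5, 0.8)`, and the single lower-bound checks become `assert_score_in_band(score, low=0.2)`.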

In [None]:
def test_majority_adherence_long():
    #TODO
    pass

In [None]:
def test_complete_adherence_short():
    #TODO
    pass

In [None]:
def test_complete_adherence_medium():
    #TODO
    pass

In [None]:
def test_complete_adherence_long():
    #TODO
    pass
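The complete-adherence tests are still TODOs. One possible shape is sketched below, assuming (the notebook does not say so) that a statement answering every part of the query should score above the 0.8 ceiling used for majority adherence. The threshold, the function name and the question/statement pair are all hypothetical; taking the scorer as a parameter lets the same check run against `relevance` or an offline stub.

```python
from typing import Callable

# Scorer shape assumed to match qs_relevance: (question, statement) -> score in [0, 1].
Scorer = Callable[[str, str], float]

# ASSUMPTION: no complete-adherence threshold is defined in this notebook;
# 0.8 is borrowed from the upper bound of the majority-adherence tests.
COMPLETE_ADHERENCE_MIN = 0.8


def check_complete_adherence_short(score_fn: Scorer) -> None:
    """Hypothetical test: a statement that answers the whole query should score above 0.8."""
    question = "How many stomachs does a cow have?"
    statement = ("A cow's stomach has four compartments: "
                 "the rumen, reticulum, omasum and abomasum.")
    score = score_fn(question, statement)
    assert score > COMPLETE_ADHERENCE_MIN, (
        f"Score of {score} <= {COMPLETE_ADHERENCE_MIN}. Statement answers the entire query."
    )
```

In the notebook this would be called as `check_complete_adherence_short(relevance)`.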

## Context that provides no answer can still be relevant.

In [7]:
def test_nonanswer_short():
    score = relevance("How many countries are there in the world?", "There is no universally accepted answer to how many countries there are in the world.")
    assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without a definitive answer should score at least 0.5."

In [8]:
def test_nonanswer_med():
    # The question and the context must be two separate arguments.
    score = relevance("What is the meaning of life?", """No one can tell the actual definition of the meaning of life.
    For some, it is all about happiness, building a family, and leading life as it is. For some, it is about accumulating wealth, whereas,
    for some, it is all about love.""")
    assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without a definitive answer should score at least 0.5."

In [9]:
def test_nonanswer_long():
    score = relevance("What came first, the chicken or the egg?","""Eggs come from chickens and chickens come from eggs: that’s the basis of this ancient riddle. But eggs – which are just female sex cells – evolved more than a billion years ago, whereas chickens have been around for just 10,000 years. So the riddle is easily solved…or is it?

Taken at face value, there is no doubt that the egg came before the chicken. We tend to think of eggs as the shelled orbs laid by birds from which their chicks hatch – unless we eat them first. But all sexually reproducing species make eggs (the specialised female sex cells). That’s 99.99 per cent of all eukaryotic life – meaning organisms that have cells with a nucleus, so all animals and plants, and everything but the simplest life forms.

We don’t know for sure when sex evolved but it could have been as much as 2 billion years ago, and certainly more than 1 billion. Even the specialised sort of eggs laid by birds, with their tough outer membrane, evolved more than 300 million years ago.

As for chickens, they came into being much later. They are domesticated animals, so evolved as the result of humans purposefully selecting the least aggressive wild birds and letting them breed. This seems to have happened in several places independently, starting around 10,000 years ago.

The wild ancestor of chickens is generally agreed to be a tropical bird still living in the forests of Southeast Asia called the red junglefowl –  with other junglefowl species possibly adding to the genetic mix. From these origins, humans have carried chickens around the world over the past two millennia or more.

So, eggs dramatically predate chickens. But to be fair to the spirit of the riddle, we should also consider whether a chicken’s egg predates a chicken. As humans consistently chose the tamest red junglefowls and bred them together, the genetic makeup of the resulting birds will have shifted. At some stage during this domestication process the red junglefowl (Gallus gallus) evolved into a new subspecies, Gallus gallus domesticus, AKA the chicken.

In practice, it is impossible to pinpoint the moment when this happened. But in theory, at some point two junglefowl bred and their offspring was genetically different enough from the species of its parents to be classified as a chicken. This chicken would have developed within a junglefowl egg and only produced the very first chicken’s egg on reaching maturity. Looked at this way, the chicken came first.

"""
)
    assert score >= 0.5, f"Score of {score} < 0.5. Relevant context without a definitive answer should score at least 0.5."
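The non-answer tests differ only in their inputs. A table-driven variant reports every failing question at once instead of stopping at the first failed assert. `NONANSWER_CASES` and `failing_nonanswer_cases` are hypothetical names; the meaning-of-life context is abridged and the long chicken-and-egg case is omitted for space.

```python
from __future__ import annotations

from typing import Callable, Sequence

# Hypothetical table of (question, context) pairs: relevant context, no definitive answer.
NONANSWER_CASES = [
    ("How many countries are there in the world?",
     "There is no universally accepted answer to how many countries there are in the world."),
    ("What is the meaning of life?",
     "No one can tell the actual definition of the meaning of life."),
]


def failing_nonanswer_cases(score_fn: Callable[[str, str], float],
                            cases: Sequence[tuple[str, str]] = NONANSWER_CASES,
                            threshold: float = 0.5) -> list[str]:
    """Return the questions whose relevant-but-inconclusive context scored below threshold."""
    return [question for question, context in cases if score_fn(question, context) < threshold]
```

In the notebook, `assert not failing_nonanswer_cases(relevance)` would run all cases and list every offender in one go; the trade-off is one API call per case even after the first failure.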

## Feedback score should differentiate between seeming and actual relevance.

In [10]:
def test_seeming_relevance_short():
    seemingly_relevant_score = relevance("Who won the superbowl in 2009?", "The Phoenix Suns won the Superbowl in 2009")
    relevant_score = relevance("Who won the superbowl in 2009?", "The Pittsburgh Steelers won the Superbowl in 2009")
    assert seemingly_relevant_score < relevant_score, f"Failed to differentiate seeming and actual relevance: {seemingly_relevant_score} (seeming) >= {relevant_score} (actual)."

In [11]:
def test_seeming_relevance_medium():
    seemingly_relevant_score = relevance("What is a cephalopod?", """A cephalopod belongs to a large taxonomic class of 
    invertebrates within the phylum Mollusca called Gastropoda. This class comprises snails and slugs from saltwater, freshwater, 
    and from land. There are many thousands of species of sea snails and slugs, as well as freshwater snails, freshwater limpets, 
    and land snails and slugs.""")
    relevant_score = relevance("What is a cephalopod?", "A cephalopod is any member of the molluscan class Cephalopoda such as a squid, octopus, cuttlefish, or nautilus. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot. Fishers sometimes call cephalopods 'inkfish,' referring to their common ability to squirt ink.")
    assert seemingly_relevant_score < relevant_score, f"Failed to differentiate seeming and actual relevance: {seemingly_relevant_score} (seeming) >= {relevant_score} (actual)."
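Requirement 3 is tested as a strict ordering between an actually relevant statement and a look-alike. A hypothetical helper, `assert_ranked_above`, could package that comparison and optionally demand a minimum gap via `margin`. The margin is an assumption this notebook does not make; with the default `margin=0.0` the check reduces to the original `<` comparison.

```python
from typing import Callable


def assert_ranked_above(score_fn: Callable[[str, str], float], question: str,
                        relevant: str, seemingly_relevant: str, margin: float = 0.0) -> None:
    """Hypothetical helper: the actually relevant statement must beat the look-alike by > margin."""
    actual = score_fn(question, relevant)
    seeming = score_fn(question, seemingly_relevant)
    # margin=0.0 is equivalent to requiring seeming < actual.
    assert actual - seeming > margin, (
        f"Failed to differentiate seeming and actual relevance: "
        f"{seeming} (seeming) vs {actual} (actual), required margin {margin}."
    )
```

A positive margin would make the test stricter against a scorer that ranks the pair correctly but only by a hair; whether that is desirable depends on how much run-to-run noise the LLM scores show.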

In [None]:
def test_seeming_relevance_long():
    #TODO
    pass

## Relevant but inconclusive statements should score higher the more helpful they are for answering the query.

In [12]:
def test_increasing_relevance_short():
    score_low = relevance("Who won the superbowl in 2009?","Santonio Holmes made a brilliant catch for the Steelers.")
    score_medium = relevance("Who won the superbowl in 2009?","Santonio Holmes made a brilliant catch for the Steelers in the superbowl.")
    score_high = relevance("Who won the superbowl in 2009?","Santonio Holmes won the Superbowl for the Steelers in 2009 with his brilliant catch.")
    assert score_low < score_medium < score_high, f"Score did not increase with more relevant details: {score_low}, {score_medium}, {score_high}."

In [13]:
def test_increasing_relevance_medium():
    score_low = relevance("What is a cephalopod?","Squids are a member of the molluscan class")
    score_medium = relevance("What is a cephalopod?","Squids are a member of the molluscan class characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
    score_high = relevance("What is a cephalopod?","A cephalopod is any member of the molluscan class such as squid, octopus or cuttlefish. These exclusively marine animals are characterized by bilateral body symmetry, a prominent head, and a set of arms or tentacles (muscular hydrostats) modified from the primitive molluscan foot.")
    assert score_low < score_medium < score_high, f"Score did not increase with more relevant details: {score_low}, {score_medium}, {score_high}."
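The increasing-relevance tests check a chain of three scores. A hypothetical helper, `assert_strictly_increasing`, generalizes the chain to any number of statements (which may suit the still-empty long variant) and names the first pair that breaks the ordering.

```python
from typing import Sequence


def assert_strictly_increasing(scores: Sequence[float], label: str = "Score") -> None:
    """Hypothetical helper: fail unless every score is strictly greater than the one before it."""
    # Compare each adjacent pair; numbering is 1-based to match "statement 1, statement 2, ...".
    for i, (prev, nxt) in enumerate(zip(scores, scores[1:]), start=1):
        assert prev < nxt, (
            f"{label} did not increase with more relevant details: "
            f"statement {i} scored {prev}, statement {i + 1} scored {nxt}."
        )
```

For example, `assert_strictly_increasing([relevance(question, s) for s in statements])` checks statements ordered from least to most helpful.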

In [None]:
def test_increasing_relevance_long():
    #TODO
    pass

## Run smoke tests

In [None]:
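def run_smoke_tests():
    # Every implemented test; the TODO stubs are left out because they pass trivially.
    smoke_tests = [
        test_low_adherence_short, test_low_adherence_medium, test_low_adherence_long,
        test_majority_adherence_short, test_majority_adherence_medium,
        test_nonanswer_short, test_nonanswer_med, test_nonanswer_long,
        test_seeming_relevance_short, test_seeming_relevance_medium,
        test_increasing_relevance_short, test_increasing_relevance_medium,
    ]
    for i, test in enumerate(smoke_tests, start=1):
        try:
            test()
            print(f"Test {i}/{len(smoke_tests)} ({test.__name__}) passed.")
        except AssertionError as e:
            print(f"Test {i}/{len(smoke_tests)} ({test.__name__}) failed: {e}")

run_smoke_tests()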