# GPTCache and Weaviate ♻️

This notebook shows how to configure GPTCache to use Weaviate as its vector store.

## Library Imports

In [1]:
from gptcache import cache
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import OpenAI
import weaviate
import os
import sqlite3
from gptcache.adapter import openai
import timeit

## Configuration

### Use OpenAI for the embedding

In [2]:
openai_embedding_fn = OpenAI().to_embeddings

### Use SQLite to cache the requests/responses

In [None]:
cache_base = CacheBase("sqlite")

#### See what is currently in the SQLite database:

In [4]:
def dump_cache_content():
    # Open the SQLite file that backs the cache
    connection = sqlite3.connect("sqlite.db")
    cursor = connection.cursor()
    tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()

    for (table_name,) in tables:
        # Print the table name as a delimiter
        print(f"Results for table {table_name}:")
        print("------------------------")

        # Fetch every row of the table (the name is quoted as an identifier)
        cursor.execute(f'SELECT * FROM "{table_name}"')
        results = cursor.fetchall()

        # Print the results
        for row in results:
            print(row)

        # Print a blank line to separate the output for each table
        print()

    # Release the file handle once everything has been printed
    connection.close()

In [None]:
# The database is currently empty

dump_cache_content()

### Connect to Weaviate

In [None]:
url = os.getenv("WEAVIATE_URL")  # URL of your Weaviate instance
api_key = os.getenv("WEAVIATE_API_KEY")  # authentication key; leave unset if your instance has no auth configured
# Only build an auth object when a key is actually provided
auth_config = weaviate.AuthApiKey(api_key=api_key) if api_key else None
vector_base = VectorBase("weaviate", url=url, auth_client_secret=auth_config)

#### Create a Weaviate client to query the database outside of GPTCache

In [7]:
# Reuse the same credentials as the GPTCache vector store
weaviate_client = weaviate.Client(url=url, auth_client_secret=auth_config)

#### Test the connection and inspect the class schema

In [None]:
weaviate_class = "GPTCache"
weaviate_client.schema.get(class_name=weaviate_class)

### Initialize the cache

In [10]:
data_manager = get_data_manager(cache_base, vector_base)

cache.init(
    embedding_func=openai_embedding_fn,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(max_distance=1)
)

cache.set_openai_key()

Note:

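In `similarity_evaluation`, we set `max_distance=1` so that the similarity threshold calculation behaves sensibly with this evaluation metric and cosine distance (Weaviate's default distance metric). `SearchDistanceEvaluation` turns the distance returned by the vector store into a score of `max_distance - distance`, and the adapter accepts a cached answer only when that score reaches a rank threshold derived from the evaluation's score range (see the references below). A `max_distance` much larger than the distances Weaviate actually returns stretches that range, so almost any nearest neighbour would count as a hit.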

References:

1. [Calculating rank threshold](https://github.com/zilliztech/GPTCache/blob/03a059704443961ae5b6ca243e3edc2dc15aeb2a/gptcache/adapter/adapter.py#L98C1-L107C10)

2. [Applying the rank threshold](https://github.com/zilliztech/GPTCache/blob/03a059704443961ae5b6ca243e3edc2dc15aeb2a/gptcache/adapter/adapter.py#L158C1-L176C18)
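The arithmetic behind the note above can be sketched in plain Python. This is a simplified illustration of the referenced adapter logic, not GPTCache's actual code: the function names are hypothetical, and the default `similarity_threshold` of 0.8 and a cache factor of 1 are assumptions about the library defaults.

```python
def rank_threshold(max_distance, similarity_threshold=0.8, cache_factor=1.0):
    """Scale the similarity threshold onto the evaluation's score range."""
    # Assumed score range reported by the distance evaluation: [0, max_distance]
    min_rank, max_rank = 0.0, max_distance
    threshold = (max_rank - min_rank) * similarity_threshold * cache_factor
    # Clamp the threshold into the valid score range
    return max(min_rank, min(max_rank, threshold))


def is_cache_hit(distance, max_distance, similarity_threshold=0.8):
    """Return True when a search result at `distance` would be accepted."""
    # Clamp the distance, then convert it to a score (smaller distance = higher score)
    clamped = max(0.0, min(max_distance, distance))
    score = max_distance - clamped
    return score >= rank_threshold(max_distance, similarity_threshold)


# With max_distance=1, only close neighbours pass
print(is_cache_hit(0.15, max_distance=1.0))
print(is_cache_hit(0.25, max_distance=1.0))
# With a larger max_distance, a much more distant neighbour still passes
print(is_cache_hit(0.5, max_distance=4.0))
```

Under these assumptions, `max_distance=1` means a cached answer is reused only when the cosine distance is roughly 0.2 or less, whereas a `max_distance` of 4 would accept distances up to about 0.8.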

### Calculate the time it takes to query the LLM

In [11]:
def timeit_decorator(func):
    def wrapper(*args, **kwargs):
        # Time the execution of the function
        start_time = timeit.default_timer()
        result = func(*args, **kwargs)
        end_time = timeit.default_timer()

        # Print the time taken
        print(f"Time taken to run {func.__name__}: {end_time - start_time:.2f} seconds")

        return result

    return wrapper

In [12]:
@timeit_decorator
def get_openai_response(question):
    # Call the OpenAI API to get a response
    result = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}],
    )
    # Extract the response from the API result
    response = result["choices"][0]["message"]["content"]

    # Return the response
    return response

## Query Time

#### Let's first ask "Who is Barrack Obama"

In [13]:
question = "Who is Barrack Obama?"

get_openai_response(question)

Time taken to run get_openai_response: 4.14 seconds


'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He was the first African American to be elected to the presidency. Prior to becoming president, Obama was a lawyer, community organizer, and served as a U.S. Senator from Illinois. During his presidency, he implemented various policies such as the Affordable Care Act, the economic stimulus package, and the Paris Agreement on climate change. Obama is known for his charisma, eloquence, and progressive approach to governance.'

#### Let's ask the same question again and note how long the response takes

In [14]:
get_openai_response(question)

Time taken to run get_openai_response: 1.06 seconds


'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He was the first African American to be elected to the presidency. Prior to becoming president, Obama was a lawyer, community organizer, and served as a U.S. Senator from Illinois. During his presidency, he implemented various policies such as the Affordable Care Act, the economic stimulus package, and the Paris Agreement on climate change. Obama is known for his charisma, eloquence, and progressive approach to governance.'

#### Let's rephrase the same question

In [15]:
# This question is very similar to the above question and the response time is still very quick

rephrase_question = "Tell me more about Barrak Obama"
get_openai_response(rephrase_question)

Time taken to run get_openai_response: 0.29 seconds


'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He was the first African American to be elected to the presidency. Prior to becoming president, Obama was a lawyer, community organizer, and served as a U.S. Senator from Illinois. During his presidency, he implemented various policies such as the Affordable Care Act, the economic stimulus package, and the Paris Agreement on climate change. Obama is known for his charisma, eloquence, and progressive approach to governance.'

#### Now let's look at the content stored in the SQLite database

In [16]:
# The questions and answers are stored in the database

dump_cache_content()

Results for table gptcache_question:
------------------------
(1, 'Who is Barrack Obama?', '2023-08-02 19:17:43.712593', '2023-08-02 19:18:01.232519', b'q$=\xbc\xa6k\x04\xbdtKY\xbc\xda\xf4\x93\xbc\xa4\xc3d\xbc...

In [17]:
weaviate_client.data_object.get(class_name=weaviate_class)

{'deprecations': [],
 'objects': [{'class': 'GPTCache',
   'creationTimeUnix': 1691000798038,
   'id': '336f2205-bf48-4b42-af4f-63fc76f96321',
   'lastUpdateTimeUnix': 1691000798038,
   'properties': {'data_id': 1},
   'vectorWeights': None},
  {'class': 'GPTCache',
   'creationTimeUnix': 1691003217820,
   'id': 'd1b2a042-489d-409e-9a95-5b2053bc2238',
   'lastUpdateTimeUnix': 1691003217820,
   'properties': {'data_id': 2},
   'vectorWeights': None},
  {'class': 'GPTCache',
   'creationTimeUnix': 1691003863859,
   'id': 'f5d32f6e-c2c9-445e-9ae2-2e4b0b27a127',
   'lastUpdateTimeUnix': 1691003863859,
   'properties': {'data_id': 1},
   'vectorWeights': None}],
 'totalResults': 3}

### Examples that didn't perform very well

##### Starting with real people

In [18]:
new_question = "Who is Joe Biden?"

get_openai_response(new_question)

Time taken to run get_openai_response: 0.48 seconds


'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He was the first African American to be elected to the presidency. Prior to becoming president, Obama was a lawyer, community organizer, and served as a U.S. Senator from Illinois. During his presidency, he implemented various policies such as the Affordable Care Act, the economic stimulus package, and the Paris Agreement on climate change. Obama is known for his charisma, eloquence, and progressive approach to governance.'

In [19]:
another_new_question = "Who is Taylor Swift?"

get_openai_response(another_new_question)

Time taken to run get_openai_response: 0.37 seconds


'Barack Obama is an American politician who served as the 44th President of the United States from 2009 to 2017. He was the first African American to be elected to the presidency. Prior to becoming president, Obama was a lawyer, community organizer, and served as a U.S. Senator from Illinois. During his presidency, he implemented various policies such as the Affordable Care Act, the economic stimulus package, and the Paris Agreement on climate change. Obama is known for his charisma, eloquence, and progressive approach to governance.'

##### Trying with fictional characters

In [20]:
get_openai_response("Who is Antman?")

Time taken to run get_openai_response: 5.53 seconds


"Ant-Man is a superhero character from Marvel Comics. The original Ant-Man is Dr. Hank Pym, a brilliant scientist who invents a suit that allows him to shrink in size while increasing his strength. He can communicate with and control ants using a special helmet. Over the years, various other characters have taken on the mantle of Ant-Man, including Scott Lang and Eric O'Grady. Ant-Man is also part of the Marvel Cinematic Universe, where he is portrayed by actor Paul Rudd."

In [21]:
get_openai_response("What is the Paul Rudd superhero movie")

Time taken to run get_openai_response: 0.78 seconds


"Ant-Man is a superhero character from Marvel Comics. The original Ant-Man is Dr. Hank Pym, a brilliant scientist who invents a suit that allows him to shrink in size while increasing his strength. He can communicate with and control ants using a special helmet. Over the years, various other characters have taken on the mantle of Ant-Man, including Scott Lang and Eric O'Grady. Ant-Man is also part of the Marvel Cinematic Universe, where he is portrayed by actor Paul Rudd."

## Questions

1. Why are the questions about Barack Obama, Joe Biden, and Taylor Swift treated as semantically similar?

2. How can we tweak semantic caching so that "Who is Barrack Obama?" and "Who is Joe Biden?" are treated as semantically distinct? Do we need a more sophisticated distance evaluation metric?
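One knob worth trying for the second question is the cache's similarity threshold. The sketch below assumes that GPTCache's `Config` accepts a `similarity_threshold` between 0 and 1 and that `cache.init` takes it through its `config` argument. The helper name `init_cache_with_threshold` and the value 0.95 are illustrative choices, not recommendations from the library.

```python
def init_cache_with_threshold(embedding_func, data_manager,
                              similarity_threshold=0.95, max_distance=1.0):
    """Initialize the GPTCache cache with a stricter similarity threshold (sketch)."""
    if not 0.0 <= similarity_threshold <= 1.0:
        raise ValueError("similarity_threshold must be between 0 and 1")

    # Imported here so the argument check above runs even without GPTCache installed
    from gptcache import cache, Config
    from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

    cache.init(
        embedding_func=embedding_func,
        data_manager=data_manager,
        similarity_evaluation=SearchDistanceEvaluation(max_distance=max_distance),
        # Assumption: a higher threshold makes the cache accept only closer matches
        config=Config(similarity_threshold=similarity_threshold),
    )


# Hypothetical usage in this notebook, in place of the earlier cache.init call:
# init_cache_with_threshold(openai_embedding_fn, data_manager)
```

Whether a stricter threshold is enough to separate "Who is Barrack Obama?" from "Who is Joe Biden?" without also rejecting genuine rephrasings would need to be checked against the actual distances Weaviate returns for these queries.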