# Introduction

This notebook shows how to work with the Weaviate Streamlit connector. 

The Weaviate Streamlit connector is a Python package that lets you create Streamlit apps that load data from a Weaviate instance.

# Imports

In [1]:
from st_weaviate_connection import WeaviateConnection
import streamlit as st
import os

# Usage

## Connect to a Weaviate Cloud instance

### Requirements

#### Weaviate Cloud instance

The easiest way to use this connector is with a [Weaviate Cloud](https://console.weaviate.cloud) instance, and to connect using the URL and the API key.

This demo uses the following read-only credentials:

```python
weaviate_url = "https://hha2nvjsruetknc5vxwrwa.c0.europe-west2.gcp.weaviate.cloud"
weaviate_apikey = "nMZuw1z1zVtnjkXXOMGx9Ows7YWGsakItdus"
```

You can create a free Weaviate Cloud instance by signing up [here](https://console.weaviate.cloud).

#### (Optional) Inference API key

The demo notebook uses hybrid search, which combines semantic search with a traditional (keyword) search to provide more accurate results. 

The `Movie` collection is set up to use OpenAI for semantic search. 

- If you do not have an OpenAI API key, you can sign up [here](https://platform.openai.com/signup).
- If you do not wish to use the semantic search part of hybrid search, set the `alpha` value to `0` in the `hybrid_search` function.
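To build intuition for what `alpha` does, here is a minimal sketch of a weighted score blend. The `hybrid_score` function and its inputs are hypothetical and for illustration only. Weaviate's own fusion of keyword and vector results is more involved than a plain weighted sum, so treat this as a mental model rather than the actual implementation.

```python
def hybrid_score(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Blend two already-normalized scores (hypothetical helper).

    alpha = 0 keeps only the keyword score; alpha = 1 keeps only the vector score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


# Illustrative scores for a single result (made-up numbers).
keyword_only = hybrid_score(keyword_score=0.75, vector_score=0.25, alpha=0.0)
vector_only = hybrid_score(keyword_score=0.75, vector_score=0.25, alpha=1.0)
balanced = hybrid_score(keyword_score=0.75, vector_score=0.25, alpha=0.5)
print(keyword_only, vector_only, balanced)
```

With `alpha` set to `0`, the semantic (vector) part contributes nothing.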

In [2]:
weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_apikey = os.environ["WEAVIATE_API_KEY"]
openai_apikey = os.environ["OPENAI_API_KEY"]  # Optional (for semantic search)

conn = st.connection(
    "weaviate",
    type=WeaviateConnection,
    url=weaviate_url,
    api_key=weaviate_apikey,
    additional_headers={"X-OpenAI-Api-Key": openai_apikey},
)
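Note that `os.environ["OPENAI_API_KEY"]` raises a `KeyError` when the variable is not set, even though the key is described as optional. One possible way to handle this is sketched below. The `build_headers` helper is hypothetical (not part of the connector), and the sketch assumes the connector accepts an empty dictionary for `additional_headers`.

```python
import os
from typing import Dict, Optional


def build_headers(openai_key: Optional[str]) -> Dict[str, str]:
    """Return the OpenAI header only when a key is available (hypothetical helper)."""
    if openai_key:
        return {"X-OpenAI-Api-Key": openai_key}
    return {}


# os.environ.get returns None instead of raising when the variable is missing.
headers = build_headers(os.environ.get("OPENAI_API_KEY"))
print(sorted(headers))
```

The resulting `headers` dictionary could then be passed as `additional_headers=headers` in the `st.connection` call.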


## Querying Data

The connector provides two convenience functions to query data:

- `.hybrid_search`: Perform a hybrid search, which is a weighted combination of a semantic search and a keyword search.
- `.graphql_query`: Perform a raw GraphQL query.

### Basic hybrid search

A hybrid search is a weighted combination of a semantic search and a keyword search.

For basic hybrid search, just provide the collection name and the search term.


In [7]:
df = conn.hybrid_search(
    collection_name="Movie",
    query="Fantasy or sci-fi drama",
)

df.head()

|  | title | overview | release_date | tmdb_id | vote_average | genre_ids |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Onward | In a suburban fantasy world, two teenage elf b... | 2020-02-29 00:00:00+00:00 | 508439 | 7.7 | [10751, 16, 12, 35, 14] |
| 1 | Raya and the Last Dragon | Long ago, in the fantasy world of Kumandra, hu... | 2021-03-03 00:00:00+00:00 | 527774 | 7.9 | [16, 10751, 14, 28, 12] |
| 2 | Terminator 2: Judgment Day | Set ten years after the events of the original... | 1991-07-03 00:00:00+00:00 | 280 | 8.1 | [28, 53, 878] |
| 3 | Star Trek | The fate of the galaxy rests in the hands of b... | 2009-05-06 00:00:00+00:00 | 13475 | 7.4 | [878, 28, 12] |
| 4 | Everything Everywhere All at Once | An aging Chinese immigrant is swept up in an i... | 2022-03-24 00:00:00+00:00 | 545611 | 7.8 | [28, 12, 878] |


Additional parameters can be provided for more precise search results. Review the docstring for more information.

In [12]:
conn.hybrid_search??

Signature:
conn.hybrid_search(
    collection_name: str,
    query: str,
    limit: int = 10,
    filters: Optional[weaviate.collections.classes.filters._Filters] = None,
    target_vectors: Union[str, List[str], weaviate.collections.classes.grpc._MultiTargetVectorJoin, NoneType] = None,
    ...

For example, we can narrow down the results by the release date. Here is the same search, but only for movies from before 2010:

In [15]:
from st_weaviate_connection import WeaviateFilter
from datetime import datetime

df = conn.hybrid_search(
    collection_name="Movie",
    query="Fantasy or sci-fi drama",
    filters=WeaviateFilter.by_property("release_date").less_than(datetime(2010, 1, 1))
)

df.head()

|  | title | overview | release_date | tmdb_id | vote_average | genre_ids |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Terminator 2: Judgment Day | Set ten years after the events of the original... | 1991-07-03 00:00:00+00:00 | 280 | 8.1 | [28, 53, 878] |
| 1 | Star Trek | The fate of the galaxy rests in the hands of b... | 2009-05-06 00:00:00+00:00 | 13475 | 7.4 | [878, 28, 12] |
| 2 | Stargate | An interstellar teleportation device, found in... | 1994-10-28 00:00:00+00:00 | 2164 | 7.0 | [28, 12, 878] |
| 3 | Avatar | In the 22nd century, a paraplegic Marine is di... | 2009-12-15 00:00:00+00:00 | 19995 | 7.6 | [28, 12, 14, 878] |
| 4 | The Matrix | Set in the 22nd century, The Matrix tells the ... | 1999-03-31 00:00:00+00:00 | 603 | 8.2 | [28, 878] |
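The filter above uses a naive `datetime` (one without timezone information). If the cutoff should be tied to a specific timezone, a timezone-aware `datetime` can be built with the standard library, as sketched below. Whether the filter treats aware and naive values differently is not shown in this notebook, so take this as an assumption to verify against the Weaviate client documentation.

```python
from datetime import datetime, timedelta, timezone

# Cutoff expressed explicitly in UTC.
cutoff_utc = datetime(2010, 1, 1, tzinfo=timezone.utc)

# Same wall-clock time, but in a zone two hours behind UTC.
cutoff_minus_two = datetime(2010, 1, 1, tzinfo=timezone(-timedelta(hours=2)))

print(cutoff_utc.isoformat())        # 2010-01-01T00:00:00+00:00
print(cutoff_minus_two.isoformat())  # 2010-01-01T00:00:00-02:00
```

Either value could then be passed to `.less_than(...)` in place of `datetime(2010, 1, 1)`.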


If you prefer to use a raw GraphQL query, you can use the `.graphql_query` function.

In [None]:
gql = """
{
  Get {
    Movie (
      limit: 10
      nearText: {
        concepts: ["historical period film"]
      }
    ) {
      title
      overview
      vote_average
      _additional {
        distance
      }
    }
  }
}
"""

df = conn.graphql_query(gql)
df


|  | _additional | overview | title | vote_average |
| --- | --- | --- | --- | --- |
| 0 | {'distance': 0.1794489} | Set in a 19th-century European village, this s... | Corpse Bride | 7.5 |
| 1 | {'distance': 0.1873132} | At the height of the First World War, two youn... | 1917 | 8.0 |
| 2 | {'distance': 0.1886223} | An epic tale of three brothers and their fathe... | Legends of the Fall | 7.4 |
| 3 | {'distance': 0.1949948} | Held captive for 7 years in an enclosed space,... | Room | 8.0 |
| 4 | {'distance': 0.19691008} | A vampire relates his epic life story of love,... | Interview with the Vampire | 7.4 |
| 5 | {'distance': 0.19889629} | A young man struggles to access sublimated chi... | The Butterfly Effect | 7.6 |
| 6 | {'distance': 0.19933802} | In war-torn colonial America, in the midst of ... | The Last of the Mohicans | 7.4 |
| 7 | {'distance': 0.20157182} | An epic love story centered around an older ma... | The Notebook | 7.9 |
| 8 | {'distance': 0.20218122} | An other-worldly story, set against the backdr... | The Shape of Water | 7.2 |
| 9 | {'distance': 0.20263833} | A burger-loving hit man, his philosophical par... | Pulp Fiction | 8.5 |
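When the search concept or the limit comes from user input (for example, a Streamlit text box), the query string has to be assembled at runtime. The sketch below shows one way to do that. `build_near_text_query` is a hypothetical helper, not part of the connector, and it relies on `json.dumps` to quote and escape the concept string, on the assumption that JSON string escaping is acceptable in a GraphQL string literal.

```python
import json


def build_near_text_query(collection: str, concept: str, limit: int = 10) -> str:
    """Assemble a nearText GraphQL query string (hypothetical helper)."""
    # json.dumps produces a quoted, escaped list such as ["historical period film"].
    concepts = json.dumps([concept])
    return f"""
{{
  Get {{
    {collection} (
      limit: {int(limit)}
      nearText: {{
        concepts: {concepts}
      }}
    ) {{
      title
      overview
      vote_average
      _additional {{
        distance
      }}
    }}
  }}
}}
"""


gql = build_near_text_query("Movie", "historical period film", limit=5)
print(gql)
```

The resulting string could then be passed to `conn.graphql_query(gql)` as in the cell above.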


## Advanced Usage

The Streamlit connector is a thin wrapper around the Weaviate Python client. 

This means that you can use the Weaviate Python client directly to perform more advanced operations.

We recommend using the client object in a context manager to ensure that no resources are leaked.
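To illustrate why a context manager helps, the sketch below uses a hypothetical `DemoClient` class as a stand-in for a real client. It is not the Weaviate client. It only shows that the cleanup step runs even when the body of the `with` block raises an exception.

```python
class DemoClient:
    """Hypothetical stand-in for a client that holds an open connection."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DemoClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Always release the connection; returning False re-raises any exception.
        self.close()
        return False


demo = DemoClient()
try:
    with demo as client:
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(demo.closed)  # True: close() ran despite the exception
```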

### Example: Perform retrieval augmented generation

In [None]:
with conn.client() as client:
    collection = client.collections.get("WineReview")
    response = collection.generate.hybrid(
        limit=4,
        query="a sweet european red wine",
        grouped_task="From these, recommend a wine that would pair well with a steak",
    )

    print("## Generated recommendation")
    print(response.generated)
    print("\n## Source data")
    for o in response.objects:
        print(f"Title: {o.properties['title']}")

## Generated recommendation
I would recommend a bold red wine, such as the Herdade das Servas 2015 Sem Barrica Unoaked Red from Portugal, to pair well with a steak. The rich tannins and concentrated black fruit flavors of this wine would complement the flavors of the steak nicely.

## Source data
Title: Messias 2015 Santola White (Vinho Verde)
Title: Gebeshuber 2013 Frizzante Rosé Pinot Noir (Österreichischer Perlwein)
Title: Pietradolce 2012 Archineri Rosso  (Etna)
Title: Herdade das Servas 2015 Sem Barrica Unoaked Red (Alentejano)


### Use a local Weaviate instance

To connect to a local instance instead, specify the `url` parameter as `"localhost"`, and the connector will connect with default settings.

In [None]:
conn = st.connection(
    "weaviate",
    type=WeaviateConnection,
    url="localhost",
)

## Using Secrets

The following are valid values you can specify in your `secrets.toml` file when using this connection:

| Config | Description |
| --- | --- |
| WEAVIATE_URL | The URL of the Weaviate instance you want to connect to |
| WEAVIATE_API_KEY | The API key for that Weaviate instance (if applicable) |

For more details, refer to:

* [How to use secrets management](https://docs.streamlit.io/library/advanced-features/secrets-management#how-to-use-secrets-management)
* [st.connection](https://docs.streamlit.io/library/api-reference/connections/st.connection)

## Using the Weaviate client library directly

You can also use the Weaviate client library directly to perform more advanced operations.

Please see the [Weaviate Python client documentation](https://weaviate.io/developers/weaviate/client-libraries/python) for more information.