# Introduction

This notebook shows how to work with the Weaviate Streamlit connector. 

The Weaviate Streamlit connector is a Python package that allows you to easily create Streamlit apps that load data from a Weaviate instance.

# Setup

Imports & environment variables

In [2]:
from st_weaviate_connection import WeaviateConnection
import streamlit as st
import os

weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_apikey = os.environ["WEAVIATE_API_KEY"]
cohere_apikey = os.environ["COHERE_API_KEY"]  # Optional (for semantic search)
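
Note that `os.environ["..."]` raises a `KeyError` when a variable is unset, so the Cohere key is not truly optional as written above. The following is a minimal sketch of a more forgiving approach. The helper name `cohere_headers` is hypothetical and not part of the connector; only the `X-Cohere-Api-Key` header name comes from this notebook.

```python
import os
from typing import Dict, Mapping


def cohere_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the Cohere header dict, or an empty dict if the key is unset.

    Hypothetical helper for illustration; not part of st_weaviate_connection.
    """
    key = env.get("COHERE_API_KEY")  # .get() returns None instead of raising
    return {"X-Cohere-Api-Key": key} if key else {}


# With the key set, the header is included; without it, the dict is empty.
print(cohere_headers({"COHERE_API_KEY": "abc"}))
print(cohere_headers({}))
```

Passing `additional_headers=cohere_headers()` when connecting should then work with or without the key, assuming the connector accepts an empty header dict.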

# Usage

## Requirements

#### Weaviate Cloud instance

The easiest way to use this connector is with a [Weaviate Cloud](https://console.weaviate.cloud) instance, connecting with its URL and API key.

This demo uses the following read-only credentials:

```python
weaviate_url = "https://hha2nvjsruetknc5vxwrwa.c0.europe-west2.gcp.weaviate.cloud"
weaviate_apikey = "nMZuw1z1zVtnjkXXOMGx9Ows7YWGsakItdus"
```

**Note**: You can create a free Weaviate Cloud instance [here](https://console.weaviate.cloud).

#### (Optional) Inference API key

The demo notebook uses hybrid search, which combines semantic search with a traditional (keyword) search to provide more accurate results. 

The `MovieDemo` collection is set up to use Cohere for semantic search.

- If you do not have a Cohere API key, you can sign up on their website.
- If you do not wish to use the semantic search part of hybrid search, set the `alpha` value to `0` in the `query` function.
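
To build intuition for what `alpha` does, here is a simplified sketch of a weighted score combination. This is a conceptual illustration only: Weaviate's actual hybrid fusion operates on normalized scores or ranks, and the function name `hybrid_score` and the sample scores are made up for this example.

```python
def hybrid_score(keyword_score: float, vector_score: float, alpha: float) -> float:
    """Blend two relevance scores: alpha=0 is keyword only, alpha=1 is vector only.

    Simplified illustration; not Weaviate's actual fusion algorithm.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * keyword_score


# Two made-up documents: A matches the keywords well, B matches the meaning well.
doc_a = {"keyword": 0.9, "vector": 0.3}
doc_b = {"keyword": 0.2, "vector": 0.8}

for alpha in (0.0, 0.5, 1.0):
    score_a = hybrid_score(doc_a["keyword"], doc_a["vector"], alpha)
    score_b = hybrid_score(doc_b["keyword"], doc_b["vector"], alpha)
    winner = "A" if score_a > score_b else "B"
    print(f"alpha={alpha}: A={score_a:.2f}, B={score_b:.2f}, top result: {winner}")
```

With `alpha` at `0`, only the keyword score contributes, which is why that setting removes the need for the semantic (Cohere) part of the search.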

## Connect to a Weaviate Cloud instance

In [3]:
conn = st.connection(
    "weaviate",
    type=WeaviateConnection,
    url=weaviate_url,
    api_key=weaviate_apikey,
    additional_headers={"X-Cohere-Api-Key": cohere_apikey},
)

## Querying Data

There are two convenience methods in the connector to query data:

- `.query()`: Perform a hybrid search, which is a weighted combination of a semantic search and a keyword search.
- `.graphql_query()`: Perform a raw GraphQL query

### Basic hybrid search

For a basic hybrid search, provide only the collection name and the search term.

In [4]:
df = conn.query(
    collection_name="MovieDemo",
    query="Fantasy or sci-fi drama",
)

df.head()

|   | movie_id | vote_count | budget | tagline | overview | revenue | title | release_year | vote_average | genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 83542 | 6786 | 102000000 | Everything is Connected | A set of six nested stories spanning time betw... | 130482868 | Cloud Atlas | 2012 | 6.882 | [Drama, Science Fiction] |
| 1 | 1124 | 15086 | 40000000 | Are You Watching Closely? | A mysterious story of two magicians whose inte... | 109676311 | The Prestige | 2006 | 8.203 | [Drama, Mystery, Science Fiction] |
| 2 | 419704 | 6271 | 87500000 | The answers we seek are just outside our reach | The near future, a time when both hope and har... | 127461872 | Ad Astra | 2019 | 6.1 | [Science Fiction, Drama] |
| 3 | 9294 | 1118 | 32000000 | Some things in life just can't be explained. | An ordinary man sees a bright light descend fr... | 152000000 | Phenomenon | 1996 | 6.402 | [Drama, Romance, Science Fiction, Fantasy] |
| 4 | 58244 | 2431 | 60000000 | Two worlds. One future. | In an alternate universe where twinned worlds ... | 22187813 | Upside Down | 2012 | 6.277 | [Romance, Science Fiction, Drama, Fantasy] |


Additional parameters can be provided for more precise search results. Review the docstring for more information.

In [5]:
conn.query??

Signature:
conn.query(
    collection_name: str,
    query: str,
    limit: int = 10,
    filters: Optional[weaviate.collections.classes.filters._Filters] = None,
    target_vectors: Union[str, List[str], weaviate.collections.classes.grpc._MultiTargetVectorJoin, NoneType] = None,
    ...

For example, we can narrow down the results by release year. Here is the same search, restricted to movies released before 2010:

In [6]:
from st_weaviate_connection import WeaviateFilter

df = conn.query(
    collection_name="MovieDemo",
    query="Fantasy or sci-fi drama",
    filters=WeaviateFilter.by_property("release_year").less_than(2010)
)

df.head()

|   | movie_id | vote_count | budget | tagline | title | revenue | overview | release_year | vote_average | genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 14337 | 2098 | 7000 | What happens if it actually works? | Primer | 545436 | Two fledgling inventors discover a complex met... | 2004 | 6.768 | [Science Fiction, Drama, Thriller] |
| 1 | 1124 | 15086 | 40000000 | Are You Watching Closely? | The Prestige | 109676311 | A mysterious story of two magicians whose inte... | 2006 | 8.203 | [Drama, Mystery, Science Fiction] |
| 2 | 34584 | 3786 | 27000000 | A boy who needs a friend finds a world that ne... | The NeverEnding Story | 20158808 | While hiding from bullies in his school's atti... | 1984 | 7.194 | [Adventure, Fantasy, Family, Drama] |
| 3 | 25376 | 2469 | 2000000 | An unsolved crime. A love story. An unwritten ... | The Secret in Their Eyes | 33965843 | Hoping to put to rest years of unease concerni... | 2009 | 8.002 | [Mystery, Thriller, Drama] |
| 4 | 1024 | 882 | 5000000 | Not all angels are innocent. | Heavenly Creatures | 3049135 | Wealthy and precocious teenager Juliet transfe... | 1994 | 7.0 | [Drama, Fantasy] |


If you prefer to use a raw GraphQL query, you can use the `.graphql_query` function.

In [7]:
gql = """
{
  Get {
    MovieDemo (
      limit: 10
      nearText: {
        concepts: ["historical period film"]
      }
    ) {
      title
      overview
      vote_average
      _additional {
        distance
      }
    }
  }
}
"""

df = conn.graphql_query(gql)
df


|   | _additional | overview | title | vote_average |
| --- | --- | --- | --- | --- |
| 0 | {'distance': 0.44417703} | A sumptuous and sensual tale of intrigue, roma... | The Other Boleyn Girl | 6.685 |
| 1 | {'distance': 0.4576857} | An epic that details the checkered rise and fa... | Napoleon | 6.428 |
| 2 | {'distance': 0.47027773} | The timeless tale of King Arthur and the legen... | First Knight | 6.079 |
| 3 | {'distance': 0.4740405} | With Ran, legendary director Akira Kurosawa re... | Ran | 8.078 |
| 4 | {'distance': 0.47975093} | England, 15th century. Hal, a capricious princ... | The King | 7.153 |
| 5 | {'distance': 0.48198485} | Akira Kurosawa's lauded feudal epic presents t... | Kagemusha | 7.805 |
| 6 | {'distance': 0.4889692} | A fictional history of two legendary revolutio... | RRR | 7.759 |
| 7 | {'distance': 0.4907261} | Katherine Watson is a recent UCLA graduate hir... | Mona Lisa Smile | 6.935 |
| 8 | {'distance': 0.49372083} | The retelling of France’s iconic but ill-fated... | Marie Antoinette | 6.695 |
| 9 | {'distance': 0.4953565} | In 25 AD, Judah Ben-Hur, a Jew in ancient Jude... | Ben-Hur | 7.889 |
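
In the GraphQL result, the `_additional` column holds a nested dictionary per row, which is awkward to sort or filter on. The sketch below shows one way to lift those nested values into top-level fields, using plain Python records so it runs without a Weaviate connection. The helper name `flatten_additional` is hypothetical, and the two sample rows are copied from the output above. With a real result you could likely obtain such records via `df.to_dict("records")` and rebuild a DataFrame from the flattened list.

```python
from typing import Any, Dict, List


def flatten_additional(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Move the keys nested under '_additional' up to the top level of each record.

    Hypothetical helper for illustration; not part of the connector.
    """
    flat = []
    for record in records:
        # Copy every field except the nested dict itself.
        row = {key: value for key, value in record.items() if key != "_additional"}
        # Merge the nested values (for example 'distance') into the row.
        row.update(record.get("_additional") or {})
        flat.append(row)
    return flat


rows = [
    {"_additional": {"distance": 0.44417703}, "title": "The Other Boleyn Girl", "vote_average": 6.685},
    {"_additional": {"distance": 0.4576857}, "title": "Napoleon", "vote_average": 6.428},
]

flat_rows = flatten_additional(rows)
for row in flat_rows:
    print(row["title"], row["distance"])
```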


## Advanced Usage

The Streamlit connector is a thin wrapper around the Weaviate Python client. 

This means that you can use the Weaviate Python client directly to perform more advanced operations.

We recommend using the client object in a context manager to ensure that no resources are leaked.
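
To illustrate why the context manager matters, here is a minimal sketch with a stand-in class. `DemoClient` is hypothetical and is not the Weaviate client; it only mimics the pattern of a client that must be closed. The point is that the `with` block closes the client even when the code inside it raises an error.

```python
class DemoClient:
    """Stand-in for a client that holds a connection (not the Weaviate client)."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "DemoClient":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()   # always runs, whether or not the block raised
        return False   # do not suppress the exception


client = DemoClient()
try:
    with client:
        raise RuntimeError("query failed")
except RuntimeError:
    pass

print(client.closed)  # the client was closed despite the error
```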

### Example: Perform retrieval augmented generation

In [None]:
with conn.client() as client:
    collection = client.collections.get("MovieDemo")
    response = collection.generate.hybrid(
        limit=20,
        query="Fantasy or sci-fi drama",
        grouped_task="From these, recommend one or two movies that would be family friendly!",
        grouped_properties=["title", "tagline"],
    )

    print("## Generated recommendation")
    print(response.generated)
    print("\n## Source data")
    for o in response.objects:
        print(f"Title: {o.properties['title']}")

See the [Weaviate Python client documentation](https://weaviate.io/developers/weaviate/client-libraries/python), and the [Weaviate documentation](https://weaviate.io/developers/weaviate/) for more information on the available operations.

## Use a local Weaviate instance

To connect to a local instance instead, specify the `url` parameter as `"localhost"` as shown below.

Then, you will be connected to the local Weaviate instance with default settings.

In [None]:
conn = st.connection(
    "weaviate",
    type=WeaviateConnection,
    url="localhost",
)
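
If you switch between a cloud and a local instance often, a small helper can choose the connection arguments for you. This is a sketch under assumptions: the helper name `connection_kwargs` is hypothetical, and it falls back to `"localhost"` whenever `WEAVIATE_URL` is not set. The `url` and `api_key` parameter names are the ones used in this notebook.

```python
from typing import Dict, Mapping


def connection_kwargs(env: Mapping[str, str]) -> Dict[str, str]:
    """Build keyword arguments for the connection from environment-like settings.

    Hypothetical helper for illustration; not part of the connector.
    """
    url = env.get("WEAVIATE_URL")
    if not url:
        # No cloud URL configured: fall back to a local instance.
        return {"url": "localhost"}
    kwargs = {"url": url}
    api_key = env.get("WEAVIATE_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs


print(connection_kwargs({}))
print(connection_kwargs({"WEAVIATE_URL": "https://example.weaviate.cloud", "WEAVIATE_API_KEY": "my-key"}))

# Intended usage (requires Streamlit and the connector to be installed):
# conn = st.connection("weaviate", type=WeaviateConnection, **connection_kwargs(os.environ))
```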

## Using Secrets

The following are valid values you can specify in your `secrets.toml` file when using this connection:

| Config | Description |
| --- | --- |
| WEAVIATE_URL | The URL of the Weaviate instance you want to connect to |
| WEAVIATE_API_KEY | The API key for the Weaviate instance you want to connect to (if applicable) |
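
As a rough illustration of what such a file might contain, the sketch below renders the two keys as TOML text. It assumes the keys live under a `[connections.weaviate]` table, which is where Streamlit looks up secrets for a connection named `"weaviate"`; check the connector's own documentation to confirm the exact layout. The helper name `render_secrets_toml` and the URL and key values are placeholders.

```python
import json
from typing import Optional


def render_secrets_toml(url: str, api_key: Optional[str] = None, name: str = "weaviate") -> str:
    """Render a secrets.toml fragment for a named Streamlit connection.

    Hypothetical helper; the [connections.<name>] layout is an assumption.
    json.dumps is used for quoting, which is adequate for plain URLs and keys.
    """
    lines = [f"[connections.{name}]", f"WEAVIATE_URL = {json.dumps(url)}"]
    if api_key:
        lines.append(f"WEAVIATE_API_KEY = {json.dumps(api_key)}")
    return "\n".join(lines) + "\n"


example = render_secrets_toml("https://example.weaviate.cloud", "my-api-key")
print(example)
```

Keep the resulting file out of version control, since it contains credentials.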

For more details, refer to:

* [How to use secrets management](https://docs.streamlit.io/library/advanced-features/secrets-management#how-to-use-secrets-management)
* [st.connection](https://docs.streamlit.io/library/api-reference/connections/st.connection)

## Using the Weaviate client library directly

You can also use the Weaviate client library directly to perform more advanced operations.

Please see the [Weaviate Python client documentation](https://weaviate.io/developers/weaviate/client-libraries/python) for more information.