# Hands-on with Weaviate: Queries

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/intro-workshop/blob/main/1a_hands_on_queries.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
# # For Colab
# !pip install -U weaviate-client

## Preparation

### Instantiate Weaviate client

In [None]:
import weaviate
import os

client = weaviate.Client(
    "https://edu-demo.weaviate.network",
    auth_client_secret=weaviate.AuthApiKey("learn-weaviate"),
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]
    }
)

### Inspect database

In [None]:
# Get schema
schema = client.schema.get()

In [None]:
# What does the schema look like?
schema

#### Note: Weaviate data structures:

- `class`: A collection of objects (like a SQL table)
- `properties`: The fields of each object (like SQL columns)

In [None]:
# What classes are in the instance?
c_names = [c["class"] for c in schema["classes"]]
c_names

In [None]:
# Inspect a particular class
class_schema = client.schema.get('WineReview')
class_schema

In [None]:
# What properties are in a particular class?
p_names = [p["name"] for p in class_schema["properties"]]
p_names

## Search - basic

### Fetch items from Weaviate

In [None]:
# Basic Get query
# Let's use the property names from above

response = (
    client.query
    .get("WineReview", p_names)
    .with_limit(3)
    .do()
)

response
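
The query result is a nested dictionary, so it can help to unpack it before looking at individual objects. Below is a minimal sketch, assuming the usual `data` → `Get` → class name layout of a `Get` response. The helper name `get_objects` and the sample titles are made up for illustration and are not part of the Weaviate client.

```python
def get_objects(response: dict, class_name: str) -> list:
    """Return the list of objects for `class_name` from a Get response."""
    # Assumption: a failed query reports its problems under an "errors" key
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response["data"]["Get"][class_name]


# Hypothetical response, shaped like the output of a Get query
sample_response = {
    "data": {
        "Get": {
            "WineReview": [
                {"title": "Example Winery 2015 Riesling"},
                {"title": "Example Estate 2012 Malbec"},
            ]
        }
    }
}

titles = [obj["title"] for obj in get_objects(sample_response, "WineReview")]
print(titles)
```

With a live response, `get_objects(response, "WineReview")` would be used the same way.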

**Quiz**: In what order are these objects returned?

### Specify the fetched properties

In [None]:
# Modify properties to be fetched

response = (
    client.query
    .get("WineReview", ["title"])
    .with_limit(3)
    .do()
)

response

In [None]:
# Raw GraphQL query

gql_query = """
{
    Get {
        WineReview (limit: 3) {
            title
        }
    }
}
"""

gql_response = (
    client.query.raw(gql_query)
)
gql_response

### Fetch additional properties

In [None]:
# Fetch object ID / vector
response = (
    client.query
    .get("WineReview", ["title"])
    .with_additional(["id", "vector"])
    .with_limit(3)
    .do()
)

response

#### Fetch cross-referenced properties

Cross-references are like relationships between tables in SQL.

In [None]:
# Fetch JeopardyQuestion item
response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer"])
    .with_limit(3)
    .do()
)

response

In [None]:
# Show `JeopardyQuestion` schema
client.schema.get("JeopardyQuestion")

In [None]:
# Show cross-referenced class schema
client.schema.get("JeopardyCategory")

In [None]:
# Fetch JeopardyQuestion items together with their cross-referenced category title
response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer", "hasCategory {...on JeopardyCategory {title}}"])
    .with_limit(3)
    .do()
)

response
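
Cross-referenced properties come back nested inside each object, which can be awkward to read. The sketch below flattens them into plain rows. It assumes each `hasCategory` value is a list of dictionaries with a `title` key, as the query above requests. The helper `flatten_categories` and the sample question are hypothetical.

```python
def flatten_categories(objects: list) -> list:
    """Turn nested cross-references into a flat list of category titles per object."""
    rows = []
    for obj in objects:
        # The reference may be missing or None when an object has no category
        refs = obj.get("hasCategory") or []
        rows.append({
            "question": obj["question"],
            "answer": obj["answer"],
            "categories": [ref["title"] for ref in refs],
        })
    return rows


# Hypothetical objects, shaped like response["data"]["Get"]["JeopardyQuestion"]
sample_objects = [
    {
        "question": "An example question",
        "answer": "An example answer",
        "hasCategory": [{"title": "EXAMPLE CATEGORY"}],
    },
    {"question": "No category here", "answer": "None", "hasCategory": None},
]

rows = flatten_categories(sample_objects)
print(rows)
```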

## Similarity-based searches

### NearText search

In [None]:
# NearText query - "very fancy wine"?
response = (
    client.query
    .get("WineReview", ["title", "review_body"])
    .with_limit(3)
    .with_near_text({
        "concepts": [
            "very fancy wine"
        ]
    })
    .do()
)

response

### NearObject

In [None]:
# NearObject query - first, grab an object ID
response = (
    client.query
    .get("WineReview", ["title", "review_body"])
    .with_limit(3)
    .with_near_text({
        "concepts": [
            "very fancy wine"
        ]
    })
    .with_additional("id")
    .do()
)

response
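
Rather than copying an ID by hand, it can be read from the response. This is a minimal sketch, assuming the `id` requested via `with_additional` appears under each object's `_additional` key. The helper `first_object_id` is hypothetical, and the sample response is a stand-in for the real one.

```python
def first_object_id(response: dict, class_name: str) -> str:
    """Return the ID of the top-ranked object in a Get response."""
    objects = response["data"]["Get"][class_name]
    if not objects:
        raise ValueError(f"No {class_name} objects in the response")
    return objects[0]["_additional"]["id"]


# Hypothetical response with the same shape as the query output
sample_response = {
    "data": {
        "Get": {
            "WineReview": [
                {
                    "title": "Example Winery 2015 Riesling",
                    "_additional": {"id": "f6da868f-9044-5b4d-87dd-21e1ffffbbf1"},
                }
            ]
        }
    }
}

object_id = first_object_id(sample_response, "WineReview")
print(object_id)
```

The resulting `object_id` could then be passed as `{"id": object_id}` to `with_near_object`.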

In [None]:
# NearObject query - use that ID to run a search
response = (
    client.query
    .get("WineReview", ["title", "review_body"])
    .with_limit(3)
    .with_near_object({
        "id": "f6da868f-9044-5b4d-87dd-21e1ffffbbf1"
    })
    .do()
)

response

### NearVector

In [None]:
# Grab a vector from OpenAI
# Note: `openai.Embedding.create` is the pre-1.0 `openai` package interface
import openai
openai.api_key = os.environ["OPENAI_APIKEY"]
resp = openai.Embedding.create(
  model="text-embedding-ada-002",
  input="Argentinian wine that goes well with fish"
)
resp

In [None]:
emb = resp["data"][0]["embedding"]
len(emb)

In [None]:
# NearVector query - use the vector to run a search

response = (
    client.query
    .get("WineReview", ["title", "review_body", "country"])
    .with_limit(3)
    .with_near_vector({
        "vector": emb
    })
    .do()
)

response

**Discussion**: What's going on under the hood?

![image](https://weaviate.io/assets/images/search-conceptual-dark-315f1e31d9008ce661031c31c1273bd2.png)
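
As a starting point for the discussion, a similarity search can be pictured as ranking stored vectors by their distance to the query vector. The sketch below does this by brute force with cosine distance on tiny three-dimensional toy vectors. This is only an illustration: Weaviate relies on an approximate nearest-neighbor index rather than comparing against every vector, and the names and values here are made up.

```python
import math


def cosine_distance(a: list, b: list) -> float:
    """Cosine distance: 0 means same direction, larger means less similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list, vectors: dict, limit: int = 3) -> list:
    """Brute-force search: score every stored vector, return the closest ones."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:limit]


# Toy "embeddings" (real ones have many more dimensions)
toy_vectors = {
    "crisp white": [0.9, 0.1, 0.0],
    "bold red": [0.1, 0.9, 0.1],
    "sweet dessert": [0.0, 0.2, 0.9],
}

results = nearest([1.0, 0.2, 0.0], toy_vectors, limit=2)
print(results)
```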

### Get distances to results

In [None]:
# Fetch distances in the results
response = (
    client.query
    .get("WineReview", ["title", "review_body", "country"])
    .with_limit(3)
    .with_near_vector({
        "vector": emb
    })
    .with_additional("distance")
    .do()
)

response

### Modify thresholds

In [None]:
# Add a distance threshold

response = (
    client.query
    .get("WineReview", ["title", "review_body", "country"])
    .with_near_vector({
        "vector": emb,
        "distance": 0.14
    })
    .with_additional("distance")
    .do()
)

response

**Discussion**: Why do we need thresholds / limits?
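
One way to think about the difference: a limit always returns up to N results, however poor the match, while a distance threshold returns only results that are close enough, however many there are. The sketch below applies both to a hypothetical list of hits. The helper `apply_cutoffs`, the sample distances, and the inclusive comparison are assumptions for illustration.

```python
def apply_cutoffs(hits: list, max_distance: float = None, limit: int = None) -> list:
    """Keep the closest hits, optionally bounded by a distance and/or a count."""
    kept = sorted(hits, key=lambda hit: hit["distance"])
    if max_distance is not None:
        kept = [hit for hit in kept if hit["distance"] <= max_distance]
    if limit is not None:
        kept = kept[:limit]
    return kept


# Hypothetical search hits with their distances to the query
hits = [
    {"title": "Wine A", "distance": 0.120},
    {"title": "Wine B", "distance": 0.135},
    {"title": "Wine C", "distance": 0.160},
    {"title": "Wine D", "distance": 0.300},
]

by_limit = apply_cutoffs(hits, limit=3)               # includes the weaker match "Wine C"
by_threshold = apply_cutoffs(hits, max_distance=0.14)  # only the two close matches
print([h["title"] for h in by_limit])
print([h["title"] for h in by_threshold])
```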

## Keyword (BM25) searches

In [None]:
# Try a keyword search for "apple"

response = (
    client.query.get("WineReview", ["title", "review_body", "country"])
    .with_bm25("apple")
    .with_additional("score")
    .with_limit(3)
    .do()
)

response

**Discussion**: How do keyword (BM25) searches work?
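
To make the discussion concrete, here is a toy BM25 scorer. It rewards documents where the query term appears more often, discounts terms that appear in many documents, and normalizes by document length. It is a simplified sketch using whitespace tokenization and commonly used values for `k1` and `b`; Weaviate's tokenization and exact scoring may differ, and the sample reviews are made up.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term
            df = sum(1 for other in tokenized if term in other)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = counts[term]
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Made-up review snippets
docs = [
    "crisp green apple and pear notes",
    "baked apple with apple blossom aromas",
    "dark cherry and plum with firm tannins",
]

scores = bm25_scores("apple", docs)
print(scores)
```

The second snippet mentions "apple" twice, so it scores highest; the third never mentions it and scores zero.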

## Hybrid searches

In [None]:
# Try a hybrid search for "white wine easy drink"

response = (
    client.query.get("WineReview", ["title", "review_body", "country"])
    .with_hybrid("white wine easy drink")
    .with_additional("score")
    .with_limit(3)
    .do()
)

response

**Discussion**: How do hybrid searches work?
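
A hybrid search runs a keyword search and a vector search, then merges the two result lists into one ranking. The sketch below shows one simple way to do that, a weighted rank-based fusion where `alpha` shifts weight between the keyword ranking (`alpha=0`) and the vector ranking (`alpha=1`). The function `fuse_rankings`, the constant `k`, and the document IDs are assumptions for illustration; Weaviate's own fusion algorithms and defaults may differ.

```python
def fuse_rankings(keyword_ranked: list, vector_ranked: list,
                  alpha: float = 0.5, k: int = 60) -> list:
    """Merge two ranked lists of IDs into one list of (id, score) pairs."""
    scores = {}
    # Each list contributes more for items it ranks higher
    for rank, doc_id in enumerate(keyword_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (k + rank + 1)
    for rank, doc_id in enumerate(vector_ranked):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (k + rank + 1)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical rankings from the two searches (best match first)
keyword_ranked = ["a", "b", "c"]
vector_ranked = ["b", "d", "a"]

fused = fuse_rankings(keyword_ranked, vector_ranked, alpha=0.5)
print([doc_id for doc_id, _ in fused])
```

Documents that appear near the top of both lists, like `"b"` here, rise to the top of the fused ranking.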

## Conditional (`where`) Filters

### Single filters

In [None]:
# Single filter with price

response = (
    client.query
    .get("WineReview", ["title", "review_body", "country"])
    .with_limit(3)
    .with_near_vector({
        "vector": emb
    })
    .with_where({
        "path": ["price"],
        "operator": "GreaterThan",
        "valueNumber": 10
    })
    .do()
)

response

#### Filter by partial matches

In [None]:
# Filter with partial string

response = (
    client.query
    .get("WineReview", ["title", "review_body", "country"])
    .with_limit(3)
    .with_near_vector({
        "vector": emb
    })
    .with_where({
        "path": ["review_body"],
        "operator": "Like",
        "valueText": "*citrus*"
    })
    .do()
)

response

#### Filter by cross-references

In [None]:
# Filter with cross-ref property (look for category titles matching *histor*)

response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer", "hasCategory {...on JeopardyCategory {title}}"])
    .with_limit(5)
    .with_where({
        "path": ["hasCategory", "JeopardyCategory", "title"],
        "operator": "Like",
        "valueText": "*histor*"
    })
    .do()
)

response

### Nested filters

In [None]:
# Filter with nested filters
# Match questions with `points` greater than 1000 OR a category title like *history*
where_filter = {
    "operator": "Or",
    "operands": [
        {
            "path": ["hasCategory", "JeopardyCategory", "title"],
            "operator": "Like",
            "valueText": "*history*"
        },
        {
            "path": ["points"],
            "operator": "GreaterThan",
            "valueInt": 1000
        },
    ]
}


response = (
    client.query
    .get("JeopardyQuestion", ["question", "answer", "points", "hasCategory {...on JeopardyCategory {title}}"])
    .with_limit(5)
    .with_where(where_filter)
    .do()
)

response
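
Nested filter dictionaries get hard to read as they grow. One option is to build them from small helper functions. The helpers below (`condition`, `any_of`, `all_of`) are hypothetical and not part of the Weaviate client; they only produce the same plain dictionaries that `with_where` accepts.

```python
def condition(path: list, operator: str, **value) -> dict:
    """Build a single filter condition, e.g. condition(["points"], "GreaterThan", valueInt=1000)."""
    if len(value) != 1:
        raise ValueError("Pass exactly one value argument, such as valueInt=... or valueText=...")
    return {"path": path, "operator": operator, **value}


def any_of(*operands: dict) -> dict:
    """Combine conditions so that at least one must match."""
    return {"operator": "Or", "operands": list(operands)}


def all_of(*operands: dict) -> dict:
    """Combine conditions so that every one must match."""
    return {"operator": "And", "operands": list(operands)}


# Rebuild a nested filter: category title like *history* OR points > 1000
built_filter = any_of(
    condition(["hasCategory", "JeopardyCategory", "title"], "Like", valueText="*history*"),
    condition(["points"], "GreaterThan", valueInt=1000),
)
print(built_filter)
```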

## Generative searches

In [None]:
# NearText with generative search: one generated output per returned object
response = (
    client.query
    .get("WikiArticle", ["title"])
    .with_limit(1)
    .with_near_text({
        "concepts": [
            "australia"
        ]
    })
    .with_generate(
        single_prompt="Summarize this article {wiki_summary}"
    )
    .do()
)

response

In [None]:
# Generative search with a grouped task: one generated output for all returned objects
response = (
    client.query
    .get("WineReview", ["title"])
    .with_limit(10)
    .with_near_text({
        "concepts": [
            "dessert wine"
        ]
    })
    .with_generate(
        grouped_task="Based on these reviews, what should you look for in a dessert wine? Provide three bullet points"
    )
    .do()
)

response
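
The generated text is buried inside the response, so a small helper can pull it out. This sketch assumes the generated output sits under each object's `_additional` → `generate` key, with `singleResult` holding per-object output and `groupedResult` holding the grouped output. The helper `generated_texts` and the sample response are hypothetical; check the actual response if the keys differ.

```python
def generated_texts(response: dict, class_name: str) -> dict:
    """Collect per-object and grouped generated text from a generative query response."""
    single = []
    grouped = None
    for obj in response["data"]["Get"][class_name]:
        generate = (obj.get("_additional") or {}).get("generate") or {}
        if generate.get("singleResult") is not None:
            single.append(generate["singleResult"])
        # Keep the first grouped result that appears
        if grouped is None and generate.get("groupedResult") is not None:
            grouped = generate["groupedResult"]
    return {"single": single, "grouped": grouped}


# Hypothetical response with placeholder generated text
sample_response = {
    "data": {
        "Get": {
            "WineReview": [
                {
                    "title": "Example Winery 2015 Riesling",
                    "_additional": {"generate": {"groupedResult": "placeholder grouped text"}},
                },
                {"title": "Example Estate 2012 Malbec", "_additional": {"generate": {}}},
            ]
        }
    }
}

texts = generated_texts(sample_response, "WineReview")
print(texts["grouped"])
```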