## Populate Weaviate instance

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/intro-workshop/blob/main/2_build_new_db.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Add env vars (Colab only)

In [None]:
# import os

# os.environ["COHERE_APIKEY"] = "YOUR_COHERE_KEY"
# os.environ["OPENAI_APIKEY"] = "YOUR_OPENAI_KEY"

## Prep

Install libraries as needed (uncomment for Colab)

In [None]:
# !pip install -Uqq weaviate-client weaviate-demo-datasets

In [None]:
import json

def jprint(obj):
    """Pretty-print a JSON-serializable object, such as a response dict."""
    print(json.dumps(obj, indent=2))

# Fun with Weaviate 😁🚀

Instantiate Weaviate client

In [None]:
import weaviate
from weaviate.embedded import EmbeddedOptions
import os

api_headers = {
    # You *ONLY* need the API key for the inference service that you are using.
    # The class definition (further below) determines which inference service is used for your data.
    "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],  # Replace with your OpenAI key
    "X-Cohere-Api-Key": os.environ["COHERE_APIKEY"],  # Replace with your Cohere key
}

# # Option 1 - If using Colab:
# client = weaviate.Client(
#     embedded_options=EmbeddedOptions(version="latest"),
#     additional_headers=api_headers
# )

# Option 2 - If using WCS sandbox:
client = weaviate.Client(
    "https://workshop-demo-0xwl314q.weaviate.network",  # Replace this with your sandbox URL
    auth_client_secret=weaviate.AuthApiKey("KQBu0wOvoWd70rXJIf1hs1oFkSmnxiupA7rm"),  # Replace this with your API Key
    additional_headers=api_headers
)

# # Option 3 - If using Docker-Compose:
# client = weaviate.Client(
#     "http://localhost:8080",
#     additional_headers=api_headers
# )
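Note that `os.environ["..."]` raises a `KeyError` when a variable is not set. The following is a minimal sketch of a pre-flight check you could run before building `api_headers`. The helper name `missing_env_vars` and the example values are hypothetical and not part of the workshop code.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping (made-up values) instead of the real environment
example_env = {"OPENAI_APIKEY": "sk-example", "COHERE_APIKEY": ""}
print(missing_env_vars(["OPENAI_APIKEY", "COHERE_APIKEY"], example_env))
```

If the returned list contains the key for the inference service you plan to use, set it before instantiating the client.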

Let's confirm that we are ready to go.

In [None]:
client.is_ready()

## What's available on my instance?

In [None]:
jprint(client.get_meta())

### Version

Note the Weaviate version, shown in the `version` field of the output above.

## Modules

What are these `modules`?

Modules extend Weaviate with optional functionality, so that you can work with your data in a way that suits you. The most commonly used modules are:
- Vectorizers (e.g. `text2vec-cohere`, `text2vec-huggingface`, `text2vec-openai`, `text2vec-palm`, etc.)
- Generative modules (e.g. `generative-xxx`)

### Bring your own vector

![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_simple.png?raw=1)

In this use case, you import your data together with the associated vectors. Weaviate indexes them for you so that you can search through your data quickly and effectively.

### Use a vectorizer module

![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_full.png?raw=1)

In this use case, you task Weaviate with the additional work of converting your data into vectors. The `vectorizer` module performs this job, either through a local model or through an external vectorizer API service.

## Populate Weaviate

The process for importing data into Weaviate is:

```
Add class definition to the schema
                ⬇
            Load data
                ⬇
        (Batch) import data
```

Here we'll use the unofficial library `weaviate-demo-datasets` for illustrative purposes.

In [None]:
import weaviate_datasets as wd
dataset = wd.WineReviews()

### Create class definition

In [None]:
class_objs = dataset.get_class_definitions()
class_objs[0].keys()

### Required information

Notice above that the class definition includes the `class`, `vectorizer`, `moduleConfig`, and `properties` keys:

- `class`: The class name (like a SQL table name)
- `vectorizer`: Module to be used to generate vectors
- `moduleConfig`: Configure various modules to be used with the class
- `properties`: Define object properties (like a SQL column)
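
To make the four keys concrete, here is a hand-written sketch of what such a class definition can look like. It is illustrative only: the class name `WineReviewExample`, the chosen modules, and the property data types are assumptions, and the definition returned by `dataset.get_class_definitions()` may differ.

```python
# Illustrative class definition; NOT the one shipped with weaviate-demo-datasets
example_class = {
    "class": "WineReviewExample",              # hypothetical class name
    "vectorizer": "text2vec-openai",           # module used to generate vectors
    "moduleConfig": {"generative-openai": {}}, # per-class module settings (assumed)
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "review_body", "dataType": ["text"]},
        {"name": "country", "dataType": ["text"]},
        {"name": "points", "dataType": ["int"]},    # assumed integer score
        {"name": "price", "dataType": ["number"]},  # assumed floating-point price
    ],
}

# List the property names, much like listing the columns of a SQL table
property_names = [prop["name"] for prop in example_class["properties"]]
print(property_names)
```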

In [None]:
jprint(class_objs)

In [None]:
# If you want to change the vectorizer:

class_objs[0]["vectorizer"] = "text2vec-cohere"
class_objs

### Add class definition to the schema

In [None]:
for class_obj in class_objs:
    if not client.schema.contains(class_obj):
        print(f"Adding {class_obj['class']}")
        client.schema.create_class(class_obj)

In [None]:
client.schema.get()

In [None]:
class_name = class_objs[0]["class"]  # Name of the class defined above

### Load data

Load sample data from `weaviate_datasets`

In [None]:
dataset.get_sample()

In [None]:
# loader = dataset._class_dataloader(class_name)
# next(loader)[0]

### Batch import

Note: you should almost always use batch imports for speed.

In [None]:
from weaviate.util import generate_uuid5

loader = dataset._class_dataloader(class_name)
with client.batch() as batch:
    for data_objs in loader:
        data_obj = data_objs[0]
        batch.add_data_object(
            data_object=data_obj,
            class_name=class_name,
            # vector=[1, 2, 3],  # You can specify your own vector here
            uuid=generate_uuid5(data_obj),  # This will generate a deterministic UUID based on the data object's content
        )
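
The idea behind a deterministic UUID can be sketched with the standard library alone. This is a stand-in for illustration and not Weaviate's `generate_uuid5` implementation, so the IDs it produces will not match. The helper name `deterministic_uuid` and the sample objects are hypothetical.

```python
import json
import uuid


def deterministic_uuid(data_obj: dict) -> str:
    """Derive a name-based (version 5) UUID from an object's content."""
    # Serialize with sorted keys so that key order does not change the result
    payload = json.dumps(data_obj, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, payload))


a = deterministic_uuid({"title": "Example wine", "points": 90})
b = deterministic_uuid({"points": 90, "title": "Example wine"})  # same content
c = deterministic_uuid({"title": "Another wine", "points": 90})  # different content
print(a == b, a == c)
```

Because the same content always maps to the same ID, re-running an import should update existing objects rather than create duplicates.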

Confirm import by getting an object count.

In [None]:
client.query.aggregate(class_name).with_meta_count().do()
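
The aggregate query returns a nested dict. A small helper along these lines can pull out the count. The helper name `extract_count` is hypothetical, and the mock response (including the count of 50) is made up to mirror the shape that this query is expected to return.

```python
def extract_count(response: dict, class_name: str) -> int:
    """Read the object count out of an Aggregate response."""
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


# Made-up response used only to illustrate the nesting
mock_response = {"data": {"Aggregate": {"WineReview": [{"meta": {"count": 50}}]}}}
print(extract_count(mock_response, "WineReview"))
```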

Check that we have generated vectors.

In [None]:
res = client.query.get(
    class_name,
    ["title", "country", "review_body", "points"]
).with_additional("vector").with_limit(2).do()

In [None]:
jprint(res)

## Try queries

### Fetch objects

In [None]:
res = client.query.get(
    class_name,
    ["title", "country", "review_body", "points"]
).with_limit(2).do()

In [None]:
jprint(res)
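
Results of a `Get` query sit under `res["data"]["Get"][class_name]`. The sketch below wraps that lookup and, assuming failed queries carry a top-level `"errors"` key, raises instead of failing later with a confusing `KeyError`. The helper name `get_hits` and the mock data are hypothetical.

```python
def get_hits(response: dict, class_name: str) -> list:
    """Return the list of result objects from a Get response."""
    # Assumption: a failed query reports its problems under "errors"
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response["data"]["Get"][class_name]


# Made-up response used only to illustrate the nesting
mock_response = {
    "data": {"Get": {"WineReview": [{"title": "Example wine", "points": 90}]}}
}
for hit in get_hits(mock_response, "WineReview"):
    print(hit["title"], hit["points"])
```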

### Similarity-based searches

#### Against a text input

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["a fruity white wine"]}
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"][class_name]:
    print(r)

#### Move "away" from certain types of objects

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"][class_name]:
    jprint(r)

#### Against an input object

In [None]:
res = (
    client.query.get(class_name, ["title", "country", "review_body", "points"])
    .with_additional("id")
    .with_limit(2)
    .do()
)
res

In [None]:
res = (
    client.query.get(class_name, ["title", "country", "review_body", "points"])
    .with_near_object({"id": "01131ce1-d0be-5380-af48-e3cc5c455a63"})
    .with_additional("distance")
    .with_limit(5)
    .do()
)

res

#### Against an input vector

In [None]:
res = (
    client.query.get(class_name, ["title", "country", "review_body", "points"])
    .with_additional("vector")
    .with_limit(1)
    .do()
)
vector = res["data"]["Get"][class_name][0]["_additional"]["vector"]

In [None]:
res = (
    client.query.get(class_name, ["title", "country", "review_body", "points"])
    .with_near_vector({"vector": vector})
    .with_additional("distance")
    .with_limit(5)
    .do()
)

res
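
The `distance` value in these results measures how far each object's vector is from the query vector. As a rough sketch, assuming the class uses cosine distance (Weaviate's default metric unless configured otherwise), it can be computed as follows. The function name and the toy vectors are for illustration only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Since the query vector here was taken from a stored object, that object itself should come back first with a distance at or very near zero.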

### Filter objects

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"],
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_limit(5).do()

In [None]:
jprint(res)
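
Filters are plain dicts, so several conditions can be combined before being passed to `with_where`. The sketch below builds an `And` filter from two conditions. The helper name `and_filter` is hypothetical, and the second condition assumes that `points` is stored as an integer property.

```python
def and_filter(*conditions):
    """Combine several where-filter conditions so that all of them must match."""
    return {"operator": "And", "operands": list(conditions)}


price_filter = {"path": ["price"], "operator": "GreaterThan", "valueNumber": 10}
# Assumption: "points" is an int property, hence "valueInt"
points_filter = {"path": ["points"], "operator": "GreaterThanEqual", "valueInt": 90}

where_filter = and_filter(price_filter, points_filter)
print(where_filter)
```

The resulting dict can then be used in place of the single condition, as in `.with_where(where_filter)`.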

### Generative search

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"],
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_generate(
    grouped_task="Are there any commonalities between these wines?",
).with_limit(3).do()

In [None]:
jprint(res)
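
To read only the generated answer, a helper like the following may be useful. It assumes that the output of a grouped task is attached to the first returned object under `_additional.generate.groupedResult`; verify this against the response printed above. The helper name and the mock response text are made up.

```python
def extract_grouped_result(response: dict, class_name: str):
    """Return the grouped-task text from a generative search response, if any."""
    hits = response["data"]["Get"][class_name]
    if not hits:
        return None
    # Assumption: the grouped result is stored on the first object only
    return hits[0]["_additional"]["generate"]["groupedResult"]


# Made-up response used only to illustrate the assumed nesting
mock_response = {
    "data": {
        "Get": {
            "WineReview": [
                {
                    "title": "Example wine",
                    "_additional": {"generate": {"groupedResult": "Example answer"}},
                },
                {"title": "Another wine", "_additional": {"generate": None}},
            ]
        }
    }
}
print(extract_grouped_result(mock_response, "WineReview"))
```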

### For more examples: See our "how-to" search pages

- [How to: Search](https://weaviate.io/developers/weaviate/search)

### If you want to clean up the data

**This will delete all of your data in the *WineReview* class!**

In [None]:
# client.schema.delete_class("WineReview")