## Populate Weaviate instance

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/intro-workshop/blob/main/2_build_new_db.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Install the libraries as needed (uncomment the line below if running in Colab).

In [None]:
# !pip install -Uqq weaviate-client weaviate-demo-datasets

In [None]:
import json

def jprint(obj):
    # Pretty-print any JSON-serializable object (e.g. a Weaviate response dict)
    print(json.dumps(obj, indent=2))

Instantiate Weaviate client

In [None]:
import weaviate
from weaviate.embedded import EmbeddedOptions
import os

# # Option 1 - If using Colab:
# client = weaviate.Client(
#     embedded_options=EmbeddedOptions(version="latest"),
#     additional_headers={
#         "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your OpenAI key
#     }
# )

# Option 2 - If using WCS sandbox:
client = weaviate.Client(
    "https://workshop-demo-0xwl314q.weaviate.network",  # Replace this with your sandbox URL
    auth_client_secret=weaviate.AuthApiKey("KQBu0wOvoWd70rXJIf1hs1oFkSmnxiupA7rm"),  # Replace this with your API Key
    additional_headers={
      "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your OpenAI key
    }
)

# # Option 3 - If using Docker-Compose:
# client = weaviate.Client(
#     "http://localhost:8080",
#     additional_headers={
#         "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"]  # Replace with your OpenAI key
#     }
# )
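All three options read the OpenAI key with `os.environ["OPENAI_APIKEY"]`, which raises a bare `KeyError` if the variable is missing. A minimal sketch of a friendlier lookup follows; `require_env` is a hypothetical helper name, not part of the Weaviate client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; "
            "set it before creating the Weaviate client."
        )
    return value


# Usage (assumes OPENAI_APIKEY is set in your environment):
# headers = {"X-OpenAI-Api-Key": require_env("OPENAI_APIKEY")}
```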

Let's confirm that we are ready to go.

In [None]:
client.is_ready()
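A freshly started instance (for example an embedded or Docker one) may need a few seconds before `client.is_ready()` returns `True`. The sketch below polls any zero-argument callable until it succeeds or a timeout passes; `wait_until_ready` and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` seconds elapse."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Usage with the client created above:
# wait_until_ready(client.is_ready)
```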

## What's available on my instance?

In [None]:
jprint(client.get_meta())  

### Version

Note the Weaviate version

## Modules 

What are these `modules`?

Modules extend Weaviate with optional capabilities, so that you can work with your data in a way that suits you. The most commonly used modules are:
- Vectorizers (e.g. `text2vec-cohere`, `text2vec-huggingface`, `text2vec-openai`, `text2vec-palm`, etc.)
- Generative modules (e.g. `generative-xxx`)
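The version and the enabled modules both appear in the `get_meta()` output. A small sketch of pulling them out follows; `summarize_meta` is a hypothetical helper, and `sample_meta` is a made-up stand-in for a real response (the values are illustrative only).

```python
def summarize_meta(meta: dict) -> tuple:
    """Return (version, sorted module names) from a get_meta()-style dict."""
    version = meta.get("version", "unknown")
    modules = sorted(meta.get("modules") or {})
    return version, modules


# Illustrative stand-in for client.get_meta(); not real output.
sample_meta = {
    "hostname": "http://[::]:8080",
    "modules": {"text2vec-openai": {}, "generative-openai": {}},
    "version": "1.19.0",
}

version, modules = summarize_meta(sample_meta)
print(f"Weaviate {version} with modules: {', '.join(modules)}")
```

With a live client, `summarize_meta(client.get_meta())` would be the equivalent call.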

### Bring your own vector

![img](./images/object_import_process_simple.png)

In this use case, you import your data together with its vector. Weaviate indexes the data so that you can search through it quickly and effectively.
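To make "searching through vectors" concrete, here is a toy sketch of nearest-neighbor search using exhaustive cosine similarity in plain Python. This is not how Weaviate does it internally (it builds an approximate nearest-neighbor index, HNSW by default); the vectors and names below are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors, k=2):
    """Return the names of the k stored vectors most similar to `query`."""
    ranked = sorted(
        vectors,
        key=lambda name: cosine_similarity(query, vectors[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up two-dimensional vectors, one per object.
toy_vectors = {
    "wine_a": [1.0, 0.0],
    "wine_b": [0.0, 1.0],
    "wine_c": [0.9, 0.1],
}

print(nearest([1.0, 0.0], toy_vectors))
```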

### Use a vectorizer module

![img](./images/object_import_process_full.png)

In this use case, you also task Weaviate with converting your data into vectors. The `vectorizer` module performs this job, either with a local model or by calling an external vectorizer API service.
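The choice between the two approaches is made in the class definition through its `vectorizer` field. The sketch below shows a hand-written definition for each case; the property list and data types are assumptions for illustration, and the definition returned by `weaviate-demo-datasets` may differ.

```python
# Assumed, simplified class definition that delegates vectorization to a module.
with_vectorizer = {
    "class": "WineReview",
    "vectorizer": "text2vec-openai",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "review_body", "dataType": ["text"]},
    ],
}

# Same class, but with vectorization turned off: you supply the vectors yourself.
bring_your_own_vector = {**with_vectorizer, "vectorizer": "none"}

print(with_vectorizer["vectorizer"], "vs", bring_your_own_vector["vectorizer"])
```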

## Populate Weaviate

Let's add some data to Weaviate. The process for importing data into Weaviate is as follows:

```
     Create class definition
                ⬇
Add class definition to the schema
                ⬇
            Load data
                ⬇
        (Batch) import data
```

Here we'll use the unofficial library `weaviate-demo-datasets` for illustrative purposes.

In [None]:
import weaviate_datasets as wd

In [None]:
dataset = wd.WineReviews()

### Create class definition

In [None]:
class_objs = dataset.get_class_definitions()

In [None]:
jprint(class_objs)

### Add class definition to the schema

In [None]:
for class_obj in class_objs:
    if not client.schema.contains(class_obj):
        print(f"Adding {class_obj['class']}")
        client.schema.create_class(class_obj)

In [None]:
client.schema.get()
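The full schema can be verbose. A minimal sketch of condensing it to class names and their property names is shown below; `describe_schema` is a hypothetical helper, and `sample_schema` is an invented stand-in for the `client.schema.get()` result.

```python
def describe_schema(schema: dict) -> dict:
    """Map each class name to the list of its property names."""
    return {
        cls["class"]: [prop["name"] for prop in cls.get("properties", [])]
        for cls in schema.get("classes", [])
    }


# Illustrative stand-in for client.schema.get(); not real output.
sample_schema = {
    "classes": [
        {
            "class": "WineReview",
            "properties": [
                {"name": "title", "dataType": ["text"]},
                {"name": "country", "dataType": ["text"]},
            ],
        }
    ]
}

print(describe_schema(sample_schema))
```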

In [None]:
# Take the name from the last definition (the value the loop variable held above)
class_name = class_objs[-1]['class']

### Load data

Load sample data from `weaviate_datasets`

In [None]:
dataset.get_sample()

In [None]:
# loader = dataset._class_dataloader(class_name)
# next(loader)[0]

### Batch import

Note: you should almost always use batch imports for speed.

In [None]:
from weaviate.util import generate_uuid5

loader = dataset._class_dataloader(class_name)
with client.batch() as batch:
    for data_objs in loader:
        data_obj = data_objs[0]     
        batch.add_data_object(
            data_object=data_obj, 
            class_name=class_name,
            vector=[1, 2, 3],  # You can specify your own vector here
            uuid=generate_uuid5(data_obj),
        )
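Passing `uuid=generate_uuid5(data_obj)` gives each object an ID derived from its content, so re-running the import should update existing objects rather than create duplicates. The sketch below illustrates the same idea with the standard library only; it is not the client's implementation, and `deterministic_uuid` is a hypothetical name.

```python
import json
import uuid


def deterministic_uuid(obj: dict) -> str:
    """Derive a stable UUIDv5 from a dict's content, independent of key order."""
    canonical = json.dumps(obj, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, canonical))


# Made-up objects for illustration.
first = {"title": "Example Red", "points": 90}
same_content = {"points": 90, "title": "Example Red"}
different = {"title": "Example White", "points": 88}

print(deterministic_uuid(first) == deterministic_uuid(same_content))
print(deterministic_uuid(first) == deterministic_uuid(different))
```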

Confirm the import by getting an object count.

In [None]:
client.query.aggregate(class_name).with_meta_count().do()
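The aggregate response is a nested dictionary. A small sketch of extracting just the count follows; `extract_count` is a hypothetical helper, and `sample_response` is an invented example of the response shape with a made-up count.

```python
def extract_count(response: dict, class_name: str) -> int:
    """Return the object count from an Aggregate response with a meta count."""
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response["data"]["Aggregate"][class_name][0]["meta"]["count"]


# Illustrative response shape; the count is made up.
sample_response = {
    "data": {"Aggregate": {"WineReview": [{"meta": {"count": 50}}]}}
}

print(extract_count(sample_response, "WineReview"))
```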

Check that each object was stored with a vector.

In [None]:
res = client.query.get(
    class_name, 
    ["title", "country", "review_body", "points"]
).with_additional("vector").with_limit(2).do()

In [None]:
jprint(res)

### (But...) Here's one I prepared earlier

Let's import the whole dataset with a predefined function.

In [None]:
client.schema.delete_class(class_name)

In [None]:
dataset.upload_dataset(client)  # Includes pre-vectorized data

### Try queries

In [None]:
res = client.query.get(
    class_name, 
    ["title", "country", "review_body", "points"]
).with_limit(2).do()

In [None]:
jprint(res)

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["a fruity white wine"]}
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"][class_name]:
    print(r)

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"][class_name]:
    jprint(r)

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"], 
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_limit(5).do()

In [None]:
jprint(res)
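Filters passed to `with_where` are plain dictionaries, and several conditions can be combined under an `And` operator with an `operands` list. The sketch below builds such a filter; the helper names are hypothetical, and using `valueText` for `country` assumes that property is stored as text.

```python
def greater_than(path: str, value: float) -> dict:
    """Numeric 'greater than' condition on a single property."""
    return {"path": [path], "operator": "GreaterThan", "valueNumber": value}


def equals_text(path: str, value: str) -> dict:
    """Exact-match condition on a text property (assumes a text data type)."""
    return {"path": [path], "operator": "Equal", "valueText": value}


def all_of(*conditions: dict) -> dict:
    """Combine conditions so that every one of them must hold."""
    return {"operator": "And", "operands": list(conditions)}


combined = all_of(greater_than("price", 10), equals_text("country", "France"))
print(combined)

# Usage with the client from earlier cells:
# client.query.get(class_name, ["title", "country"]).with_where(combined).with_limit(5).do()
```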

In [None]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"], 
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_generate(
    grouped_task="Are there any commonalities between these wines?",
).with_limit(3).do()

In [None]:
jprint(res)

In [None]:
print(res["data"]["Get"][class_name][0]["_additional"]["generate"]["groupedResult"])
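Indexing straight into the response fails with an unhelpful error if the query returned no objects or the generative step reported a problem. A more defensive sketch follows; `grouped_result` is a hypothetical helper, and `sample_generate_response` is an invented example of the response shape.

```python
def grouped_result(response: dict, class_name: str):
    """Return the grouped generative result, or None if no objects matched."""
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    objects = response["data"]["Get"][class_name]
    if not objects:
        return None
    generate = objects[0]["_additional"]["generate"]
    if generate.get("error"):
        raise RuntimeError(f"Generation failed: {generate['error']}")
    return generate["groupedResult"]


# Illustrative response shape; the generated text is made up.
sample_generate_response = {
    "data": {
        "Get": {
            "WineReview": [
                {
                    "title": "Example Red",
                    "_additional": {
                        "generate": {"error": None, "groupedResult": "Example summary."}
                    },
                }
            ]
        }
    }
}

print(grouped_result(sample_generate_response, "WineReview"))
```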

In [None]:
for r in res["data"]["Get"][class_name]:
    jprint(r)

In [None]:
client.schema.delete_class("WineReview")