## Populate Weaviate instance

### Add env vars (Colab only)

In [1]:
# import os

# os.environ["COHERE_APIKEY"] = "YOUR_COHERE_KEY"
# os.environ["OPENAI_APIKEY"] = "YOUR_OPENAI_KEY"

## Prep

Install libraries as needed (uncomment for Colab)

In [2]:
# !pip install -Uqq weaviate-client weaviate-demo-datasets

In [3]:
def jprint(obj):
    """Pretty-print a JSON-serializable object, such as a Weaviate response dict."""
    import json
    print(json.dumps(obj, indent=2))

# Fun with Weaviate 😁🚀

Instantiate Weaviate client

In [4]:
import os

api_headers = {
    # You only need the API key for the inference service that you actually use.
    # The class definition further below determines which inference service is used for your data.
    # Note: os.environ[...] raises a KeyError if the variable is unset, so remove the line for any service you do not use.
    "X-OpenAI-Api-Key": os.environ["OPENAI_APIKEY"],  # Replace with your OpenAI key
    "X-Cohere-Api-Key": os.environ["COHERE_APIKEY"],  # Replace with your Cohere key
}
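
As a minimal sketch of the "only the key you use" idea, the headers can be built from whichever environment variables are actually set. The helper name `build_api_headers` and the `HEADER_ENV_VARS` mapping are hypothetical; the header and variable names are the ones used in the cell above.

```python
import os

# Hypothetical mapping: header name -> environment variable holding the key.
HEADER_ENV_VARS = {
    "X-OpenAI-Api-Key": "OPENAI_APIKEY",
    "X-Cohere-Api-Key": "COHERE_APIKEY",
}


def build_api_headers(env=None):
    """Return headers only for the API keys that are set and non-empty."""
    env = os.environ if env is None else env
    return {
        header: env[var]
        for header, var in HEADER_ENV_VARS.items()
        if env.get(var)  # skip unset or empty variables instead of raising KeyError
    }


# Example with an explicit mapping instead of the real environment
example_headers = build_api_headers({"COHERE_APIKEY": "YOUR_COHERE_KEY"})
print(example_headers)
```

The result could then be passed as `additional_headers` in any of the instantiation options below.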

### Weaviate instantiation options

There are many ways to run Weaviate! The easiest ways are:

- Embedded Weaviate (Linux only; macOS support coming soon)
  - No additional step required
- Weaviate Cloud Services (WCS), the SaaS option
  - Go to https://console.weaviate.cloud and create an instance
- Docker-Compose
  - Go to https://weaviate.io/developers/weaviate/installation/docker-compose to get the config file

In [5]:
# # Option 1 - If using Colab:
# import weaviate
# from weaviate.embedded import EmbeddedOptions

# client = weaviate.Client(
#     embedded_options=EmbeddedOptions(version="latest"),
#     additional_headers=api_headers
# )

In [6]:
# Option 2 - If using WCS sandbox:
import weaviate

client = weaviate.Client(
    "https://workshop-demo-0xwl314q.weaviate.network",  # Replace this with your sandbox URL
    auth_client_secret=weaviate.AuthApiKey("KQBu0wOvoWd70rXJIf1hs1oFkSmnxiupA7rm"),  # Replace this with your API Key
    additional_headers=api_headers
)

In [7]:
# # Option 3 - If using Docker-Compose:
# import weaviate
#
# client = weaviate.Client(
#     "http://localhost:8080",
#     additional_headers=api_headers
# )

Let's confirm that we are ready to go.

In [8]:
client.is_ready()

True

## What's available on my instance?

In [9]:
jprint(client.get_meta())

{
  "hostname": "http://[::]:8080",
  "modules": {
    "generative-cohere": {
      "documentationHref": "https://docs.cohere.com/reference/generate",
      "name": "Generative Search - Cohere"
    },
    "generative-openai": {
      "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
      "name": "Generative Search - OpenAI"
    },
    "generative-palm": {
      "documentationHref": "https://cloud.google.com/vertex-ai/docs/generative-ai/chat/test-chat-prompts",
      "name": "Generative Search - Google PaLM"
    },
    "qna-openai": {
      "documentationHref": "https://beta.openai.com/docs/api-reference/completions",
      "name": "OpenAI Question & Answering Module"
    },
    "text2vec-cohere": {
      "documentationHref": "https://docs.cohere.ai/embedding-wiki/",
      "name": "Cohere Module"
    },
    "text2vec-huggingface": {
      "documentationHref": "https://huggingface.co/docs/api-inference/detailed_parameters#feature-extraction-task",
      "nam

### Version

Note the Weaviate version, which is reported under the `version` key of the metadata.

## Modules

What are these `modules`?

Modules extend Weaviate with optional capabilities, so that you can work with your data in a way that suits you. The most commonly used modules are:
- Vectorizers (e.g. `text2vec-cohere`, `text2vec-huggingface`, `text2vec-openai`, `text2vec-palm`, etc.)
- Generative modules (e.g. `generative-xxx`)

## Populating Weaviate

### Import process

The process for importing data into Weaviate is:

```
Add class definition to the schema
                ⬇
        (Batch) import data
```

**Discussion**: Why batch import?

### What data to import?

In [10]:
import weaviate_datasets as wd

dataset = wd.WineReviews()
dataset.get_sample()

{'WineReview': ({'review_body': 'Vegetal-leaning, stewy aromas of plum and jammy berry are saucy. This feels sticky and low in acidity. Grassy, herbal plum and berry flavors are a bit green on the finish.',
   'title': 'Finca Flichman 2015 Misterio Malbec (Mendoza)',
   'country': 'Argentina',
   'points': 83,
   'price': 10.0},
  None)}

In [11]:
class_name = "WineReview"

### Add class definition

In [12]:
class_objs = dataset.get_class_definitions()
class_objs[0].keys()

dict_keys(['class', 'vectorizer', 'moduleConfig', 'properties'])

### Required information

Notice that the class definition above has four keys: `class`, `vectorizer`, `moduleConfig` and `properties`.

- `class`: The class name (like a SQL table name)
- `vectorizer`: Module to be used to generate vectors
- `moduleConfig`: Configure various modules to be used with the class
- `properties`: Define object properties (like a SQL column)
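
To make the structure concrete, here is a small sketch that checks a definition for these four keys before it is sent to Weaviate. The helper `missing_keys` is hypothetical and only mirrors the keys used in this notebook; it is not Weaviate's own validation, which may accept definitions with fewer keys.

```python
# Keys carried by the class definitions in this notebook (an assumption of this sketch).
NOTEBOOK_KEYS = {"class", "vectorizer", "moduleConfig", "properties"}


def missing_keys(class_obj):
    """Return the notebook's expected keys that are absent from a class definition."""
    return sorted(NOTEBOOK_KEYS - set(class_obj))


example_class = {
    "class": "WineReview",  # like a SQL table name
    "vectorizer": "text2vec-openai",  # module used to generate vectors
    "moduleConfig": {"generative-openai": {"model": "gpt-3.5-turbo"}},
    "properties": [  # like SQL columns
        {"name": "title", "dataType": ["text"], "description": "Name of the wine"},
        {"name": "points", "dataType": ["int"], "description": "Review score in points"},
    ],
}

print(missing_keys(example_class))
print(missing_keys({"class": "Incomplete"}))
```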

In [13]:
class_objs

[{'class': 'WineReview',
  'vectorizer': 'text2vec-openai',
  'moduleConfig': {'generative-openai': {'model': 'gpt-3.5-turbo'}},
  'properties': [{'name': 'review_body',
    'dataType': ['text'],
    'description': 'Review body'},
   {'name': 'title', 'dataType': ['text'], 'description': 'Name of the wine'},
   {'name': 'country',
    'dataType': ['text'],
    'description': 'Originating country'},
   {'name': 'points',
    'dataType': ['int'],
    'description': 'Review score in points'},
   {'name': 'price', 'dataType': ['number'], 'description': 'Listed price'}]}]

### Option 1: Bring your own vector

![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_simple.png?raw=1)

In this use case, you import your data together with its associated vectors. Weaviate indexes the data so that you can search through it quickly and effectively.

In [14]:
for class_obj in class_objs:
    if not client.schema.contains(class_obj):
        print(f"Adding {class_obj['class']}")
        client.schema.create_class(class_obj)

In [15]:
from weaviate.util import generate_uuid5

loader = dataset._class_dataloader(class_name)

with client.batch() as batch:
    for data_objs in loader:
        batch.add_data_object(
            data_object=data_objs[0],
            class_name=class_name,
            vector=[1, 2, 3],  # Specify your own vector here
            uuid=generate_uuid5(data_objs[0]["title"]),  # Deterministic UUID derived from the object's title
        )
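
The idea behind a deterministic UUID can be sketched with the standard library alone. This is an analogy, not the implementation of `generate_uuid5`: the helper `title_uuid` is hypothetical, and its values are not expected to match the ones Weaviate's utility produces.

```python
import uuid


def title_uuid(title: str) -> str:
    """Name-based (version 5) UUID: the same title always maps to the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, title))


a = title_uuid("Finca Flichman 2015 Misterio Malbec (Mendoza)")
b = title_uuid("Finca Flichman 2015 Misterio Malbec (Mendoza)")
c = title_uuid("Alamos 2010 Chardonnay (Mendoza)")

print(a == b)  # same input, same ID
print(a == c)  # different input, different ID
```

Because the ID depends only on the title, importing the same title again targets the same ID instead of producing a fresh random one.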

In [16]:
res = (
    client.query.aggregate("WineReview")
    .with_meta_count()
    .do()
)

print(res)

{'data': {'Aggregate': {'WineReview': [{'meta': {'count': 50}}]}}}


In [17]:
client.schema.delete_class(class_name)

### Option 2: Use a vectorizer module

![img](https://github.com/weaviate-tutorials/intro-workshop/blob/main/images/object_import_process_full.png?raw=1)

In this use case, you task Weaviate with the additional work of converting your data into vectors. The `vectorizer` module performs this job, either with a local model or by calling an external vectorizer API service.

In [18]:
class_objs[0]["vectorizer"]

'text2vec-openai'

In [19]:
# # If you want to change the vectorizer:

# class_objs[0]["vectorizer"] = "text2vec-cohere"
# class_objs
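
The commented-out cell above edits the class definition in place. If you would rather keep the original untouched, a non-mutating variant might look like the sketch below. The helper `with_vectorizer` is hypothetical, and the dictionary is a shortened stand-in for the real definition.

```python
import copy


def with_vectorizer(class_obj, vectorizer):
    """Return a deep copy of a class definition with a different vectorizer."""
    new_obj = copy.deepcopy(class_obj)
    new_obj["vectorizer"] = vectorizer
    return new_obj


# Shortened stand-in for the WineReview definition shown earlier
original = {
    "class": "WineReview",
    "vectorizer": "text2vec-openai",
    "moduleConfig": {"generative-openai": {"model": "gpt-3.5-turbo"}},
    "properties": [{"name": "title", "dataType": ["text"]}],
}

cohere_variant = with_vectorizer(original, "text2vec-cohere")
print(original["vectorizer"], cohere_variant["vectorizer"])
```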

### Add class definition to the schema

In [20]:
for class_obj in class_objs:
    if not client.schema.contains(class_obj):
        print(f"Adding {class_obj['class']}")
        client.schema.create_class(class_obj)

Adding WineReview


### Batch import

Note: you should almost always use batch imports for speed.
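
One way to see where the speed-up comes from is to count round trips. The sketch below is plain Python and does not talk to Weaviate; the batch size of 20 and the helper `chunked` are arbitrary choices for illustration, and the client's batch context manager handles the grouping for you.

```python
from itertools import islice


def chunked(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


objects = [{"title": f"Wine {i}"} for i in range(50)]  # stand-in data

requests_one_by_one = len(objects)  # one request per object
requests_batched = sum(1 for _ in chunked(objects, 20))  # one request per batch

print(requests_one_by_one, requests_batched)
```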

In [21]:
from weaviate.util import generate_uuid5

loader = dataset._class_dataloader(class_name)

with client.batch() as batch:
    for data_objs in loader:
        batch.add_data_object(
            data_object=data_objs[0],
            class_name=class_name,
            # vector=[1, 2, 3],  # Uncomment to supply your own vector instead of using the vectorizer
            uuid=generate_uuid5(data_objs[0]["title"]),  # Deterministic UUID derived from the object's title
        )

Confirm import by getting an object count.

In [22]:
client.query.aggregate(class_name).with_meta_count().do()

{'data': {'Aggregate': {'WineReview': [{'meta': {'count': 50}}]}}}

Check that we have generated vectors.

In [23]:
res = client.query.get(
    class_name,
    ["title", "country", "review_body", "points"]
).with_additional("vector").with_limit(2).do()

In [24]:
jprint(res)

{
  "data": {
    "Get": {
      "WineReview": [
        {
          "_additional": {
            "vector": [
              0.021576267,
              -0.029585836,
              -0.010598982,
              -0.043856956,
              0.0022176332,
              0.02062399,
              -0.019619532,
              0.006405046,
              -0.02115883,
              -0.043856956,
              -0.006111536,
              0.023598226,
              -0.022306781,
              -0.016749652,
              0.009137954,
              0.03321232,
              0.031151224,
              0.0012205136,
              -0.0017969351,
              -0.014662469,
              0.0010338085,
              0.001378683,
              -0.02174585,
              -0.0054821195,
              -0.0014235248,
              0.0027051864,
              0.010970762,
              -0.024041753,
              -0.025620185,
              0.0058799894,
              0.01282966,
              -0.011655619,
      

## Try queries

### Similarity-based searches
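
Conceptually, a similarity search ranks objects by how close their vectors are to the query vector. The sketch below does this by brute force with cosine similarity on made-up three-dimensional vectors. It only illustrates the idea: the names and numbers are invented, real embeddings have far more dimensions, and Weaviate relies on a vector index rather than comparing against every object.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up toy vectors, one per object
vectors = {
    "fruity white": [0.9, 0.1, 0.0],
    "earthy red": [0.1, 0.9, 0.2],
    "sparkling rose": [0.6, 0.3, 0.4],
}
query = [1.0, 0.0, 0.1]  # made-up vector for the query text

# Most similar first
ranked = sorted(
    vectors,
    key=lambda name: cosine_similarity(query, vectors[name]),
    reverse=True,
)
print(ranked)
```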

#### Against a text input

In [25]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["a fruity white wine"]}
).with_limit(5).do()

In [26]:
for r in res["data"]["Get"][class_name]:
    print(r)

{'country': 'US', 'points': 85, 'review_body': 'Not at all a typical Sauvignon Blanc, this smells like apricot and honeysuckle and tastes like marmalade. It is dry, yet tastes like a late-harvest dessert wine. Expect a little taste adventure here.', 'title': 'Schmitz 24 Brix 2012 Sauvignon Blanc (Sierra Foothills)'}
{'country': 'Argentina', 'points': 87, 'review_body': "Fruity on the nose, with a friendly mix of pineapple, apple, melon and powdered sugar aromas. Feels smooth and round but not heavy, with good acidity and likable, simple flavors of melon, banana and apple. There's a light note of toast on the finish.", 'title': 'Alamos 2010 Chardonnay (Mendoza)'}
{'country': 'Austria', 'points': 85, 'review_body': "With notions of cherry and cinnamon on the nose and just slight fizz, this is a refreshing, fruit-driven sparkling rosé that's full of strawberry and cherry notes—it might just be the very definition of easy summer wine. It ends dry, yet refreshing.", 'title': 'Gebeshuber 201

#### Move "away" from certain types of objects

In [27]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_limit(5).do()

In [28]:
for r in res["data"]["Get"][class_name]:
    jprint(r)

{
  "country": "Austria",
  "points": 85,
  "review_body": "With notions of cherry and cinnamon on the nose and just slight fizz, this is a refreshing, fruit-driven sparkling ros\u00e9 that's full of strawberry and cherry notes\u2014it might just be the very definition of easy summer wine. It ends dry, yet refreshing.",
  "title": "Gebeshuber 2013 Frizzante Ros\u00e9 Pinot Noir (\u00d6sterreichischer Perlwein)"
}
{
  "country": "Portugal",
  "points": 88,
  "review_body": "This is a spicy earthy wine with its high percentage of Alicante Bouschet giving rich tannins and a dark dense texture. The wine is packed with dense fruit and power, perhaps erring on the side of too much, with a pepper edge to the concentrated black fruit. Drink now.",
  "title": "Herdade das Servas 2015 Sem Barrica Unoaked Red (Alentejano)"
}
{
  "country": "Austria",
  "points": 89,
  "review_body": "Beautifully perfumed, with acidity, white fruits and a mineral context. The wine is layered with citrus and lime, 

### Filter objects
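
A `where` filter is a plain dictionary, so it can be assembled with small helpers and combined before being passed to `with_where`. The helper names below are hypothetical. The sketch assumes the `And` operator with an `operands` list and the `valueText` key for `text` properties; check these against the filter documentation for your Weaviate version.

```python
import json


def where_greater_than(prop, number):
    """Filter: numeric property strictly greater than `number`."""
    return {"path": [prop], "operator": "GreaterThan", "valueNumber": number}


def where_equal_text(prop, text):
    """Filter: text property equal to `text`."""
    return {"path": [prop], "operator": "Equal", "valueText": text}


def where_and(*operands):
    """Combine several filters so that all of them must match."""
    return {"operator": "And", "operands": list(operands)}


combined = where_and(
    where_greater_than("price", 10),
    where_equal_text("country", "Austria"),
)
print(json.dumps(combined, indent=2))
```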

In [29]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"],
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_limit(5).do()

In [30]:
jprint(res)

{
  "data": {
    "Get": {
      "WineReview": [
        {
          "country": "Austria",
          "points": 85,
          "review_body": "With notions of cherry and cinnamon on the nose and just slight fizz, this is a refreshing, fruit-driven sparkling ros\u00e9 that's full of strawberry and cherry notes\u2014it might just be the very definition of easy summer wine. It ends dry, yet refreshing.",
          "title": "Gebeshuber 2013 Frizzante Ros\u00e9 Pinot Noir (\u00d6sterreichischer Perlwein)"
        },
        {
          "country": "Portugal",
          "points": 88,
          "review_body": "This is a spicy earthy wine with its high percentage of Alicante Bouschet giving rich tannins and a dark dense texture. The wine is packed with dense fruit and power, perhaps erring on the side of too much, with a pepper edge to the concentrated black fruit. Drink now.",
          "title": "Herdade das Servas 2015 Sem Barrica Unoaked Red (Alentejano)"
        },
        {
          "countr

#### Generative search

In [31]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["earthy European wine"],
     "moveAwayFrom": {"concepts": ["white wine"], "force": 2.0}
     }
).with_where(
    {"path": ["price"],
     "operator": "GreaterThan",
     "valueNumber": 10}
).with_generate(
    grouped_task="Are there any commonalities between these?",
).with_limit(3).do()

In [32]:
jprint(res)

{
  "data": {
    "Get": {
      "WineReview": [
        {
          "_additional": {
            "generate": {
              "error": null,
              "groupedResult": "Yes, all three reviews are about wine and include information about the country of origin, the name of the wine, and a description of its taste and characteristics."
            }
          },
          "country": "Austria",
          "points": 85,
          "review_body": "With notions of cherry and cinnamon on the nose and just slight fizz, this is a refreshing, fruit-driven sparkling ros\u00e9 that's full of strawberry and cherry notes\u2014it might just be the very definition of easy summer wine. It ends dry, yet refreshing.",
          "title": "Gebeshuber 2013 Frizzante Ros\u00e9 Pinot Noir (\u00d6sterreichischer Perlwein)"
        },
        {
          "_additional": {
            "generate": null
          },
          "country": "Portugal",
          "points": 88,
          "review_body": "This is a spicy 

## What if we try a different model?

In [33]:
client.schema.delete_class("WineReview")

# If you want to change the vectorizer:

class_objs[0]["vectorizer"] = "text2vec-cohere"

In [34]:
from weaviate.util import generate_uuid5

for class_obj in class_objs:
    if not client.schema.contains(class_obj):
        print(f"Adding {class_obj['class']}")
        client.schema.create_class(class_obj)

loader = dataset._class_dataloader(class_name)

with client.batch() as batch:
    for data_objs in loader:
        batch.add_data_object(
            data_object=data_objs[0],
            class_name=class_name,
            uuid=generate_uuid5(data_objs[0]["title"]),  # Deterministic UUID derived from the object's title
        )

Adding WineReview


In [35]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["Vino argentino fácil de beber"]}
).with_limit(5).do()

In [36]:
res

{'data': {'Get': {'WineReview': [{'country': 'Argentina',
     'points': 83,
     'review_body': 'Vegetal-leaning, stewy aromas of plum and jammy berry are saucy. This feels sticky and low in acidity. Grassy, herbal plum and berry flavors are a bit green on the finish.',
     'title': 'Finca Flichman 2015 Misterio Malbec (Mendoza)'},
    {'country': 'US',
     'points': 86,
     'review_body': 'In Spain, Albariño is usually a steely, bone dry, sea-inflected white wine. Here it is made in an off-dry style, with 10 g/L residual sugar. The sweetness masks any apparent varietal specificity, resulting in a pleasant, but generic white wine.',
     'title': 'Castillo De Feliciana 2012 Albariño (Columbia Valley (WA))'},
    {'country': 'Argentina',
     'points': 91,
     'review_body': 'Generous, sizable aromas of cassis, blackberry and violet are inviting. This is juicy, with slightly wiry acidity creating a racy mouthfeel. Flavors of berry and cassis are loamy on their own but ride comforta

In [37]:
res = client.query.get(
    class_name, ["title", "country", "review_body", "points"]
).with_near_text(
    {"concepts": ["Vino argentino fácil de beber"]}
).with_generate(
    single_prompt="Translate this to French {review_body}"
).with_limit(5).do()
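
With `single_prompt`, the `{review_body}` placeholder is filled in separately for each returned object, so every object gets its own `singleResult`. With `grouped_task`, as used earlier, the results are handled together and a single `groupedResult` comes back. The per-object templating can be pictured with the sketch below; `fill_prompt` is a hypothetical helper and the review texts are invented, so this illustrates the idea rather than Weaviate's implementation.

```python
import re


def fill_prompt(template, obj):
    """Substitute each {property} placeholder with that property's value."""
    return re.sub(r"\{(\w+)\}", lambda m: str(obj[m.group(1)]), template)


# Invented stand-ins for search results
objs = [
    {"review_body": "Dry and crisp."},
    {"review_body": "Rich tannins."},
]

prompts = [fill_prompt("Translate this to French {review_body}", o) for o in objs]
for p in prompts:
    print(p)
```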

In [38]:
res

{'data': {'Get': {'WineReview': [{'_additional': {'generate': {'error': None,
       'singleResult': 'Les arômes végétaux et de ragoût de prune et de baies confiturées sont saucés. Cela donne une sensation collante et faible en acidité. Les saveurs herbacées et de prunes et de baies sont un peu vertes en fin de bouche.'}},
     'country': 'Argentina',
     'points': 83,
     'review_body': 'Vegetal-leaning, stewy aromas of plum and jammy berry are saucy. This feels sticky and low in acidity. Grassy, herbal plum and berry flavors are a bit green on the finish.',
     'title': 'Finca Flichman 2015 Misterio Malbec (Mendoza)'},
    {'_additional': {'generate': {'error': None,
       'singleResult': "En Espagne, l'Albariño est généralement un vin blanc acier, sec et infusé de saveurs marines. Ici, il est produit dans un style demi-sec, avec 10 g/L de sucre résiduel. La douceur masque toute spécificité variétale apparente, résultant en un vin blanc agréable mais générique."}},
     'country'

## Add data - with cross-references

The syntax looks like this:

```python
sf_id = "00ff6900-e64f-5d94-90db-c8cfa3fc851b"
us_cities_id = "20ffc68d-986b-5e71-a680-228dba18d7ef"

client.data_object.reference.add(
    from_class_name="JeopardyQuestion",
    from_uuid=sf_id,
    from_property_name="hasCategory",
    to_class_name="JeopardyCategory",
    to_uuid=us_cities_id,
)
```

Cross-references allow you to link data objects. You can:

- Retrieve cross-referenced properties
- Filter data with cross-references

However, they do *not* affect object vectors.
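
For a reference such as `hasCategory` to be added, the source class needs a property whose `dataType` names the target class. A minimal sketch of such a definition follows. The `question` property and the helper `cross_reference_properties` are hypothetical, while the class and property names come from the snippet above.

```python
question_class = {
    "class": "JeopardyQuestion",
    "properties": [
        {"name": "question", "dataType": ["text"]},  # hypothetical regular property
        # Cross-reference property: its dataType is the name of the target class
        {"name": "hasCategory", "dataType": ["JeopardyCategory"]},
    ],
}


def cross_reference_properties(class_obj, known_classes):
    """Return the names of properties whose dataType points at another class."""
    return [
        prop["name"]
        for prop in class_obj["properties"]
        if any(dt in known_classes for dt in prop["dataType"])
    ]


print(cross_reference_properties(question_class, {"JeopardyCategory"}))
```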

### For more examples: See our "how-to" search pages

- [How to: Search](https://weaviate.io/developers/weaviate/search)
- [How to: Manage data](https://weaviate.io/developers/weaviate/manage-data)

### If you want to clean up the data

**This will delete all of your data in the *WineReview* class!**

In [39]:
client.schema.delete_class("WineReview")