## Populate Weaviate instance

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/intro-workshop/blob/main/2_build_new_db.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Install libraries as needed (uncomment for Colab)

In [None]:
# !pip install -Uqq weaviate-client weaviate-demo-datasets

Instantiate Weaviate client

In [None]:
import weaviate
import os

auth = weaviate.AuthApiKey("<WEAVIATE-API-KEY>")  # The Weaviate API key for your particular instance
client = weaviate.Client(
    url="<WEAVIATE-URL>",  # Your Weaviate instance URL
    auth_client_secret=auth,
    additional_headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]  # Your OpenAI API key
    }
)

Let's confirm access by fetching the schema.

In [None]:
client.schema.get()

## Populate Weaviate

Let's add some data to Weaviate. The process for importing data into Weaviate is as follows:

```
     Create class definition
                ⬇
Add class definition to the schema
                ⬇
            Load data
                ⬇
        (Batch) import data
```

Here we'll use the unofficial library `weaviate-demo-datasets` for illustrative purposes.

In [None]:
import weaviate_datasets as wd
import json

In [None]:
dataset = wd.WikiCities()

### Create class definition

In [None]:
class_objs = dataset.get_class_definitions()

In [None]:
class_objs
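
For orientation, a class definition is a plain dictionary. The sketch below shows roughly what one for `WikiCity` could look like. The property names are taken from the import code later in this notebook; the data types, the `text2vec-openai` vectorizer, and the `generative-openai` module entry are assumptions, so the actual output of `get_class_definitions()` may differ.

```python
# Hypothetical class definition for illustration only;
# the real one comes from dataset.get_class_definitions().
example_class_obj = {
    "class": "WikiCity",
    "vectorizer": "text2vec-openai",  # assumed, given the OpenAI key header
    "moduleConfig": {"generative-openai": {}},  # assumed, for generative queries
    "properties": [
        {"name": "city_name", "dataType": ["text"]},
        {"name": "country", "dataType": ["text"]},
        {"name": "population", "dataType": ["int"]},
        {"name": "wiki_summary", "dataType": ["text"]},
    ],
}

# List the property names defined for the class
property_names = [p["name"] for p in example_class_obj["properties"]]
print(property_names)
```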

### Add class definition to the schema

In [None]:
for class_obj in class_objs:
  client.schema.create_class(class_obj)

In [None]:
client.schema.get()

### Load data

Load sample data from `weaviate_datasets`

In [None]:
dataset.data_fpath

In [None]:
import pandas as pd
df = pd.read_csv(dataset.data_fpath)
df.head()

### Batch import

Note: you should almost always use batch imports for speed.

In [None]:
from weaviate.util import generate_uuid5

with client.batch() as batch:
  batch.batch_size = 200
  for i, row in df.iterrows():
    if i == 5:  # Import only the first 5 rows as a quick test
      break
    data_obj = {
        "city_name": row["city"],
        "country": row["country"],
        "population": int(row["population"]),
        "wiki_summary": row["wiki_summary"]
    }
    batch.add_data_object(
        data_object=data_obj, 
        class_name="WikiCity",
        vector=None,  # You can specify your own vector here
        uuid=generate_uuid5(data_obj),
        )
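
Deriving the UUID from the object's content means the same object should map to the same ID every time, so re-running an import is expected to overwrite objects rather than duplicate them. The sketch below illustrates that idea with the standard library only. `deterministic_uuid` is a hypothetical helper, not the implementation of `generate_uuid5`, and the sample object is made up.

```python
import json
import uuid

def deterministic_uuid(data_obj: dict) -> str:
    """Hypothetical helper: derive a stable UUID from an object's content."""
    # Serialize with sorted keys so that key order does not affect the ID
    payload = json.dumps(data_obj, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, payload))

obj = {"city_name": "Example City", "country": "Exampleland", "population": 1000}

first_id = deterministic_uuid(obj)
second_id = deterministic_uuid(dict(obj))  # same content, new dict
changed_id = deterministic_uuid({**obj, "population": 1001})

print(first_id == second_id)   # identical content gives an identical ID
print(first_id == changed_id)  # changed content gives a different ID
```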

Confirm import by getting an object count.

In [None]:
client.query.aggregate("WikiCity").with_meta_count().do()

Check that we have generated vectors.

In [None]:
res = client.query.get(
    "WikiCity", 
    ["city_name", "country"]
).with_additional("vector").with_limit(2).do()

In [None]:
res

### (But...) Here's one I prepared earlier

Let's import the whole dataset with a predefined function.

In [None]:
client.schema.delete_class("WikiCity")

In [None]:
dataset.upload_dataset(client)  # Includes pre-vectorized data

### Try queries

In [None]:
res = client.query.get(
    "WikiCity", 
    ["city_name", "country", "population"]
).with_limit(2).do()

In [None]:
import json
print(json.dumps(res, indent=2))

In [None]:
res = client.query.get(
    "WikiCity", ["city_name", "country", "population"]
).with_near_text(
    {"concepts": ["large international city"]}
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"]["WikiCity"]:
  print(r)
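
The results sit under `data` → `Get` → class name in the response dictionary. Indexing straight into it raises a `KeyError` if the query failed, so a small wrapper can help. This is a sketch: `extract_objects` is a hypothetical helper, the mock response is made up, and it assumes failed queries come back with a top-level `errors` key.

```python
def extract_objects(response: dict, class_name: str) -> list:
    """Hypothetical helper: return result objects or raise on a failed query."""
    # Assumption: a failed GraphQL query carries a top-level "errors" key
    if "errors" in response:
        raise RuntimeError(f"Query failed: {response['errors']}")
    return response.get("data", {}).get("Get", {}).get(class_name) or []

# Made-up response with the same nesting as the query results above
mock_response = {
    "data": {
        "Get": {
            "WikiCity": [
                {"city_name": "Example City", "country": "Exampleland", "population": 1000}
            ]
        }
    }
}

cities = extract_objects(mock_response, "WikiCity")
for city in cities:
    print(city["city_name"], city["population"])
```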

In [None]:
res = client.query.get(
    "WikiCity", ["city_name", "country", "population"]
).with_near_text(
    {"concepts": ["large international city"],
     "moveAwayFrom": {"concepts": ["Eastern asia"], "force": 2.0}
     }
).with_limit(5).do()

In [None]:
for r in res["data"]["Get"]["WikiCity"]:
  print(r)

In [None]:
res = client.query.get(
    "WikiCity", ["city_name", "country", "population"]
).with_near_text(
    {"concepts": ["large international city"],
     "moveAwayFrom": {"concepts": ["Eastern asia"], "force": 2.0}
     }
).with_where(
    {"path": ["population"], 
     "operator": "GreaterThan",
     "valueInt": 20000000}
).with_limit(5).do()

In [None]:
res
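
Where filters are plain dictionaries, so they can be built programmatically and combined. The sketch below assumes the `And` operator with an `operands` list to express a population range; `population_between` is a hypothetical helper and the bounds are arbitrary.

```python
def population_between(low: int, high: int) -> dict:
    """Hypothetical helper: where filter for low < population < high."""
    return {
        "operator": "And",
        "operands": [
            {"path": ["population"], "operator": "GreaterThan", "valueInt": low},
            {"path": ["population"], "operator": "LessThan", "valueInt": high},
        ],
    }

where_filter = population_between(10_000_000, 20_000_000)
print(where_filter)
```

The resulting dictionary can be passed to a query with `.with_where(where_filter)`, in the same position as the single-condition filter above.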

In [None]:
res = client.query.get(
    "WikiCity", ["city_name", "country", "population", "wiki_summary"]
).with_near_text(
    {"concepts": ["large international city"],
     "moveAwayFrom": {"concepts": ["Eastern asia"], "force": 2.0}
     }
).with_where(
    {"path": ["population"], 
     "operator": "GreaterThan",
     "valueInt": 20000000}
).with_generate(
    grouped_task="Tell me why I should visit these cities, based on these passages."
).with_limit(3).do()

In [None]:
print(res["data"]["Get"]["WikiCity"][0]["_additional"]["generate"]["groupedResult"])

In [None]:
for r in res["data"]["Get"]["WikiCity"]:
  print(r)