# Initial Setup

## Install Weaviate Python Client v4
> The v4 client is currently in beta, but it is too awesome not to use ;)
>
> It should be fully released in December.

Run the command below to install the latest pre-release of the Weaviate Python Client v4.

In [None]:
!pip install --pre -I "weaviate-client==4.*"

## Deploy Weaviate

Weaviate offers 3 deployment options:
* Embedded
* Self-hosted - with Docker Compose
* Cloud deployment - [Weaviate Cloud Service](https://console.weaviate.cloud/)

# Time to Build

## Connect to Weaviate

* If you are new to OpenAI, register at [https://platform.openai.com](https://platform.openai.com/) and head to [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys) to create your API key.
* If you are new to Cohere, register at [https://cohere.com](https://cohere.com) and head to [https://dashboard.cohere.com/api-keys](https://dashboard.cohere.com/api-keys) to create your API key.

In [None]:
import os
import weaviate

# Connect with Weaviate Embedded
client = weaviate.connect_to_embedded(
    headers={
        "X-OpenAI-Api-Key": os.environ['OPENAI_API_KEY'], # Replace with your inference API key
        "X-Cohere-Api-Key": os.environ['COHERE_API_KEY'], # Replace with your inference API key
    })

# Connect to the local instance deployed with Docker Compose
# client = weaviate.connect_to_local(
#     headers={
#         "X-OpenAI-Api-Key": os.environ['OPENAI_API_KEY'], # Replace with your inference API key
#         "X-Cohere-Api-Key": os.environ['COHERE_API_KEY'], # Replace with your inference API key
#     }
# )

client.is_ready()
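
Both connection calls above read the keys with `os.environ[...]`, which raises a bare `KeyError` when a variable is unset. A minimal sketch of a friendlier check follows; `build_inference_headers` and `HEADER_ENV_VARS` are hypothetical names for illustration, not part of the Weaviate client.

```python
import os

# Header name -> environment variable, as used in the connection cell above
HEADER_ENV_VARS = {
    "X-OpenAI-Api-Key": "OPENAI_API_KEY",
    "X-Cohere-Api-Key": "COHERE_API_KEY",
}

def build_inference_headers(env=None):
    """Hypothetical helper: collect API-key headers, naming any missing variables."""
    env = os.environ if env is None else env
    missing = [var for var in HEADER_ENV_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {header: env[var] for header, var in HEADER_ENV_VARS.items()}
```

The returned dictionary could then be passed as `headers=build_inference_headers()` to either `weaviate.connect_to_embedded` or `weaviate.connect_to_local`.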

## Sample Data

In [2]:
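import json
import requests

def load_data(path):
    # Download a JSON file and parse it; stop early on HTTP errors (e.g. 404)
    resp = requests.get(path)
    resp.raise_for_status()
    return json.loads(resp.text)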

sample_10 = "https://raw.githubusercontent.com/weaviate-tutorials/multimodal-workshop/main/1-intro/jeopardy_tiny.json"
sample_1k = "https://raw.githubusercontent.com/weaviate-tutorials/multimodal-workshop/main/1-intro/jeopardy_1k.json"

data_10 = load_data(sample_10)
data_1k = load_data(sample_1k)

print(json.dumps(data_10, indent=2))

[
  {
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
    "Answer": "Liver"
  },
  {
    "Category": "ANIMALS",
    "Question": "It's the only living mammal in the order Proboseidea",
    "Answer": "Elephant"
  },
  {
    "Category": "ANIMALS",
    "Question": "The gavial looks very much like a crocodile except for this bodily feature",
    "Answer": "the nose or snout"
  },
  {
    "Category": "ANIMALS",
    "Question": "Weighing around a ton, the eland is the largest species of this animal in Africa",
    "Answer": "Antelope"
  },
  {
    "Category": "ANIMALS",
    "Question": "Heaviest of all poisonous snakes is this North American rattlesnake",
    "Answer": "the diamondback rattler"
  },
  {
    "Category": "SCIENCE",
    "Question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
    "Answer": "species"
  },
  {
    "Category": "SCIENCE",
   

## Create a collection
[Weaviate Docs - collection creation and configuration](https://weaviate.io/developers/weaviate/configuration/schema-configuration)

In [4]:
import weaviate.classes as wvc

if client.collections.exists("Questions"):
    client.collections.delete("Questions")

# Create a collection here - with Cohere as a vectorizer
client.collections.create(
    name="Questions",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_cohere()
)

{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"questions_nEOyMR5Q1E58","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-11-14T10:14:34-08:00","took":43042}


<weaviate.collections.collection.Collection at 0x10750c390>

## Import data
[Weaviate Docs - insert many](https://weaviate.io/developers/weaviate/manage-data/import)

In [5]:
# Insert data
questions = client.collections.get("Questions")
questions.data.insert_many(data_10)

BatchObjectReturn(all_responses=[UUID('6b32ae20-9933-4858-8075-c75bac4cd459'), UUID('d4f2e2ea-3998-4fb5-bb6d-51a178b880b0'), UUID('3b744baf-8258-474d-81ee-77c5ba51f8d6'), UUID('5318f1b7-bd69-44b5-8cb2-f6435ed35847'), UUID('58b916e4-8864-4da6-b19e-c7f922ba6306'), UUID('06c0316b-88ae-4416-9d96-77925999f3b8'), UUID('ded27397-8fd4-4db0-9be3-7b0bbb7e6655'), UUID('7e7fdffd-021c-498c-82e0-7c426d7bb85f'), UUID('5d8b0608-5a02-4b9b-9c4b-f6ae6300a2a9'), UUID('1304098d-9c58-48f0-8fdb-87c8f395d47a')], uuids={0: UUID('6b32ae20-9933-4858-8075-c75bac4cd459'), 1: UUID('d4f2e2ea-3998-4fb5-bb6d-51a178b880b0'), 2: UUID('3b744baf-8258-474d-81ee-77c5ba51f8d6'), 3: UUID('5318f1b7-bd69-44b5-8cb2-f6435ed35847'), 4: UUID('58b916e4-8864-4da6-b19e-c7f922ba6306'), 5: UUID('06c0316b-88ae-4416-9d96-77925999f3b8'), 6: UUID('ded27397-8fd4-4db0-9be3-7b0bbb7e6655'), 7: UUID('7e7fdffd-021c-498c-82e0-7c426d7bb85f'), 8: UUID('5d8b0608-5a02-4b9b-9c4b-f6ae6300a2a9'), 9: UUID('1304098d-9c58-48f0-8fdb-87c8f395d47a')}, errors={
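
The return value printed above carries `uuids` and `errors` fields, so it is worth checking that nothing failed before moving on. The sketch below assumes both fields support `len()` (they are printed as dictionaries, though the output is truncated); `summarize_insert` is a hypothetical helper, and the stand-in object only mimics the attribute names.

```python
from types import SimpleNamespace

def summarize_insert(result):
    """Hypothetical helper: count inserted objects and failures in an insert_many result."""
    inserted = len(result.uuids)
    failed = len(result.errors)
    return f"inserted={inserted}, failed={failed}"

# Stand-in object with the same attribute names as the output above
fake_result = SimpleNamespace(uuids={0: "uuid-a", 1: "uuid-b"}, errors={})
print(summarize_insert(fake_result))  # inserted=2, failed=0
```

In the notebook you would pass the value returned by `questions.data.insert_many(...)` instead of `fake_result`.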

In [7]:
# Show data preview
response = questions.query.fetch_objects(limit=4)

print(response.objects[0].properties)

{'answer': 'species', 'question': "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification", 'category': 'SCIENCE'}
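
Note that the stored property names start with a lowercase letter (`answer`, not `Answer` as in the source JSON). If you would rather make that renaming explicit before importing, one possible sketch is shown here; `lowercase_first` and `normalize_keys` are hypothetical helpers, not Weaviate functions.

```python
def lowercase_first(name):
    """Lowercase only the first character of a property name."""
    return name[:1].lower() + name[1:]

def normalize_keys(records):
    """Hypothetical helper: return copies of the records with normalized keys."""
    return [{lowercase_first(k): v for k, v in rec.items()} for rec in records]

# One record in the same shape as the sample data
sample = [{
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen",
    "Answer": "Liver",
}]
print(normalize_keys(sample))
```

With keys normalized up front, the names you import are the same names you later read from `properties`.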


In [8]:
# Show vector
item = questions.query.fetch_object_by_id('6b32ae20-9933-4858-8075-c75bac4cd459', include_vector=True)
print(item.metadata.vector)

[0.011077881, -0.026184082, -0.016159058, 0.1138916, -0.07171631, -0.019760132, 0.025634766, 0.038208008, -0.0692749, 0.023269653, 0.07489014, 0.01838684, -0.008293152, -0.03866577, 0.03793335, -0.046081543, 0.06365967, 0.034301758, 0.074035645, -0.013977051, -0.02998352, -0.017669678, 0.059539795, -0.07092285, -0.04650879, 0.058532715, 0.009033203, -0.04156494, 0.03250122, -0.015472412, 0.013679504, -0.009880066, 0.038635254, 0.016021729, -0.0129852295, -0.03314209, 0.00970459, -0.012214661, 0.005596161, 0.0019283295, -0.008018494, -0.015701294, -0.060821533, 0.03744507, 0.009796143, 0.032562256, 0.0015220642, 0.013069153, 0.040008545, 0.019302368, -0.038269043, 0.03591919, 0.006965637, -0.038146973, 0.008224487, 0.002500534, 0.0231781, 0.07019043, 0.008773804, 0.029663086, 0.015281677, 0.001124382, -0.035827637, 0.036956787, -0.016143799, 0.004055023, -0.0040130615, 0.047668457, -0.0018568039, 0.011482239, 0.035186768, 0.05126953, 0.03564453, -0.021484375, -0.060455322, -0.036743164,
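
The list above is the embedding the vectorizer produced for this object. Vector search ranks objects by how close such vectors are to each other. As an illustration only (Weaviate computes distances internally, and cosine is just one common metric), here is a plain-Python cosine similarity on toy vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors, not real embeddings
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # same direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal
```

Vectors pointing the same way score close to 1, while unrelated (orthogonal) vectors score close to 0.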

## Create a collection with OpenAI and Generative module

In [9]:
# new collection with 1k objects and OpenAI vectorizer and generative model

import weaviate.classes as wvc

if client.collections.exists("Questions"):
    client.collections.delete("Questions")

# Create a collection here - with OpenAI as the vectorizer and GPT-4 as the generative model
client.collections.create(
    name="Questions",
    vectorizer_config=wvc.Configure.Vectorizer.text2vec_openai(),
    generative_config=wvc.Configure.Generative.openai(model="gpt-4")
)

{"action":"hnsw_vector_cache_prefill","count":1000,"index_id":"questions_pjFWiDOL4arq","level":"info","limit":1000000000000,"msg":"prefilled vector cache","time":"2023-11-14T10:18:41-08:00","took":41459}


<weaviate.collections.collection.Collection at 0x10a3cbcd0>

In [10]:
# Insert data
questions = client.collections.get("Questions")
questions.data.insert_many(data_1k)

BatchObjectReturn(all_responses=[UUID('ad99e6c7-4224-4db4-9dfe-649ac0a2f6ec'), UUID('e57b9bfe-5935-4786-b25d-96acde0feba2'), UUID('6916ccec-593a-4bf5-9c87-4a13447ba017'), UUID('4e4ae24e-51d8-47f8-ac88-87af02c68804'), UUID('c3897557-3aef-4672-ac93-1009a0ca610e'), UUID('80817244-8a4b-4461-b11a-990ed44aa885'), UUID('fe1512b1-01b9-49b4-a589-4a94a2b69e16'), UUID('d40942c8-382a-4b2b-858c-f81c67845ba1'), UUID('1c301c2e-1216-41a9-ab2f-bf4e0d7e202a'), UUID('c709c069-5fed-4500-888a-01d2cefc176a'), UUID('51cc7f33-3189-4e65-a5dd-cfdf97571e6c'), UUID('da50fe04-a4c9-4993-9583-2f798cfb7212'), UUID('34b47928-9f17-4ef9-94aa-88cfe33b4d09'), UUID('a399d61e-cf0e-4204-9412-57e52ffcfca6'), UUID('5e9a254f-af04-448a-b044-ac9834a46953'), UUID('933d2210-9dae-4c1f-8d57-5720749b9648'), UUID('8928c1a6-8e23-4301-84d0-fc109cb61209'), UUID('c0f501ad-bf6e-4325-8f93-aaab6293ff6e'), UUID('e8acf704-55b6-4beb-8945-a9b11c023cc1'), UUID('e45f2ac1-9dc6-4b83-b837-b5c733f9038f'), UUID('466e37b6-3689-457c-ac3a-efa446106a12'), U

In [11]:
res = questions.aggregate.over_all()
print(res)

_AggregateReturn(properties={}, total_count=1000)
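
The count above should equal the number of records that were loaded. A small sketch of turning that into an explicit check follows; `assert_all_imported` is a hypothetical helper, and the stand-in object only mimics the `total_count` attribute seen in the output.

```python
from types import SimpleNamespace

def assert_all_imported(records, aggregate_result):
    """Hypothetical check: raise if the collection count differs from the input size."""
    expected = len(records)
    actual = aggregate_result.total_count
    if actual != expected:
        raise ValueError(f"expected {expected} objects, found {actual}")
    return actual

# Stand-in for the aggregate result; in the notebook, pass data_1k and res
demo = SimpleNamespace(total_count=3)
print(assert_all_imported(["a", "b", "c"], demo))  # 3
```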
