# Initial Setup

## Install Weaviate Python Client v4
> This notebook was created with Weaviate `1.25` and the Weaviate Client `4.6`

In [13]:
!pip install weaviate-client==4.6.1 --force-reinstall -U -q

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
dash 2.16.1 requires dash-core-components==2.0.0, which is not installed.
dash 2.16.1 requires dash-html-components==2.0.0, which is not installed.
dash 2.16.1 requires dash-table==5.0.0, which is not installed.
jupyter-ai 2.12.0 requires faiss-cpu, which is not installed.
botocore 1.34.51 requires urllib3<2.1,>=1.25.4; python_version >= "3.10", but you have urllib3 2.2.1 which is incompatible.
fastapi 0.103.2 requires anyio<4.0.0,>=3.7.1, but you have anyio 4.3.0 which is incompatible.
gluonts 0.13.7 requires pydantic~=1.7, but you have pydantic 2.7.1 which is incompatible.
sagemaker 2.214.3 requires protobuf<5.0,>=3.12, but you have protobuf 5.26.1 which is incompatible.
sagemaker-jupyterlab-extension-common 0.1.15 requires pydantic==1.*, but you have pydantic 2.7.1 which is incompatible.
sparkmagic 0.21.0 

## Prepare API keys

Set up Amazon Bedrock access in your AWS account, then add your `AWS_ACCESS_KEY` and `AWS_SECRET_KEY` to the [.env](../.env) file. The cells below read the credentials from the active `boto3` session.
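
As a minimal sketch of reading the `.env` file, the helper below parses simple `KEY=VALUE` lines using only the standard library. The name `load_env_file` is hypothetical, and the parser assumes one assignment per line with optional surrounding quotes; a dedicated library would cover more cases.

```python
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env-style file into a dict.

    Hypothetical helper: blank lines, comment lines starting with '#', and
    lines without '=' are skipped. Matching quotes around a value are removed.
    """
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


# Example usage (assumes the file sits at ../.env, as linked above):
# env = load_env_file("../.env")
# AWS_ACCESS_KEY = env["AWS_ACCESS_KEY"]
# AWS_SECRET_KEY = env["AWS_SECRET_KEY"]
```

Keeping the keys in a file that is excluded from version control avoids hard-coding them in the notebook.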

### Set the Weaviate IP address

In [14]:
WEAVIATE_IP = "35.89.240.145"
%store WEAVIATE_IP

Stored 'WEAVIATE_IP' (str)


In [15]:
from boto3 import Session

session = Session()
credentials = session.get_credentials()
current_credentials = credentials.get_frozen_credentials()

AWS_ACCESS_KEY = current_credentials.access_key
AWS_SECRET_KEY = current_credentials.secret_key
AWS_SECRET_TOKEN = current_credentials.token
# Report only whether each credential was found; printing the secret values
# would leave them in the saved notebook output.
print(f"AWS_ACCESS_KEY:\t{'set' if AWS_ACCESS_KEY else 'missing'}")
print(f"AWS_SECRET_KEY:\t{'set' if AWS_SECRET_KEY else 'missing'}")
print(f"AWS_SECRET_TOKEN:\t{'set' if AWS_SECRET_TOKEN else 'missing'}")
print(f"WEAVIATE_IP:\t{WEAVIATE_IP}")

AWS_ACCESS_KEY:	set
AWS_SECRET_KEY:	set
AWS_SECRET_TOKEN:	set
WEAVIATE_IP:	35.89.240.145

## Deploy Weaviate

Weaviate offers three deployment options:
* Self-hosted, with Docker Compose or Kubernetes
* Cloud deployment, with [Weaviate Cloud](https://console.weaviate.cloud/)
* Embedded
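
Whichever option is used, it can help to confirm that the instance is reachable before connecting. The sketch below polls Weaviate's REST readiness endpoint (`/v1/.well-known/ready`) with the standard library. The function name `is_weaviate_ready` is hypothetical, and the default port `8080` and plain HTTP are assumptions that match the connection settings used later in this notebook.

```python
import http.client
import urllib.request


def is_weaviate_ready(host, port=8080, timeout=3.0):
    """Return True if the readiness endpoint answers with HTTP 200.

    Hypothetical helper: any connection failure, timeout, or non-200
    response is reported as "not ready" rather than raised.
    """
    url = f"http://{host}:{port}/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return False


# Example usage:
# if not is_weaviate_ready(WEAVIATE_IP):
#     raise RuntimeError("Weaviate is not reachable yet")
```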

# Time to Build

## Connect to Weaviate

In [16]:
import weaviate

client = weaviate.connect_to_custom(
    # Ports are passed as integers, matching the types the client expects
    http_host=WEAVIATE_IP, http_port=8080, http_secure=False,
    grpc_host=WEAVIATE_IP, grpc_port=50051, grpc_secure=False,
    # AWS credentials are forwarded as headers for the AWS vectorizer
    headers={
        "X-AWS-Access-Key": AWS_ACCESS_KEY,
        "X-AWS-Secret-Key": AWS_SECRET_KEY,
        "X-AWS-Session-Token": AWS_SECRET_TOKEN,
    }
)
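
The session token returned by `boto3` can be `None` when long-term access keys are in use rather than temporary credentials. As a hedged sketch, the hypothetical helper `build_aws_headers` below assembles the same header dict while leaving out the session-token header in that case. Whether the server accepts requests without that header is an assumption to verify against your setup.

```python
def build_aws_headers(access_key, secret_key, session_token=None):
    """Build the AWS header dict, omitting the session token when absent.

    Hypothetical helper: the header names are the ones used in the
    connection cell above.
    """
    if not access_key or not secret_key:
        raise ValueError("AWS access key and secret key are both required")
    headers = {
        "X-AWS-Access-Key": access_key,
        "X-AWS-Secret-Key": secret_key,
    }
    if session_token:
        headers["X-AWS-Session-Token"] = session_token
    return headers


# Example usage:
# headers = build_aws_headers(AWS_ACCESS_KEY, AWS_SECRET_KEY, AWS_SECRET_TOKEN)
```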

## Create a collection
[Weaviate Docs - collection creation and configuration](https://weaviate.io/developers/weaviate/configuration/schema-configuration)

In [18]:
from weaviate.classes.config import Configure

if client.collections.exists("Jeopardy"):
    client.collections.delete("Jeopardy")

# Create the collection, vectorized with the AWS module
# and the Titan model "amazon.titan-embed-text-v1"
client.collections.create(
    name="Jeopardy",
    vectorizer_config=Configure.Vectorizer.text2vec_aws(
        model="amazon.titan-embed-text-v1",
        region="us-west-2",
    ),
)

<weaviate.collections.collection.Collection at 0x7f74a48467a0>

## Import data

### Sample Data

In [19]:
import json

with open("./jeopardy-100.json") as file:
    data_100 = json.load(file)

print(json.dumps(data_100[0], indent=2))

{
  "category": "TRANSPORTATION",
  "question": "The railroad known by this hyphenated name runs over 5,000 miles from Moscow to Vladivostok",
  "answer": "Trans-Siberian Railroad"
}
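
Before importing, a quick sanity check on the records can catch malformed entries early. This is a minimal sketch that assumes every record has the same three string fields as the sample printed above; `REQUIRED_KEYS` and `find_invalid_records` are hypothetical names.

```python
REQUIRED_KEYS = ("category", "question", "answer")


def find_invalid_records(records, required_keys=REQUIRED_KEYS):
    """Return (index, missing_keys) pairs for records that fail the check.

    A key counts as missing when it is absent, not a string, or blank.
    """
    problems = []
    for index, record in enumerate(records):
        missing = [
            key for key in required_keys
            if not isinstance(record.get(key), str) or not record[key].strip()
        ]
        if missing:
            problems.append((index, missing))
    return problems


# Example usage:
# problems = find_invalid_records(data_100)
# print(f"{len(problems)} invalid records")
```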


### Insert Many
[Weaviate Docs - insert many](https://weaviate.io/developers/weaviate/manage-data/import)

In [20]:
# Insert data
jeopardy = client.collections.get("Jeopardy")
jeopardy.data.insert_many(data_100)

BatchObjectReturn(all_responses=[UUID('cd1d4382-0ef7-45c1-8aed-982b2de171db'), UUID('9b354e03-3733-4908-b738-bb99a87f73c3'), UUID('00735efb-45a6-49bb-972e-ab2469be2707'), UUID('f4b38319-8779-4c5f-9206-b9d2fdc730f1'), UUID('9a00915b-2965-4ced-b207-9790f4759184'), UUID('6a67d0e6-ba17-4c53-a28a-9c7342db2f7e'), UUID('a9dc0803-e700-4122-b1fe-9d6e78b9b7a6'), UUID('fd8cf35e-21b1-4b0a-861e-72ca7632331f'), UUID('b41f28be-f612-46fe-b3bf-74c7ce97511d'), UUID('829fe8e4-dbb0-474a-987e-1166c56556a2'), UUID('6db7db77-9370-4a11-bc56-c7e90d7e5b55'), UUID('c8589a02-08e7-4e42-b0f0-bd052606c53b'), UUID('87f350fe-1399-44f3-8bac-674732f9abb7'), UUID('d7635a47-ad62-4c69-9aa0-11e266daa1b3'), UUID('36ded5e7-6e5a-4aba-97bb-6fe09f2a737c'), UUID('87872b5c-b106-4409-98bc-8be0f46d255c'), UUID('4bc6f081-36ad-450a-a24f-d59006ea9267'), UUID('48eff670-0579-486a-a02e-7c003b91e7b3'), UUID('6b31d495-130b-421c-95b3-119b8fbd50e1'), UUID('b7abf5d3-d764-4c2c-b6a7-4f6159abaf3c'), UUID('bcf372f9-81cb-4b06-9934-566827beb507'), U
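
For larger datasets, one way to keep requests small and surface failures is to insert in chunks and inspect each result. The sketch below assumes the object returned by `insert_many` exposes `has_errors` and an `errors` dict keyed by the object's position in the submitted list. The helper name `insert_in_chunks` and the chunk size are hypothetical.

```python
def insert_in_chunks(collection, records, chunk_size=50):
    """Insert records chunk by chunk and collect per-object errors.

    `collection` is assumed to expose `data.insert_many(list)`, whose result
    has `has_errors` and `errors` attributes (an assumption about the client).
    Returns a dict mapping each failed record's overall index to its error.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    failures = {}
    for start in range(0, len(records), chunk_size):
        chunk = records[start:start + chunk_size]
        result = collection.data.insert_many(chunk)
        if getattr(result, "has_errors", False):
            # Translate positions within the chunk back to overall positions.
            for offset, error in result.errors.items():
                failures[start + offset] = error
    return failures


# Example usage:
# failures = insert_in_chunks(jeopardy, data_100, chunk_size=50)
# print(f"{len(failures)} objects failed to import")
```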

## Check the object count
[Weaviate Docs - Aggregate](https://weaviate.io/developers/weaviate/search/aggregate#retrieve-the-count-meta-property)

In [21]:
questions = client.collections.get("Jeopardy")
questions.aggregate.over_all()

AggregateReturn(properties={}, total_count=100)
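
The count can also be compared with the number of records loaded from the file. This is a minimal sketch with a hypothetical helper, `check_object_count`, which assumes `aggregate.over_all(total_count=True)` returns an object with a `total_count` attribute, as the output above suggests.

```python
def check_object_count(collection, expected):
    """Raise RuntimeError if the collection's total count differs from `expected`.

    `collection` is assumed to behave like the collection object used above.
    Returns the actual count when it matches.
    """
    actual = collection.aggregate.over_all(total_count=True).total_count
    if actual != expected:
        raise RuntimeError(f"expected {expected} objects, found {actual}")
    return actual


# Example usage:
# check_object_count(questions, len(data_100))
```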

### Data preview

In [22]:
# Show data preview
jeopardy = client.collections.get("Jeopardy")
response = jeopardy.query.fetch_objects(limit=4)

for item in response.objects:
    print(item.uuid, item.properties)

00735efb-45a6-49bb-972e-ab2469be2707 {'answer': 'Anna Pavlova', 'question': 'For a 1905 benefit, Michel Fokine created "The Dying Swan" for this Russian ballerina', 'category': 'BALLET'}
05ab42c5-feb5-4456-9150-b420d762bd09 {'answer': 'Windsor', 'question': '(Alex: To read the clue, now appearing in "The Royal Tour", please welcome Dame Edna Everage)  England\'s current royal family belongs to the House of this; it\'s also the name of a castle', 'category': 'PEOPLE'}
06e702ed-fab3-48ac-9029-90241862ffb0 {'answer': 'faint paint', 'question': 'Any pigment on the wall so faded you can barely see it', 'category': 'RHYME TIME'}
0ace9fed-fc85-49d7-a5a2-ad9f1e216273 {'answer': 'India', 'question': 'In 1975, the Himalayan country of Sikkim was absorbed by this country', 'category': 'HISTORY'}


In [23]:
# Show data preview - with vectors
jeopardy = client.collections.get("Jeopardy")
response = jeopardy.query.fetch_objects(
    limit=4,
    include_vector=True
)

for item in response.objects:
    print(item.properties)
    print(item.vector, '\n')

{'answer': 'Anna Pavlova', 'question': 'For a 1905 benefit, Michel Fokine created "The Dying Swan" for this Russian ballerina', 'category': 'BALLET'}
{'default': [0.36328125, 0.478515625, -0.20703125, -0.5390625, -0.07080078125, 0.11083984375, -0.423828125, 0.00012159347534179688, -0.04052734375, 0.11767578125, -0.384765625, 0.40625, -0.384765625, 0.302734375, 0.16796875, -0.00421142578125, -0.41796875, -0.404296875, -0.46484375, -0.080078125, -0.0157470703125, 0.259765625, -0.5078125, 0.4296875, -0.047119140625, 0.255859375, 0.1318359375, -0.65234375, -0.353515625, -0.30859375, -0.0849609375, -0.26953125, 0.119140625, -0.60546875, 0.57421875, -0.0062255859375, 0.51171875, -0.0024566650390625, 0.80078125, 0.1103515625, -0.04833984375, 0.05126953125, -0.390625, 0.47265625, 0.12890625, -0.275390625, 0.1005859375, -0.484375, 0.26171875, -0.0157470703125, -0.54296875, 0.53125, 0.259765625, -0.8125, -0.12109375, -0.5546875, -0.58203125, 0.1748046875, -1.03125, -0.30859375, -0.32421875, 0.76
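
Full vectors are long, so printing a short summary is often easier to read. The sketch below assumes `item.vector` is a dict mapping a vector name (here `'default'`) to a list of floats, as in the output above. `summarize_vectors` is a hypothetical helper.

```python
def summarize_vectors(vectors, head=5):
    """Return {name: (dimension, first `head` values)} for named vectors."""
    return {
        name: (len(values), list(values[:head]))
        for name, values in vectors.items()
    }


# Example usage:
# for item in response.objects:
#     print(item.properties["answer"], summarize_vectors(item.vector))
```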

## Close the client when done

In [24]:
client.close()
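
To make sure the connection is closed even when a cell raises, the work can be wrapped so that `close()` always runs. This sketch uses `contextlib.closing` from the standard library, which works with any object that has a `close()` method. `DemoClient` and `run_with_client` are hypothetical names used only for illustration.

```python
from contextlib import closing


class DemoClient:
    """Hypothetical stand-in for a connected client; it only tracks close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_client(client, work):
    """Run work(client) and close the client afterwards, even on error."""
    with closing(client):
        return work(client)


# Example usage with the stand-in client:
demo = DemoClient()
run_with_client(demo, lambda c: "done")
print(demo.closed)
```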