`text2vec-transformers` is **only** available in open-source Weaviate. There are three ways to choose the model it uses:

1. [Pre-built transformers model containers](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-transformers#pre-built-images)

2. [Any model from Hugging Face Model Hub](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-transformers#option-2-use-any-publicly-available-hugging-face-model)

3. [Any private or local PyTorch or TensorFlow transformer model](https://weaviate.io/developers/weaviate/modules/retriever-vectorizer-modules/text2vec-transformers#option-3-custom-build-with-a-private-or-local-model)

## Dependencies

In [None]:
!pip install --pre -I "weaviate-client==4.5.6"

## Connect to Weaviate

In [3]:
import weaviate
from weaviate.classes.init import AdditionalConfig, Timeout
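# Connect to your local Weaviate instance deployed with Docker
client = weaviate.connect_to_local(
    # Generous timeouts (in seconds), since vectorizing with a local transformer model can be slow
    additional_config=AdditionalConfig(timeout=Timeout(init=2, query=900, insert=1200))
)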

client.is_ready()


True

In [4]:
client._connection.timeout_config


Timeout(query=900, insert=1200, init=2)

## Create a collection
> A collection stores your data objects along with their vector embeddings.

In [5]:
# Note: rerunning this cell deletes all data in "JeopardyQuestion",
# which you would then have to import again.
import weaviate.classes.config as wc

# Delete the collection if it already exists
if client.collections.exists("JeopardyQuestion"):
    client.collections.delete("JeopardyQuestion")

client.collections.create(
    name="JeopardyQuestion",

    vectorizer_config=wc.Configure.Vectorizer.text2vec_transformers(  # vectorize with text2vec-transformers
        pooling_strategy="cls",  # how token embeddings are pooled into a single vector
        inference_url="http://t2v-transformers:8080"  # endpoint of the inference container serving the snowflake-arctic-embed-l model
    ),

    properties=[ # defining properties (data schema) is optional
        wc.Property(name="Question", data_type=wc.DataType.TEXT), 
        wc.Property(name="Answer", data_type=wc.DataType.TEXT),
        wc.Property(name="Category", data_type=wc.DataType.TEXT, skip_vectorization=True), 
    ]
)

print("Successfully created collection: JeopardyQuestion.")

Successfully created collection: JeopardyQuestion.


## Import the Data

In [6]:
import json
import requests

url = 'https://raw.githubusercontent.com/weaviate/weaviate-examples/main/jeopardy_small_dataset/jeopardy_tiny.json'
resp = requests.get(url)
resp.raise_for_status()  # stop early if the download failed
data = resp.json()


# Get a collection object for "JeopardyQuestion"
jeopardy = client.collections.get("JeopardyQuestion")

# Insert data objects
response = jeopardy.data.insert_many(data)

# Note: `data` contains only 10 objects, so a single insert_many call is fine.
# For much larger imports (e.g. a million objects), split the data into smaller
# chunks (e.g. 100-1000 objects per insert_many call).

if response.has_errors:
    print(response.errors)
else:
    print("Insert complete.")

Insert complete.
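The comment in the cell above recommends splitting large imports into smaller requests. A minimal sketch of that idea, assuming a plain Python list of dicts like `data`; `chunked` is a hypothetical helper (not part of the Weaviate client), and in the notebook each chunk would be passed to `jeopardy.data.insert_many` in turn.

```python
from typing import Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: List[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `size` long (hypothetical helper)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Stand-in records; in the notebook you would iterate over `data` instead and, for each chunk,
# call jeopardy.data.insert_many(chunk) and check the returned response's has_errors.
records = [{"Question": f"q{i}", "Answer": f"a{i}", "Category": "TEST"} for i in range(2500)]
sizes = [len(chunk) for chunk in chunked(records, 1000)]
print(sizes)  # [1000, 1000, 500]
```

The 100-1000 range is a rule of thumb: smaller requests keep each call bounded and make it easier to retry only the chunk that failed.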


## Hybrid Search

The `alpha` parameter determines the weight given to the sparse and dense search methods. `alpha = 0` is pure sparse (BM25) search, whereas `alpha = 1` is pure dense (vector) search.

`alpha` is optional and defaults to `0.75`.
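To build intuition for `alpha`, here is a simplified sketch of score-based fusion modeled on Weaviate's relative score fusion: each result set's scores are min-max normalized to [0, 1], then combined as `alpha * vector + (1 - alpha) * bm25`. This is an illustrative approximation, not Weaviate's implementation; the object IDs and scores are made up, and vector scores are treated as similarities (higher is better).

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores so the best result gets 1.0 and the worst gets 0.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        # All scores equal: treat every result as equally relevant.
        return {key: 1.0 for key in scores}
    return {key: (value - low) / (high - low) for key, value in scores.items()}


def fuse(bm25: Dict[str, float], vector: Dict[str, float],
         alpha: float = 0.75) -> List[Tuple[str, float]]:
    """Blend normalized BM25 and vector scores; alpha=0 is pure BM25, alpha=1 is pure vector."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    bm25_norm = min_max_normalize(bm25)
    vector_norm = min_max_normalize(vector)
    fused = {
        # An object missing from one result set contributes 0 from that side.
        key: alpha * vector_norm.get(key, 0.0) + (1 - alpha) * bm25_norm.get(key, 0.0)
        for key in set(bm25_norm) | set(vector_norm)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for four objects.
bm25_scores = {"a": 3.2, "b": 1.1, "c": 0.4}
vector_scores = {"a": 0.70, "b": 0.92, "d": 0.81}

print([key for key, _ in fuse(bm25_scores, vector_scores, alpha=0.8)])   # ['b', 'd', 'a', 'c']
print([key for key, _ in fuse(bm25_scores, vector_scores, alpha=0.25)])  # ['a', 'b', 'd', 'c']
```

With these numbers, `alpha=0.8` ranks `b` (the strongest vector match) first, while `alpha=0.25` ranks `a` (the strongest keyword match) first, which is why changing `alpha` can reorder hybrid results.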

### Hybrid Search only

The query below searches for Jeopardy questions about animals and limits the output to three results. Notice that `alpha` is set to `0.8`, which weights the vector search results more heavily than BM25. Setting `alpha = 0.25` instead would favor BM25 and may return different results.

In [7]:
# Note: you could reuse the collection object from the import cell;
# it is fetched again here so this cell can run on its own.
# Get a collection object for "JeopardyQuestion"
jeopardy = client.collections.get("JeopardyQuestion")

response = jeopardy.query.hybrid(
    query="northern beast",
    alpha=0.8,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: 1e895b51-a59e-4de4-bdf9-f22ece9180d4
Data: {
  "answer": "species",
  "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
  "category": "SCIENCE"
} 

ID: 73fb3875-3783-4333-978f-7f2e3888aecf
Data: {
  "answer": "wire",
  "question": "A metal that is \"ductile\" can be pulled into this while cold & under pressure",
  "category": "SCIENCE"
} 

ID: 71df197f-a667-4b51-9e58-4e9c55531429
Data: {
  "answer": "the diamondback rattler",
  "question": "Heaviest of all poisonous snakes is this North American rattlesnake",
  "category": "ANIMALS"
} 



### Hybrid Search on a specific property

The `query_properties` parameter lists the properties that the BM25 (keyword) part of the hybrid search should search; the vector part of the search is not affected.

In [8]:
response = jeopardy.query.hybrid(
    query="northern beast",
    query_properties=["question"],
    alpha=0.8,
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: 1e895b51-a59e-4de4-bdf9-f22ece9180d4
Data: {
  "answer": "species",
  "question": "2000 news: the Gunnison sage grouse isn't just another northern sage grouse, but a new one of this classification",
  "category": "SCIENCE"
} 

ID: 73fb3875-3783-4333-978f-7f2e3888aecf
Data: {
  "answer": "wire",
  "question": "A metal that is \"ductile\" can be pulled into this while cold & under pressure",
  "category": "SCIENCE"
} 

ID: 71df197f-a667-4b51-9e58-4e9c55531429
Data: {
  "answer": "the diamondback rattler",
  "question": "Heaviest of all poisonous snakes is this North American rattlesnake",
  "category": "ANIMALS"
} 
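Because `query_properties` only changes which fields the keyword side scores, a tiny BM25 scorer makes the effect concrete. This is a pure-Python sketch, not Weaviate's scorer: it assumes a simple lowercase alphanumeric tokenizer and the common parameters `k1=1.2`, `b=0.75`, and the two documents are abbreviated from the dataset above.

```python
import math
import re
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(docs: List[Dict[str, str]], query: str, prop: str,
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against `query`, looking only at the `prop` field."""
    tokenized = [tokenize(doc.get(prop, "")) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    scores = []
    for toks in tokenized:
        score = 0.0
        for term in set(tokenize(query)):
            tf = toks.count(term)
            if tf == 0:
                continue  # term absent from this document's field
            doc_freq = sum(1 for other in tokenized if term in other)
            idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


docs = [
    {"question": "the Gunnison sage grouse isn't just another northern sage grouse",
     "answer": "species"},
    {"question": "Heaviest of all poisonous snakes is this North American rattlesnake",
     "answer": "the diamondback rattler"},
]

print(bm25_scores(docs, "northern beast", prop="question"))  # first score > 0, second is 0.0
print(bm25_scores(docs, "northern beast", prop="answer"))    # [0.0, 0.0]
```

In the notebook run above, both queries return the same three objects; with `alpha=0.8` the vector component carries most of the weight, so restricting the keyword fields may change little.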



### Hybrid Search with a `where` filter

Run the same `northern beast` query, but only return objects whose `category` is Animals, using the `filters` parameter.

In [9]:
import weaviate.classes.query as wq # wq is an alias to save us from typing weaviate.classes everywhere ;)

response = jeopardy.query.hybrid(
    query="northern beast",
    alpha=0.8,
    filters=wq.Filter.by_property("category").equal("Animals"),
    limit=3
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: 71df197f-a667-4b51-9e58-4e9c55531429
Data: {
  "answer": "the diamondback rattler",
  "question": "Heaviest of all poisonous snakes is this North American rattlesnake",
  "category": "ANIMALS"
} 

ID: bdc0d9de-f263-4faf-a9f7-eedd655123b4
Data: {
  "answer": "Elephant",
  "question": "It's the only living mammal in the order Proboseidea",
  "category": "ANIMALS"
} 

ID: d829d2e7-c4fd-4a1e-8607-ade70da7b816
Data: {
  "answer": "the nose or snout",
  "question": "The gavial looks very much like a crocodile except for this bodily feature",
  "category": "ANIMALS"
} 
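The filter value `"Animals"` matched objects stored as `"ANIMALS"`. A plausible explanation, assuming the property uses Weaviate's default `word` tokenization (which lowercases text), is that both sides are lowercased before comparison. The sketch below models that behavior, plus the filter-then-rank shape of the results, in plain Python; `word_tokens`, `equal_filter`, and the scores are hypothetical and only approximate Weaviate for single-word values like these.

```python
import re
from typing import Dict, List


def word_tokens(text: str) -> List[str]:
    """Approximate `word` tokenization: keep alphanumeric runs, lowercased."""
    return re.findall(r"[a-z0-9]+", text.lower())


def equal_filter(objects: List[Dict], prop: str, value: str) -> List[Dict]:
    """Keep objects whose tokenized `prop` equals the tokenized filter value."""
    wanted = word_tokens(value)
    return [obj for obj in objects if word_tokens(obj[prop]) == wanted]


# Hypothetical objects with made-up hybrid scores.
objects = [
    {"category": "SCIENCE", "answer": "species", "score": 0.91},
    {"category": "ANIMALS", "answer": "the diamondback rattler", "score": 0.62},
    {"category": "ANIMALS", "answer": "Elephant", "score": 0.55},
    {"category": "SCIENCE", "answer": "wire", "score": 0.80},
]

# Filter first, then rank only the surviving objects by score.
hits = sorted(equal_filter(objects, "category", "Animals"), key=lambda o: o["score"], reverse=True)
print([o["answer"] for o in hits])  # ['the diamondback rattler', 'Elephant']
```

If exact, case-sensitive matching were needed, the property could be created with a different tokenization (for example `field`); check Weaviate's tokenization documentation before relying on either behavior.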



### Hybrid Search with a custom vector

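You can supply your own query vector to a hybrid query through the `vector` parameter. It is used for the vector part of the search instead of a vector generated from `query`, while `query` is still used for BM25. The vector must have the same number of dimensions as the vectors stored in the collection.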

In [10]:
vector = [ 9.01219342e-03, -1.13035388e-01, -3.29367593e-02,  4.38332856e-02,
        7.23974332e-02, -1.97557714e-02, -3.86070684e-02,  3.95518802e-02,
        3.72668579e-02,  2.05339417e-02, -6.12595025e-03,  3.19878533e-02,
       -5.29028215e-02, -1.46579389e-02,  4.35932120e-03,  4.73472029e-02,
        3.08018792e-02, -3.56715880e-02, -1.00472206e-02, -3.10862083e-02,
       -3.91206481e-02, -3.53321470e-02,  4.36015874e-02,  4.53947634e-02,
        1.28318956e-02, -4.59509380e-02,  2.18371209e-02,  3.01626958e-02,
        7.22554475e-02,  1.06480159e-01, -2.11778935e-02, -3.67801264e-02,
        1.02296481e-02,  1.05294541e-01, -5.45431301e-02, -7.61895021e-03,
       -2.85231130e-04,  1.14363050e-02, -3.05932797e-02,  1.17350537e-02,
        1.50630940e-02,  4.32989746e-02,  4.74306457e-02, -1.48862014e-02,
        2.02855896e-02,  1.20545849e-02,  4.16181721e-02,  5.39400848e-03,
       -1.10965833e-01,  5.81688397e-02,  4.90737744e-02, -1.90856531e-02,
        3.16569172e-02, -6.54561520e-02,  3.80893052e-02, -6.77786544e-02,
        1.84782557e-02,  1.39753018e-02, -3.28312591e-02, -9.79156327e-03,
        2.21680366e-02,  1.28861135e-02, -2.91477833e-02, -7.52918539e-04,
       -1.17017012e-02,  4.94324379e-02, -4.73870113e-02, -2.67444123e-02,
       -1.75260485e-03, -4.84930091e-02, -5.17679378e-02, -3.40025723e-02,
       -6.74789995e-02,  1.84029415e-02, -2.96260361e-02,  1.10075716e-02,
       -3.00292019e-02, -6.29385114e-02,  1.00643866e-01,  6.60364656e-03,
        1.37663232e-02, -4.80012095e-04,  6.13467023e-02,  3.20448801e-02,
       -1.64051596e-02,  4.73705400e-03,  4.92518656e-02,  9.71979368e-03,
       -6.07053079e-02,  5.23600802e-02,  8.00200477e-02, -9.56894234e-02,
       -8.57020468e-02,  6.36228547e-02,  7.52996579e-02,  1.29473349e-02,
       -9.71227512e-03, -2.11577863e-03,  2.87621394e-02,  3.76317315e-02,
       -3.18394564e-02, -4.83599156e-02, -8.65049437e-02,  6.12095408e-02,
       -9.94464681e-02, -6.41045487e-03, -4.48330231e-02,  7.90148228e-03,
       -2.83656046e-02, -3.31363641e-02, -1.66364666e-02,  5.61142266e-02,
        1.13785220e-02,  1.10646505e-02, -3.59027926e-03,  3.18051614e-02,
       -5.28689176e-02,  5.93203604e-02,  2.32112482e-02, -5.48220472e-03,
       -6.93533896e-03, -8.94231722e-02, -6.08570725e-02, -3.95617411e-02,
        4.85893041e-02,  1.05655065e-03, -9.03137308e-03,  8.84599425e-03,
       -5.01697697e-02, -2.76381504e-02,  6.23455690e-03, -7.74473846e-02,
        1.57587640e-02,  5.15611134e-02, -1.44126983e-02,  6.92101475e-03,
       -6.13961443e-02,  4.40146513e-02,  4.75864671e-03,  3.50741483e-03,
        3.11125033e-02,  3.47304791e-02,  5.09980619e-02, -1.74094904e-02,
       -2.65628584e-02, -4.65470664e-02, -3.20187546e-02,  1.64081044e-02,
       -2.09542606e-02,  4.23371866e-02, -1.89093240e-02, -6.34234771e-02,
       -2.04049945e-02, -2.11013779e-02, -8.46841633e-02, -9.52623971e-03,
        1.97680127e-02, -2.40785163e-02,  2.49154642e-02,  4.18314859e-02,
       -2.28452627e-02,  4.47808802e-02,  4.76567507e-01, -7.80266151e-02,
       -4.52793390e-02,  3.06765549e-03,  8.28293487e-02,  3.23506519e-02,
       -3.14611495e-02, -3.96178849e-02,  6.32850453e-02, -2.49318080e-03,
        3.33857909e-02, -3.74386995e-03,  4.00588103e-02, -1.59429051e-02,
       -3.07114739e-02,  3.90322991e-02, -2.54357159e-02,  8.29387754e-02,
       -7.03540910e-03,  5.44182062e-02, -8.46608132e-02, -3.19642611e-02,
        2.15431047e-03, -2.26297304e-02, -4.73075360e-02, -1.15000702e-01,
        4.95740511e-02, -1.26169017e-02, -8.08310956e-02, -2.47572120e-02,
        1.57660078e-02, -1.80676319e-02,  2.68229395e-02,  4.06741686e-02,
        1.46878678e-02, -5.19402176e-02, -1.93753019e-02, -1.51190013e-02,
       -5.90167083e-02,  2.33134311e-02,  1.54988151e-02, -8.63076299e-02,
        1.27702672e-02, -2.04606038e-02,  3.27454098e-02, -1.40842535e-02,
        5.93160316e-02,  6.02267347e-02, -2.12665759e-02,  3.20592932e-02,
        2.36758012e-02, -8.07149634e-02,  4.60682325e-02, -3.45666669e-02,
        8.99648014e-03, -1.52559271e-02,  6.74400339e-03,  2.55527021e-03,
        1.46748535e-02, -3.77581120e-02,  7.16287736e-03, -1.03465198e-02,
       -3.56848203e-02,  3.87728252e-02, -4.70833294e-02,  2.48581879e-02,
        2.97012087e-02, -6.40219525e-02,  2.64278464e-02,  2.21447032e-02,
       -2.41054874e-02,  6.66826451e-03,  3.61728817e-02, -7.40753412e-02,
        2.12933328e-02, -3.87359522e-02, -1.98461842e-02, -1.16880587e-03,
       -3.42810266e-02,  1.45098567e-02,  7.91829154e-02, -1.88584365e-02,
       -2.33891439e-02, -1.11310491e-02, -2.28953492e-02, -2.83630341e-02,
       -2.05208883e-02, -8.44672509e-03, -2.05340213e-03,  6.90146759e-02,
        1.70953311e-02, -1.33336931e-01,  1.76023226e-03,  2.90952660e-02,
       -6.40432984e-02, -2.88128033e-02, -5.27145155e-03, -1.51323639e-02,
       -3.41428816e-02, -5.72668612e-02, -3.18827592e-02, -6.70430511e-02,
        4.40178998e-02,  3.27184126e-02, -3.17894705e-02,  2.31186040e-02,
        3.91944908e-02, -7.63402358e-02, -1.10105574e-02,  9.36873909e-03,
        8.59683678e-02, -9.23250616e-02, -2.59466730e-02, -1.92867629e-02,
        2.41955034e-02, -1.04991375e-02,  4.59894314e-02, -3.85227315e-02,
        1.20516932e-02, -3.13820764e-02, -2.22451817e-02,  7.85782859e-02,
        6.83768478e-04,  6.01733662e-02,  1.29658086e-02, -8.94022584e-02,
        9.16998163e-02,  1.03575382e-02, -4.05346528e-02,  1.75896157e-02,
       -4.07467829e-03, -2.26785690e-02,  2.67282384e-03,  5.83446994e-02,
       -1.10129595e-01, -1.44997276e-02,  2.29040012e-02,  2.45135818e-02,
        1.34665705e-02, -5.82843311e-02, -9.80154844e-04, -1.84601322e-02,
        7.33298957e-02,  3.93925756e-02, -3.15375924e-02,  6.80084080e-02,
       -2.14028009e-03,  3.50658665e-03, -7.03595877e-02, -1.74236409e-02,
       -3.83076048e-03, -9.49744731e-02, -7.69278184e-02, -8.98255333e-02,
        1.11942343e-01, -3.30329277e-02,  1.11849830e-01, -5.06280325e-02,
       -9.61470138e-03,  1.13748990e-01, -1.01661071e-01, -1.86292008e-02,
       -2.36333292e-02,  2.29119174e-02, -1.43332183e-02,  6.85163448e-03,
        2.39794012e-02,  2.62816511e-02,  5.42403758e-03, -1.86059009e-02,
        1.36639774e-02, -8.80998075e-02,  4.36533941e-03,  3.55729051e-02,
       -9.46146436e-03,  2.57336851e-02, -1.13036709e-04,  4.13415283e-02,
       -5.41722625e-02, -9.97888595e-02, -1.16476052e-01,  2.58566756e-02,
       -5.25019653e-02,  1.40418205e-03,  8.41934234e-02, -5.61945923e-02,
       -6.89583570e-02,  3.83320190e-02, -4.60164286e-02,  1.69613753e-02,
       -7.80761847e-03,  2.00636163e-02,  2.54295617e-02, -7.21545937e-03,
       -2.23655887e-02, -1.29519384e-02,  6.28145859e-02,  3.33599523e-02,
        6.06051134e-03, -2.29896121e-02, -2.09328812e-02, -2.11620773e-03,
        3.70613672e-02, -2.57191733e-02, -5.02929837e-02,  5.11370078e-02,
        5.26849516e-02, -3.07372268e-02,  3.77799384e-02, -4.37224880e-02,
       -1.79544296e-02,  6.92327768e-02,  6.18538633e-02,  1.72134619e-02,
        1.33291595e-02, -1.60525739e-02, -2.40784767e-03, -2.14485265e-02,
        7.03238323e-02, -5.41396327e-02,  2.09993366e-02,  1.09692970e-02]

response = jeopardy.query.hybrid(
    query="animal",
    vector=vector,
    limit=2
)

for item in response.objects:
    print("ID:", item.uuid)
    print("Data:", json.dumps(item.properties, indent=2), "\n")

ID: ee3b9fd9-92ef-4122-ad1a-375cbcaccf54
Data: {
  "answer": "Antelope",
  "question": "Weighing around a ton, the eland is the largest species of this animal in Africa",
  "category": "ANIMALS"
} 

ID: bdc0d9de-f263-4faf-a9f7-eedd655123b4
Data: {
  "answer": "Elephant",
  "question": "It's the only living mammal in the order Proboseidea",
  "category": "ANIMALS"
} 
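A custom vector only makes sense if it matches the collection's dimensionality, and Weaviate compares vectors using the collection's distance metric (cosine by default). A small pure-Python sketch of both checks follows; `check_dimension`, `expected_dim`, and the sample vectors are illustrative assumptions, and in practice the query vector should come from the same model as the collection's vectors.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def check_dimension(vector: Sequence[float], expected_dim: int) -> None:
    """Fail early if a custom query vector does not match the collection's dimensionality."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(vector)}")


check_dimension([0.1, 0.2, 0.3], expected_dim=3)
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # 1.0
```

Running a check like `check_dimension(vector, expected_dim=...)` before calling `jeopardy.query.hybrid(...)` turns a server-side error into an immediate local one.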

