## Create an Index

**PreciseSearch** supports two types of indexes: dense indexes and hybrid indexes. Dense indexes search over dense vector representations of the data, whereas hybrid indexes search over a combination of dense and sparse representations of the data.
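
To make the distinction concrete, here is a toy sketch of dense scoring, sparse scoring, and a blend of the two. It is only an illustration of the general idea and not PreciseSearch's actual scoring: the function names, the term-count sparse representation, and the `alpha` weight are all assumptions made for this example.

```python
from collections import Counter


def dense_score(query_vec, doc_vec):
    """Dot product between two dense vectors of equal length."""
    if len(query_vec) != len(doc_vec):
        raise ValueError("dense vectors must have the same dimension")
    return sum(q * d for q, d in zip(query_vec, doc_vec))


def sparse_score(query_text, doc_text):
    """Toy sparse score: dot product of lowercase term-count vectors."""
    query_terms = Counter(query_text.lower().split())
    doc_terms = Counter(doc_text.lower().split())
    return sum(count * doc_terms[term] for term, count in query_terms.items())


def hybrid_score(dense, sparse, alpha=0.5):
    """Weighted blend of the two scores; alpha is a hypothetical knob."""
    return alpha * dense + (1 - alpha) * sparse


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
query_vec = [0.5, 0.0, 0.5]
doc_vec = [1.0, 0.0, 1.0]

dense = dense_score(query_vec, doc_vec)
sparse = sparse_score("pizza naples", "pizza was invented in naples")
print(dense, sparse, hybrid_score(dense, sparse))
```

A dense score can match text that shares meaning but no words, while a sparse score rewards exact term overlap; a hybrid index draws on both signals.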

In this quickstart, we will create a dense index with an integrated embedding model (hosted by VectorStackAI). When a dense index is configured with an integrated embedding model, you only need to provide the text data during upsert and search; PreciseSearch generates the dense vector representations automatically.

In [8]:
from vectorstackai import Client
import time

# Initialize the client with your API key
client = Client(api_key="your_api_key")

# Create the index with an integrated embedding model (e.g., 'e5-small-v2')
client.create_index(index_name="my_dense_index", embedding_model_name="e5-small-v2")

# Poll every 2 seconds until the index reports that it is ready
while client.get_index_info(index_name="my_dense_index")['status'] != "ready":
    print("Index is not ready yet. Waiting for 2 seconds...")
    time.sleep(2)

# Connect to the index
index = client.connect_to_index(index_name="my_dense_index")

Request accepted: Index creation for 'my_dense_index' started.
Index is not ready yet. Waiting for 2 seconds...
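
The polling loop above runs until the status becomes `"ready"`, so it would spin forever if index creation never finished. A minimal sketch of a bounded wait is shown here; `wait_until_ready`, its parameters, and the `"not_ready"` placeholder status are assumptions for illustration and are not part of the `vectorstackai` client.

```python
import time


def wait_until_ready(get_status, timeout_s=120.0, poll_interval_s=2.0, sleep=time.sleep):
    """Poll get_status() until it returns "ready", or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "ready":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still '{status}' after {timeout_s} seconds")
        sleep(poll_interval_s)


# Stand-in status source so the sketch runs without an API key.
# "not_ready" is a placeholder; only "ready" appears in this quickstart.
fake_statuses = iter(["not_ready", "not_ready", "ready"])
result = wait_until_ready(lambda: next(fake_statuses), poll_interval_s=0.0)
print(result)
```

With the client from this quickstart, the status callable could be `lambda: client.get_index_info(index_name="my_dense_index")['status']`.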


## Upsert Data into the Index

To upsert data into the index, you need to provide the text data and assign a unique ID to each data point.

For this example, we will create a dataset with 10 random facts about food and history.

In [9]:
dataset = [
    {"id": "1", "text": "The shortest war in history was between Britain and Zanzibar on August 27, 1896. Zanzibar surrendered after just 38 minutes."},
    {"id": "2", "text": "Ancient Romans used bread as plates. After the meal, these edible plates were either eaten or given to the poor."},
    {"id": "3", "text": "The first chocolate bar was made in England by Fry's in 1847, marking the beginning of the modern chocolate industry."},
    {"id": "4", "text": "The Battle of Hastings in 1066 changed English history forever when William the Conqueror defeated King Harold II."},
    {"id": "5", "text": "The Great Wall of China took over 2000 years to build, with construction starting in the 7th century BCE."},
    {"id": "6", "text": "Ketchup was sold as medicine in the 1830s to treat diarrhea, indigestion, and other stomach problems."},
    {"id": "7", "text": "Pizza was invented in Naples, Italy in the late 1700s. The classic Margherita pizza was created in 1889."},
    {"id": "8", "text": "The first Thanksgiving feast in 1621 lasted for three days and included deer, fish, and wild fowl."},
    {"id": "9", "text": "The signing of the Magna Carta in 1215 limited the power of English monarchs and influenced modern democracy."},
    {"id": "10", "text": "During World War II, carrots were promoted by the British as helping pilots see better at night to hide radar technology."}
]

# Collect the IDs and wrap each text in a metadata dict for batch upsert
batch_ids = [item['id'] for item in dataset]
batch_metadata = [{'text': item['text']} for item in dataset]

# Upsert the data into the index
index.upsert(batch_ids=batch_ids, batch_metadata=batch_metadata)

# Once the upsert is complete, you can check the number of records (num_records) via the index info
index_info = index.info()
for key, value in index_info.items():
    print(f"{key}: {value}")

index_name: my_dense_index
num_records: 10
dimension: 384
metric: dotproduct
features_type: dense
status: ready
embedding_model_name: e5-small-v2
optimized_for_latency: False
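
Because every data point needs a unique ID, it can help to validate a dataset before sending it. The helper below is a hypothetical client-side check, not part of the `vectorstackai` API. This quickstart does not say how duplicate IDs within one batch are handled, so rejecting them is a cautious assumption.

```python
def build_upsert_batch(records):
    """Return (batch_ids, batch_metadata) after checking IDs and texts."""
    seen = set()
    batch_ids, batch_metadata = [], []
    for record in records:
        record_id, text = record["id"], record["text"]
        if record_id in seen:
            raise ValueError(f"duplicate id: {record_id!r}")
        if not text.strip():
            raise ValueError(f"empty text for id: {record_id!r}")
        seen.add(record_id)
        batch_ids.append(record_id)
        batch_metadata.append({"text": text})
    return batch_ids, batch_metadata


# Small made-up sample with the same shape as the quickstart dataset.
sample = [
    {"id": "1", "text": "first fact"},
    {"id": "2", "text": "second fact"},
]
batch_ids, batch_metadata = build_upsert_batch(sample)
print(batch_ids)
print(batch_metadata)
```

The two returned lists have the same shape as the `batch_ids` and `batch_metadata` arguments passed to `index.upsert` above.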


## Search the Index
Now that we have our index ready, we can search it.
In this example, we will search the index for the query "Where was pizza invented?".

Since our index is configured with an integrated embedding model, we only need to provide the query text.

In [None]:
# Search the index
search_results = index.search(query_text="Where was pizza invented?", top_k=5)

# Print the results
for result in search_results:
    print(f"ID: {result['id']}, Similarity: {result['similarity']:.2f}, Text: {result['metadata']['text']}")

ID: 7, Similarity: 0.92, Text: Pizza was invented in Naples, Italy in the late 1700s. The classic Margherita pizza was created in 1889.
ID: 2, Similarity: 0.80, Text: Ancient Romans used bread as plates. After the meal, these edible plates were either eaten or given to the poor.
ID: 3, Similarity: 0.80, Text: The first chocolate bar was made in England by Fry's in 1847, marking the beginning of the modern chocolate industry.
ID: 6, Similarity: 0.78, Text: Ketchup was sold as medicine in the 1830s to treat diarrhea, indigestion, and other stomach problems.
ID: 10, Similarity: 0.77, Text: During World War II, carrots were promoted by the British as helping pilots see better at night to hide radar technology.


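Notice that the results (shown above) are sorted in descending order of similarity. The top result (ID 7, similarity 0.92) answers the query directly, while the remaining results are food-related facts with noticeably lower scores (0.77 to 0.80).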
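
Since each result carries a similarity score, the list can be post-processed on the client side. The sketch below assumes results are dictionaries with `id`, `similarity`, and `metadata` keys, as in the printed output; the helper names and the 0.85 cutoff are arbitrary choices for illustration, and the sample texts are shortened.

```python
def is_sorted_descending(results):
    """True if similarity never increases from one result to the next."""
    scores = [r["similarity"] for r in results]
    return all(a >= b for a, b in zip(scores, scores[1:]))


def filter_by_similarity(results, min_similarity):
    """Keep results at or above min_similarity, preserving their order."""
    return [r for r in results if r["similarity"] >= min_similarity]


# Scores rounded as in the output above; texts abbreviated for this sketch.
sample_results = [
    {"id": "7", "similarity": 0.92, "metadata": {"text": "pizza fact"}},
    {"id": "2", "similarity": 0.80, "metadata": {"text": "bread plates fact"}},
    {"id": "6", "similarity": 0.78, "metadata": {"text": "ketchup fact"}},
]

strong = filter_by_similarity(sample_results, min_similarity=0.85)
print(is_sorted_descending(sample_results))
print([r["id"] for r in strong])
```

A suitable cutoff depends on the embedding model and the data, so it is best chosen by inspecting real scores rather than fixed in advance.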

## Clean Up
Once you are done with the quickstart, you can delete the index.

In [11]:
index.delete(ask_for_confirmation=False)
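
If the index only exists for a short experiment, the cleanup step can be tied to a `with` block so that it runs even when an earlier step raises. This is a sketch under assumptions: `delete_on_exit` is a hypothetical helper, and `FakeIndex` stands in for a connected index so the example runs offline. The only call taken from this quickstart is `delete(ask_for_confirmation=False)`.

```python
from contextlib import contextmanager


@contextmanager
def delete_on_exit(index):
    """Yield the index and delete it afterwards, even if the body raises."""
    try:
        yield index
    finally:
        index.delete(ask_for_confirmation=False)


class FakeIndex:
    """Stand-in for a connected index (hypothetical, for illustration only)."""

    def __init__(self):
        self.deleted = False

    def delete(self, ask_for_confirmation=True):
        self.deleted = True


fake = FakeIndex()
with delete_on_exit(fake) as idx:
    pass  # upsert and search calls would go here
print(fake.deleted)
```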