![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# Use watsonx, watsonx.data, LangChain, and vector indexes to chat with a document (RAG)

#### Disclaimers

- Use only Projects and Spaces that are available in watsonx context.


## Notebook content

This notebook demonstrates how to reproduce the behaviour of chat with a document and vector indexes programmatically through watsonx APIs and clients.

Some familiarity with Python is helpful. This notebook uses Python 3.11.

## Learning goal

The purpose of this notebook is to replicate the chat-with-a-document behavior programmatically by integrating documents and vector indexes with watsonx APIs and watsonx.data Milvus.


## Contents

This notebook contains the following parts:

- [Setup](#setup)
- [Build Vector Index](#build)
  * [Define vector index properties](#define)
  * [Initialize vector store](#initialize)
  * [Data Asset Processing](#data_asset_processing)
  * [Create embeddings](#embeddings)
  * [Create vector index asset](#vector-index)  
- [Deploy chat with document AI Service](#deploy)
  * [Define AI Service function](#ai-service-function-definition)
  * [Test AI Service locally](#test-ai-service-locally)
  * [Create deployment](#create-deployment)
- [Summary and next steps](#summary)

<a id="setup"></a>
## Set up the environment

Before you use the sample code in this notebook, you must perform the following setup tasks:

-  Contact your Cloud Pak for Data administrator and ask them for your account credentials.

### Install required packages

Install the libraries required by this notebook. In an air-gapped environment, make sure that the local PyPI repository is populated with these dependencies.

In [None]:
!pip install jq | tail -n 1
!pip install docx2txt | tail -n 1
!pip install tiktoken | tail -n 1
!pip install python-pptx | tail -n 1
!pip install unstructured | tail -n 1
!pip install "ibm-watsonx-ai>=1.2.5" | tail -n 1

### Import required packages

In [1]:
from ibm_watsonx_ai import Credentials, APIClient
from ibm_watsonx_ai.foundation_models import Embeddings
from ibm_watsonx_ai.foundation_models.extensions.rag.chunker import LangChainChunker
from ibm_watsonx_ai.foundation_models.extensions.rag.vector_stores import VectorStore

# Import every loader once, from the maintained langchain_community package.
from langchain_community.document_loaders import (
  PyPDFLoader,
  CSVLoader,
  Docx2txtLoader,
  TextLoader,
  UnstructuredExcelLoader,
  UnstructuredPowerPointLoader,
  UnstructuredMarkdownLoader,
  JSONLoader,
  UnstructuredHTMLLoader
)

### Connection to WML

Authenticate the Watson Machine Learning service on IBM Cloud Pak for Data. By default, `url`, `username` and `token` are taken from environment variables. You can override them by providing `url`, `username` and `api_key` as arguments.

Initialize the client to work with the watsonx API.

In [2]:
from ibm_watsonx_ai import APIClient, Credentials

credentials = Credentials(
    instance_id="openshift",
    version="5.1"
)

client = APIClient(credentials)

Alternatively you can use `username` and `password` to authenticate WML services.

```python
credentials = Credentials(
    username="***",
    password="***",
    url="***",
    instance_id="openshift",
    version="5.1"
)

```

### Working with projects

First, create a project for your work. If you do not already have a project, follow the steps below.

- Open IBM Cloud Pak main page
- Click all projects
- Create an empty project
- Copy `project_id` from url and paste it below

**Action**: Assign project ID below

In [3]:
import os

try:
    project_id = os.environ["PROJECT_ID"]
except KeyError:
    project_id = input("Please enter your project_id (hit enter): ")

To be able to interact with all resources available in WML services, you need to set the **project** which you will be using.

In [4]:
client.set.default_project(project_id)

'SUCCESS'

<a id="build"></a>

## Build Vector Index
The following code demonstrates how to generate embeddings for a list of data assets, store them in a Milvus collection, and save the result as a vector index asset.

<a id="define"></a>

### Define vector index

This section allows you to configure the properties for creating the vector index.

Retrieve a comprehensive list of data assets and connections to facilitate the definition of the vector index properties.

In [5]:
print("Data assets:")
print(client.data_assets.list().to_string())

print("\n\nConnections:")
print(client.connections.list().to_string())

Data assets:
                 NAME  ASSET_TYPE   SIZE                              ASSET_ID
0  ModelInference.txt  data_asset  13584  56a9251d-419d-44f4-9676-55137166ba1f


Connections:
                NAME                                    ID               CREATED                    DATASOURCE_TYPE_ID
0  Milvus Connection  c03acccb-03be-4c01-98c3-6b07a53dc043  2025-03-14T07:59:41Z  4484a69b-2c6d-4cec-bfed-fb93332a820b


Specify the data assets to be incorporated into the vector index.

In [None]:
data_assets = [
    "ENTER YOUR DATA ASSETS ID HERE" 
]

Configure the properties of the watsonx.data Milvus instance and collection by specifying the details of the connection.

In [7]:
connection_id = "ENTER YOUR CONNECTION ID HERE"
collection = "wx_collection"
database = "default"

Configure the vector index settings.

In [None]:
# ID of the embedding model
embedding_model_id = "ibm/granite-embedding-278m-multilingual"

# Additional vector index settings
vector_index_name = "My vector index"
top_k = 5
rerank = False
chunk_size = 2000
chunk_overlap = 200
split_pdf_pages = True
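
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-size sliding-window splitter. It is an illustration only: the notebook itself uses `LangChainChunker` with the `recursive` method, which also respects separators such as paragraphs and sentences. The function name `sliding_window_chunks` is hypothetical and not part of any library.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character windows of at most chunk_size characters,
    where each window repeats the last chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

With the values used in this notebook (2000 and 200), each chunk would share its first 200 characters with the end of the previous chunk, which helps keep sentences that straddle a boundary retrievable.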

Utilizing the settings defined above, we construct the payload for the vector index, consolidating all configuration parameters into a structured format for subsequent processing.

In [9]:
vector_index_details = {
    "name": vector_index_name,
    "data_assets": data_assets,
    "store": {
        "type": "watsonx.data",
        "connection_id": connection_id,
        "index": collection,
        "database": database
    },
    "settings": {
        "chunk_size": chunk_size,
        "chunk_overlap": chunk_overlap,
        "split_pdf_pages": split_pdf_pages,
        "top_k": top_k,
        "rerank": rerank,
        "embedding_model_id": embedding_model_id,
        "schema_fields": {
            "document_name": "document_name",
            "text": "text",
            "page_number": "page"
        }
    },
    "status": "ready"
}

# The following schema delineates the documents being created within the Milvus collection.
vector_store_schema = vector_index_details["settings"]["schema_fields"]
text_field = vector_store_schema.get("text")

<a id="initialize"></a>

### Initialize vector store

Instantiate the `VectorStore` class for the watsonx.data Milvus collection. By default, `VectorStore` creates a collection with the name stored in the `collection` variable. If a collection with that name already exists, no new collection is created, and `VectorStore.add_documents` adds the new documents to the existing collection. To drop the old collection and create a new one with the same name, set the `drop_old` parameter to `True` in the `VectorStore` constructor.

In [None]:
embeddings = Embeddings(
    model_id=embedding_model_id,
    api_client=client,
    params={
        "truncate_input_tokens": 512
    }
)

vector_store = VectorStore(
    api_client=client,
    connection_id=connection_id,
    embeddings=embeddings,
    index_name=collection,
    database=database,
    consistency_level='Strong',
    connection_args={'secure': True},
    text_field=text_field
)

<a id="data_asset_processing"></a>

### Data Asset Processing

The following cells manage the parsing of data assets, their conversion into LangChain documents, and the generation of embeddings in the Milvus collection.

Define the document loader to be used for each file type.

In [11]:
mime_type_mappings = {
    'text/plain': TextLoader,
    'application/pdf': PyPDFLoader,
    'text/csv': CSVLoader,
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document': Docx2txtLoader,
    'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': UnstructuredExcelLoader,
    'application/vnd.openxmlformats-officedocument.presentationml.presentation': UnstructuredPowerPointLoader,
    'text/markdown': UnstructuredMarkdownLoader,
    'application/json': JSONLoader,
    'text/html': UnstructuredHTMLLoader
}

Initialize the text splitter used to partition documents into manageable chunks.

In [12]:
text_splitter = LangChainChunker(
    method="recursive",
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap
)

Define a function to parse the raw content of a data asset.

In [13]:
def load_document(data_asset: dict) -> list:
    """
    Load and split the document from a data asset.

    Args:
        data_asset (dict): A dictionary containing metadata and entity details of the data asset.

    Returns:
        list: The loaded and split content of the document.
    """
    asset_id = data_asset["metadata"]["asset_id"]
    filename = data_asset["metadata"]["name"]
    file_path = client.data_assets.download(asset_id, filename)
    mime_type = data_asset["entity"]["data_asset"]["mime_type"]
    
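    # Get the correct loader for the MIME type and fail clearly if none exists.
    loader_class = mime_type_mappings.get(mime_type)
    if loader_class is None:
        raise ValueError(f"Unsupported MIME type: {mime_type}")

    if mime_type == "application/json":
        # JSONLoader needs a jq schema; '.' selects the whole document.
        loader = loader_class(file_path, jq_schema='.', text_content=False)
    else:
        loader = loader_class(file_path)
    return loader.load_and_split()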

Define a function to incorporate additional metadata into a LangChain document, such as the document name and an optional page number.

In [14]:
def compute_documents_metadata(document_name: str, loaded_documents: list) -> list:
    """
    Compute and update metadata for a list of loaded documents.

    Args:
        document_name (str): The name to be assigned to each document.
        loaded_documents (list): A list of documents from which metadata is extracted and updated.

    Returns:
        list: A list of documents with enriched metadata.
    """
    filtered_documents = []
    for document in loaded_documents:
        computed_document_data = {
            "metadata": {
                "source": document.model_dump()["metadata"]["source"]
            }
        }
        computed_document_data["metadata"][vector_store_schema.get("page_number")] = document.model_dump()["metadata"].get("page", 0)
        computed_document_data["metadata"][vector_store_schema.get("document_name")] = document_name
        filtered_documents.append(document.model_copy(update=computed_document_data))
        
    return filtered_documents
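
The effect of this function is easier to see on plain dictionaries. The sketch below mirrors the same field mapping without LangChain objects: it keeps `source`, sets the page number (defaulting to 0 when the loader provides none), and adds the document name, while any other loader metadata is dropped. The helper name `enrich_metadata` and the sample values are hypothetical.

```python
# Same field mapping as vector_index_details["settings"]["schema_fields"].
schema_fields = {"document_name": "document_name", "text": "text", "page_number": "page"}


def enrich_metadata(document_name: str, source_metadata: dict, schema_fields: dict) -> dict:
    """Build the metadata stored with each chunk: source, page number, and document name."""
    return {
        "source": source_metadata["source"],
        schema_fields["page_number"]: source_metadata.get("page", 0),
        schema_fields["document_name"]: document_name,
    }


# A text file has no page information, so the page falls back to 0.
text_meta = enrich_metadata("ModelInference.txt", {"source": "ModelInference.txt"}, schema_fields)
# A PDF loader typically reports a page; extra keys such as "author" are not kept.
pdf_meta = enrich_metadata("guide.pdf", {"source": "guide.pdf", "page": 3, "author": "n/a"}, schema_fields)
print(text_meta)
print(pdf_meta)
```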

Define a function to extract and segment document chunks from an individual data asset.

In [15]:
def process_document(data_asset: dict) -> list:
    """
    Process a single data asset by loading, enriching, and splitting its content into document chunks.

    Args:
        data_asset (dict): A dictionary representing the data asset, which contains metadata and other details.

    Returns:
        list: A list of document chunks obtained after splitting the enriched document.
    """
    print("Processing", data_asset["metadata"]["name"])
    loaded_documents = load_document(data_asset)
    filtered_documents = compute_documents_metadata(data_asset["metadata"]["name"], loaded_documents)

    return text_splitter.split_documents(filtered_documents)

Define a function that retrieves document chunks for all available data assets.

In [16]:
def process_documents(data_asset_ids: list) -> list:
    """
    Process and retrieve documents from the given list of data asset IDs.

    Parameters:
        data_asset_ids (list): A list of data asset identifiers to be processed.

    Returns:
        list: A list of processed documents aggregated from the provided data assets.
    """
    documents = []
    for data_asset_id in data_asset_ids:
        data_asset = client.data_assets.get_details(data_asset_id)
        
        if data_asset["metadata"]["asset_type"] == "data_asset":
            document = process_document(data_asset)
            documents += document

    return documents

Process the data assets.

In [17]:
documents = process_documents(data_assets)

Processing ModelInference.txt
Successfully saved data asset content to file: 'ModelInference.txt'


<a id="embeddings"></a>

### Create embeddings

Generate embeddings for the document chunks and add them into the Milvus collection.

In [18]:
vector_store.add_documents(content=documents, batch_size=200)

['cfa02bfb834b9c7bfca4d98924e6ebe92f28eaddb702edadced88b9d67ee494a',
 '2e4d36157aec823e169750b3ae45bdccbff6df4f370b36862b22377a76d5cbe9',
 '6feed52fc3d6cef08f9d05d9777f3fec0a7966a2005da04cb6ae7167108e5ac5',
 '26d08eb4cb54eed7e9af2a3cf30fa0c2c9e84364f5e504134245e2678b37cd60',
 '222c93a51ca5aecf662c803c8181e337e4d1249d1d4976a2660baaef30d2e4af',
 '6d65582150b5f3080b4afa369d80a8c30984a665f7cca2de75899cb759705f73',
 '4b487731188fa2803dba310b350c1c79ac4067a106769b760cadbbc4fab367a3',
 '74e6dd153be95b2db6e021b71a108f573d702b3e3a22dc91adb3cd39dcf34248']
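
The `batch_size=200` argument suggests that documents are sent to the collection in groups of at most 200. How `VectorStore` batches internally is not shown in this notebook, so the following is only a sketch of the general idea, with a hypothetical helper name `iter_batches`.

```python
from typing import Iterator


def iter_batches(items: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# 450 placeholder chunks split into batches of 200.
batch_lengths = [len(batch) for batch in iter_batches(list(range(450)), 200)]
print(batch_lengths)  # [200, 200, 50]
```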

<a id="vector-index"></a>

### Create vector index

Invoke the vector indexes API to create a vector index asset that points to this Milvus collection.

*Note*: This vector index does not require patching when updates are made to the collection.

In [19]:
import requests

vector_index_api_url = f'{credentials.url}/wx/v1/vector_indexes?project_id={project_id}'

response = requests.post(
    vector_index_api_url, 
    headers=client._get_headers(), 
    json=vector_index_details
)

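# Fail fast if the API call did not succeed, so later cells do not run with a missing ID.
response.raise_for_status()

vector_index = response.json()
print(vector_index)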

{'id': '3ebf3222-8a3b-4d86-9a8a-1c2f37370adb', 'name': 'My vector index', 'created_at': 1742205546906, 'created_by': '1000331001', 'last_updated_at': 1742205546906, 'last_updated_by': '1000331001', 'data_assets': ['56a9251d-419d-44f4-9676-55137166ba1f'], 'store': {'type': 'watsonx.data', 'connection_id': 'c03acccb-03be-4c01-98c3-6b07a53dc043', 'index': 'wx_collection', 'database': 'default'}, 'settings': {'chunk_size': 2000, 'chunk_overlap': 200, 'split_pdf_pages': True, 'top_k': 5, 'rerank': False, 'embedding_model_id': 'ibm/granite-embedding-278m-multilingual', 'schema_fields': {'document_name': 'document_name', 'text': 'text', 'page_number': 'page'}}, 'status': 'ready'}


<a id="deploy"></a>

## Deploy AI Service

Below is a step-by-step example demonstrating how to deploy a prompt that interacts with a document using vector indexes.

*Note*: This feature is available only for models supported by the Chat API.

You can use the `list` method to display all existing spaces.

In [None]:
client.spaces.list()

Define the space for deploying the AI service.

In [20]:
space_id = 'ENTER YOUR SPACE ID HERE'

client.set.default_space(space_id)

Unsetting the project_id ...


'SUCCESS'

Promote the vector index to the designated space.

In [21]:
vector_index_id = client.spaces.promote(vector_index.get("id"), project_id, space_id)

<a id="ai-service-function-definition"></a>

### AI Service function definition
Define the AI service function to handle data retrieval and inference.

In [None]:
params = {
    "space_id": space_id, 
    "vector_index_id": vector_index_id,
    "url": credentials.url,
}

def gen_ai_service(context, params=params, **custom):
    # import dependencies
    from ibm_watsonx_ai import APIClient, Credentials
    from ibm_watsonx_ai.foundation_models.extensions.rag import Retriever
    from ibm_watsonx_ai.foundation_models.extensions.rag.vector_stores import VectorStore
    from ibm_watsonx_ai.foundation_models import ModelInference, Embeddings, Rerank

    vector_index_id = params.get("vector_index_id")
    space_id = params.get("space_id")
    url = params.get("url")

    # Inference details
    inference_model_id = params.get("inference_model_id")
    system_prompt = "You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is correct given the context and user query, and that it is grounded in the context. Furthermore, make sure that the response is supported by the given document or context. Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer. Avoid repeating information unless asked."
    inference_params = {
        "max_tokens": 2000,
        "temperature": 0
    }
    
    # Setup client
    credentials = Credentials(
        url=url,
        token=context.generate_token(),
        instance_id="openshift"
    )

    client = APIClient(credentials, space_id=space_id)

    # Get vector index details
    vector_index_details = client.data_assets.get_details(vector_index_id)
    vector_index_properties = vector_index_details["entity"]["vector_index"]

    def rerank(inner_client, documents, query, top_n):
        """
        Rerank a list of documents based on a query using a cross-encoder model.

        Parameters:
            inner_client: An API client instance used to interact with the underlying service.
            documents (list): A list of documents to be reranked.
            query (str): The query string used to evaluate the relevance of each document.
            top_n (int): The number of top documents to return after reranking.

        Returns:
            list: A new list of documents ordered by their relevance to the query.
        """
        reranker = Rerank(
            model_id="cross-encoder/ms-marco-minilm-l-12-v2",
            api_client=inner_client,
            params={
                "return_options": {
                    "top_n": top_n
                },
                "truncate_input_tokens": 512
            }
        )

        reranked_results = reranker.generate(query=query, inputs=documents)["results"]

        new_documents = [documents[result["index"]] for result in reranked_results]
            
        return new_documents        

    def format_messages(messages, documents, system_prompt):
        """
        Format conversation messages by appending contextual information and prepending a system prompt.

        Parameters:
            messages (list): A list of message dictionaries, where each dictionary must include a "content" key.
            documents (list): A list of document strings that will be combined to form context.
            system_prompt (str): The system prompt to be inserted as the first message in the conversation.

        Returns:
            list: The updated list of messages including the reformatted last message and the prepended system message.
        """
        context = "\n".join(documents)

        # Append context to the last message.
        if messages:
            content = messages[-1].get("content", "")
            # Format of this string may be model dependent
            messages[-1]["content"] = (
                f"Use the following pieces of context to answer the question.\n\n"
                f"{context}\n\n"
                f"Question: {content}\n"
            )
        
        # Prepend the system prompt.
        messages.insert(0, {"role": "system", "content": system_prompt})
        
        return messages

    def inference_model(inference_model_id, inner_client, messages, stream):
        """
        Retrieve document chunks, incorporate contextual information, and generate a grounded response.

        Parameters:
            inference_model_id: ID of model that will be used for inferencing
            inner_client: An API client instance used for connecting to the vector store and the inference model.
            messages (list): A list of message dictionaries representing the conversation history. The content of
                            the last message is used as the query for document retrieval.
            stream (bool): If True, the inference model returns a streaming response; otherwise, it returns a complete response.

        Returns:
            The generated response from the inference model, either as a stream or as a complete message.
        """
        emb = Embeddings(
            model_id=vector_index_properties["settings"]["embedding_model_id"],
            api_client=inner_client,
            params={
                "truncate_input_tokens": 512
            }
        )

        top_n = 20 if vector_index_properties["settings"].get("rerank") else int(vector_index_properties["settings"]["top_k"])

        vector_store = VectorStore(
            api_client=inner_client,
            connection_id=vector_index_properties["store"]["connection_id"],
            embeddings=emb,
            index_name=vector_index_properties["store"]["index"],
            database=vector_index_properties["store"]["database"],
            consistency_level='Strong',
            connection_args={'secure': True},
            text_field=vector_index_properties["settings"]["schema_fields"]["text"],
            search_params={"ef": 2 * top_n} # `ef` param needs to be larger than `top_n` param
        )

        # Retrieve document chunks from the vector index
        query = messages[-1].get("content")

        retriever = Retriever(vector_store=vector_store, number_of_chunks=top_n)
        documents = retriever.retrieve(query)

        def get_doc_content(doc):
            """
            Extract the content from a document.

            Parameters:
                doc: A document object with a 'page_content' attribute.

            Returns:
                str: The textual content of the document.
            """
            return doc.page_content

        document_contents = list(map(get_doc_content, documents))

        # Use reranking if enabled
        if vector_index_properties["settings"].get("rerank"):
            document_contents = rerank(inner_client, document_contents, query, vector_index_properties["settings"]["top_k"])

        # Generate grounded response using the inference details
        messages = format_messages(messages, document_contents, system_prompt=system_prompt)

        model = ModelInference(
            model_id=inference_model_id,
            params=inference_params,
            api_client=inner_client,
            space_id=space_id
        )

        if stream:
            generated_response = model.chat_stream(messages=messages)
        else:
            generated_response = model.chat(messages=messages)

        return generated_response

    def get_inner_client(context):
        """
        Set up and return an inner API client using the provided context.

        Parameters:
            context: An object used in AI services deployment runtime

        Returns:
            APIClient: An instance of APIClient configured with the constructed credentials and space ID.
        """
        inner_credentials = Credentials(
            url = url,
            token = context.get_token(),
            instance_id = "openshift"
        )
        inner_client = APIClient(inner_credentials, space_id = space_id)
        return inner_client

    def generate(context):
        payload = context.get_json()
        messages = payload.get("messages")
        inference_model_id = payload.get("model_id")
        inner_client = get_inner_client(context)    
        
        results = inference_model(inference_model_id, inner_client, messages, False)
        
        response = {
            "headers": {
                "Content-Type": "application/json"
            },
            "body": results
        }

        return response

    def generate_stream(context):
        payload = context.get_json()
        messages = payload.get("messages")
        inference_model_id = payload.get("model_id")
        inner_client = get_inner_client(context)
        response_stream = inference_model(inference_model_id, inner_client, messages, True)

        for chunk in response_stream:
            yield chunk

    return generate, generate_stream
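
Two steps inside the service are worth isolating: mapping rerank results back to documents through their `index` field, and wrapping the last user message with the retrieved context. The sketch below reproduces both on mock data, without calling any watsonx API. Note that `format_messages` above modifies the incoming `messages` list in place; the variant here works on a copy instead, which is one possible alternative when the same list is reused across calls. The names `apply_rerank` and `format_messages_copy`, and the `score` key in the mock results, are assumptions made for illustration.

```python
import copy


def apply_rerank(documents: list[str], reranked_results: list[dict]) -> list[str]:
    """Reorder (and truncate) documents using the 'index' field of each rerank result."""
    return [documents[result["index"]] for result in reranked_results]


def format_messages_copy(messages: list[dict], documents: list[str], system_prompt: str) -> list[dict]:
    """Return a new message list with context added to the last message and a system prompt first."""
    formatted = copy.deepcopy(messages)  # leave the caller's history untouched
    context = "\n".join(documents)
    if formatted:
        question = formatted[-1].get("content", "")
        formatted[-1]["content"] = (
            "Use the following pieces of context to answer the question.\n\n"
            f"{context}\n\n"
            f"Question: {question}\n"
        )
    return [{"role": "system", "content": system_prompt}] + formatted


documents = ["chunk A", "chunk B", "chunk C"]
# Mock rerank output: only 'index' is used by the service code; 'score' is assumed.
mock_results = [{"index": 2, "score": 0.9}, {"index": 0, "score": 0.4}]
ranked = apply_rerank(documents, mock_results)

history = [{"role": "user", "content": "What is chunk C?"}]
formatted = format_messages_copy(history, ranked, "You are a helpful assistant.")
print(ranked)
print(formatted[-1]["content"])
```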


Define the request and response schemas for the AI service.

In [None]:
request_schema = {
    "application/json": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "model_id": {
                "title": "The model to use for the inference.",
                "type": "string"
            },
            "messages": {
                "title": "The messages for this chat session.",
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "role": {
                            "title": "The role of the message author.",
                            "type": "string",
                            "enum": ["user","assistant"]
                        },
                        "content": {
                            "title": "The contents of the message.",
                            "type": "string"
                        }
                    },
                    "required": ["role","content"]
                }
            }
        },
        "required": ["model_id", "messages"]
    }
}

response_schema = {
    "application/json": {
        "oneOf": [
            {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": "object",
                "description": "AI Service response for /ai_service_stream",
                "properties": {
                    "choices": {
                        "description": "A list of chat completion choices.",
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "index": {
                                    "type": "integer",
                                    "title": "The index of this result."
                                },
                                "delta": {
                                    "description": "A message result.",
                                    "type": "object",
                                    "properties": {
                                        "content": {
                                            "description": "The contents of the message.",
                                            "type": "string"
                                        },
                                        "role": {
                                            "description": "The role of the author of this message.",
                                            "type": "string"
                                        }
                                    },
                                    "required": [
                                        "role"
                                    ]
                                }
                            }
                        }
                    }
                },
                "required": [
                    "choices"
                ]
            },
            {
                "$schema": "http://json-schema.org/draft-07/schema#",
                "type": "object",
                "description": "AI Service response for /ai_service",
                "properties": {
                    "choices": {
                        "description": "A list of chat completion choices",
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "index": {
                                    "type": "integer",
                                    "description": "The index of this result."
                                },
                                "message": {
                                    "description": "A message result.",
                                    "type": "object",
                                    "properties": {
                                        "role": {
                                            "description": "The role of the author of this message.",
                                            "type": "string"
                                        },
                                        "content": {
                                            "title": "Message content.",
                                            "type": "string"
                                        }
                                    },
                                    "required": [
                                        "role"
                                    ]
                                }
                            }
                        }
                    }
                },
                "required": [
                    "choices"
                ]
            }
        ]
    }
}
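
The request schema documents the expected payload; it does not by itself guarantee that callers send valid data. As a minimal sketch, the hand-written check below mirrors the schema's required fields using only the standard library. A JSON Schema validator library would be the more general option; the function name `validate_payload` is hypothetical.

```python
ALLOWED_ROLES = {"user", "assistant"}


def validate_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the payload looks valid."""
    errors = []
    if not isinstance(payload.get("model_id"), str):
        errors.append("model_id must be a string")
    messages = payload.get("messages")
    if not isinstance(messages, list):
        errors.append("messages must be a list")
        return errors
    for position, message in enumerate(messages):
        if not isinstance(message, dict):
            errors.append(f"messages[{position}] must be an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            errors.append(f"messages[{position}].role must be one of {sorted(ALLOWED_ROLES)}")
        if not isinstance(message.get("content"), str):
            errors.append(f"messages[{position}].content must be a string")
    return errors


good = {"model_id": "ibm/granite-3-8b-instruct", "messages": [{"role": "user", "content": "Hi"}]}
bad = {"messages": [{"role": "system", "content": "Hi"}]}
print(validate_payload(good))
print(validate_payload(bad))
```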

<a id="test-ai-service-locally"></a>

### Test AI Service locally
Before creating the deployment, test the prepared AI service locally to detect potential errors at an early stage.

In [24]:
from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(api_client=client)

streaming = False
findex = 1 if streaming else 0
local_function = gen_ai_service(context)[findex]
messages = []

In [None]:
inference_model_id = "ibm/granite-3-8b-instruct"

local_question = "Summarize the document"

messages.append({ "role" : "user", "content": local_question })

context = RuntimeContext(api_client=client, request_payload_json={"model_id": inference_model_id, "messages": messages})

response = local_function(context)

if streaming:
    for chunk in response:
        print(chunk, end="\n\n", flush=True)
else:
    print(response)



<a id="create-deployment"></a>

### Create deployment
After making sure that the AI service works as expected, we can proceed to the deployment creation step.

Retrieve the software specification used by the AI service.

In [26]:
software_spec_id = client.software_specifications.get_id_by_name("runtime-24.1-py3.11")

Create the AI service asset.

In [27]:
ai_service_metadata = {
    client.repository.AIServiceMetaNames.NAME: vector_index_name,
    client.repository.AIServiceMetaNames.DESCRIPTION: "",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: software_spec_id,
    client.repository.AIServiceMetaNames.CUSTOM: {},
    client.repository.AIServiceMetaNames.REQUEST_DOCUMENTATION: request_schema,
    client.repository.AIServiceMetaNames.RESPONSE_DOCUMENTATION: response_schema
}

ai_service_details = client.repository.store_ai_service(meta_props=ai_service_metadata, ai_service=gen_ai_service)
ai_service_id = client.repository.get_ai_service_id(ai_service_details)

Deploy the AI service.

In [28]:
deployment_metadata = {
    client.deployments.ConfigurationMetaNames.NAME: vector_index_name,
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.CUSTOM: {},
    client.deployments.ConfigurationMetaNames.DESCRIPTION: f"{vector_index_name} description"
}

function_deployment_details = client.deployments.create(ai_service_id, meta_props=deployment_metadata, space_id=space_id)
deployment_id = client.deployments.get_id(function_deployment_details)



######################################################################################

Synchronous deployment creation for id: '62a04437-70f2-459a-b52b-8d6325765d99' started

######################################################################################


initializing
Note: online_url is deprecated and will be removed in a future release. Use serving_urls instead.
.......
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='0b15cf4e-47f5-4541-aa18-60409e43b0bd'
-----------------------------------------------------------------------------------------------




Evaluate the deployment of the AI service.

In [None]:
inference_model_id = "ibm/granite-3-8b-instruct"

remote_question = "Summarize the document"
payload = {"model_id": inference_model_id, "messages": [{"role": "user", "content": remote_question}]}

result = client.deployments.run_ai_service(deployment_id, payload)

if "error" in result:
    print(result["error"])
else:
    print(result)



In [None]:
stream_results = client.deployments.run_ai_service_stream(deployment_id, payload)

for chunk in stream_results:
    print(chunk)
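
Printing raw chunks is useful for debugging, but an application usually wants the assembled answer. The sketch below joins the `delta.content` pieces of a stream, assuming each chunk follows the streaming response schema defined earlier. The exact type that `run_ai_service_stream` yields is not shown in this notebook, so the helper accepts both parsed dictionaries and JSON strings; `collect_stream_text` and the mock chunks are hypothetical.

```python
import json
from typing import Iterable


def collect_stream_text(chunks: Iterable) -> str:
    """Concatenate the delta contents of streamed chat chunks into one string."""
    parts = []
    for chunk in chunks:
        if isinstance(chunk, str):
            chunk = json.loads(chunk)  # assumption: string chunks are JSON documents
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


mock_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "The document "}}]},
    '{"choices": [{"index": 0, "delta": {"role": "assistant", "content": "describes inference."}}]}',
    {"choices": []},  # chunks without choices are skipped
]
answer = collect_stream_text(mock_chunks)
print(answer)  # The document describes inference.
```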

<a id="summary"></a>
## Summary and next steps

You successfully completed this notebook!

You learned how to reproduce the behaviour of chat with a document and vector indexes programmatically through watsonx APIs and clients.
 
Check out our _<a href="https://ibm.github.io/watsonx-ai-python-sdk/samples.html" target="_blank" rel="noopener noreferrer">Online Documentation</a>_ for more samples, tutorials, documentation, how-tos, and blog posts.

Copyright © 2025 IBM. This notebook and its source code are released under the terms of the MIT License.