![image](https://raw.githubusercontent.com/IBM/watson-machine-learning-samples/master/cloud/notebooks/headers/watsonx-Prompt_Lab-Notebook.png)
# AI Service Deployment Notebook
This notebook contains the steps and code to test, promote, and deploy an AI Service
that implements the retrieval-augmented generation (RAG) pattern for grounded chats.

**Note:** Notebook code generated by Prompt Lab is expected to execute successfully as generated.
If you modify or reorder the code, there is no guarantee that it will still execute.
For details, see: <a href="/docs/content/wsj/analyze-data/fm-prompt-save.html?context=wx" target="_blank">Saving your work in Prompt Lab as a notebook.</a>


Some familiarity with Python is helpful. This notebook uses Python 3.11.

## Contents
This notebook contains the following parts:

1. Set up the environment
2. Create the AI service function
3. Store and deploy the AI Service
4. Test the deployed AI Service

## 1. Set up the environment

Before you can run this notebook, you must perform the following setup tasks:

### Connection to WML
This cell defines the credentials required to work with the watsonx API, both for execution in the project
and for deployment and runtime execution of the function.

**Action:** Provide the IBM Cloud personal API key. For details, see
<a href="https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui" target="_blank">documentation</a>.
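
For non-interactive runs, the key can also be read from an environment variable before falling back to a prompt. The following is a minimal sketch, assuming a variable named `WATSONX_APIKEY`; that name is hypothetical and chosen only for this example, not something the notebook or the SDK requires.

```python
import getpass
import os


def read_api_key(env_var="WATSONX_APIKEY"):
    """Return the API key from the environment, or prompt for it.

    `env_var` is a hypothetical variable name used only in this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if api_key:
        return api_key
    # Fall back to an interactive prompt that does not echo the key
    return getpass.getpass("Please enter your api key (hit enter): ")
```

The returned value can be passed as `api_key` when constructing `Credentials`.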


In [1]:
import os
from ibm_watsonx_ai import APIClient, Credentials
import getpass

credentials = Credentials(
    url="https://au-syd.ml.cloud.ibm.com",
    api_key=getpass.getpass("Please enter your api key (hit enter): ")
)



Please enter your api key (hit enter):  ········


In [2]:
client = APIClient(credentials)

### Connecting to a space
A space will be used to host the promoted AI Service.


In [3]:
space_id = "147fa72a-7bdf-44d4-8d4f-3d5639957c01"
client.set.default_space(space_id)


'SUCCESS'

### Promote asset(s) to space
Next, promote the assets that the AI service needs into the space so that the service can access their data.


In [4]:
source_project_id = "caf544a2-807e-465f-b275-81076b99a38a"
vector_index_id = client.spaces.promote("847e4983-4868-4823-a1db-d02b42f1d706", source_project_id, space_id)
print(vector_index_id)


297ad839-3386-4acf-8f0a-63ab889a3bba


## 2. Create the AI service function
First, define the AI service function.
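
An AI service is an outer function that receives a context and returns inner functions; in this notebook those are `generate` (one response) and `generate_stream` (chunked response). The sketch below shows only that shape. `StubContext` and `echo_ai_service` are hypothetical stand-ins written for this example; they are not part of `ibm_watsonx_ai`, and the real function in section 2.1 replaces the echo logic with retrieval, moderation, and model inference.

```python
class StubContext:
    """Hypothetical stand-in exposing the two context methods this notebook uses."""

    def __init__(self, payload, token="stub-token"):
        self._payload = payload
        self._token = token

    def get_json(self):
        return self._payload

    def get_token(self):
        return self._token


def echo_ai_service(context, **custom):
    # Setup that runs once, when the service is created
    prefix = custom.get("prefix", "echo: ")

    def generate(context):
        messages = context.get_json().get("messages", [])
        last = messages[-1]["content"] if messages else ""
        body = {
            "choices": [{
                "index": 0,
                "message": {"role": "assistant", "content": prefix + last},
            }]
        }
        return {"headers": {"Content-Type": "application/json"}, "body": body}

    def generate_stream(context):
        messages = context.get_json().get("messages", [])
        last = messages[-1]["content"] if messages else ""
        # Emit one chunk per word, in the same delta layout the notebook uses
        for word in (prefix + last).split():
            yield {
                "choices": [{
                    "index": 0,
                    "delta": {"role": "assistant", "content": word + " "},
                }]
            }

    return generate, generate_stream


ctx = StubContext({"messages": [{"role": "user", "content": "hello there"}]})
generate, generate_stream = echo_ai_service(ctx)
reply = generate(ctx)["body"]["choices"][0]["message"]["content"]
streamed = "".join(c["choices"][0]["delta"]["content"] for c in generate_stream(ctx))
```

Keeping setup in the outer function and per-request work in the inner functions means the setup code runs once per deployment rather than once per call.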

### 2.1 Define the function

In [23]:
params = {
    "space_id": space_id, 
    "vector_index_id": vector_index_id
}

def gen_ai_service(context, params = params, **custom):
    # import dependencies
    import json
    from ibm_watsonx_ai.foundation_models import ModelInference
    from ibm_watsonx_ai.gateway import Gateway
    from ibm_watsonx_ai.foundation_models.utils import Tool, Toolkit
    from ibm_watsonx_ai import APIClient, Credentials
    import os
    import requests
    import re

    space_id = params.get("space_id")
    vector_index_id = params.get("vector_index_id")

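    def proximity_search( query, api_client ):
        document_search_tool = Toolkit(
            api_client=api_client
        ).get_tool("RAGQuery")

        config = {
            "vectorIndexId": vector_index_id,
            "spaceId": space_id
        }

        # Run the RAG query tool and return the retrieved passages
        results = document_search_tool.run(
            input=query,
            config=config
        )

        return results.get("output")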

    def get_api_client(context):
        credentials = Credentials(
            url="https://au-syd.ml.cloud.ibm.com",
            token=context.get_token()
        )

        api_client = APIClient(
            credentials = credentials,
            space_id = space_id
        )

        return api_client

    def text_detection(context, text, detectors):
        if (not text):
            return []
        body = {
            "detectors": detectors,
            "input": text,
            "space_id": space_id
        }
    
        query_params = {
            "version": "2023-05-23"
        }
    
        headers  = {
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f'Bearer {context.get_token()}'
        }
        
        detection_url = "https://private.au-syd.ml.cloud.ibm.com"
        
        detection_response = requests.post(f'{detection_url}/ml/v1/text/detection', headers = headers, json = body, params = query_params)
        
        if (detection_response.status_code >= 400):
            raise Exception(f'Error doing text detection: {detection_response.json()}' )
        
        return detection_response.json().get("detections")
    
    def moderate_stream(response_stream):
        regex = r'^[^?.!\n].*[?.!\n]$'
    
        sentence = ""
    
        for chunk in response_stream:
            if (len(chunk["choices"])):
                # Guard against chunks whose delta carries no content
                content = chunk["choices"][0]["delta"].get("content") or ""
                sentence = f'{sentence}{content}'
                if (not bool(re.match(regex, sentence))):
                    continue
                    
            detectors = {
                "hap": {
                    "enabled": True,
                    "threshold": 0.5
                }
            }
    
            detections = text_detection(context, sentence, detectors)
    
            if (len(detections)):
                for detection in detections:
                    if (detection["detection_type"] == "pii"):
                        sentence = sentence.replace(detection["text"], "[Possibly personal information removed]")
                    elif (detection["detection_type"] == "hap"):
                        sentence = sentence.replace(detection["text"], "[Potentially harmful text removed]")
            
            chunk_response = {
                "choices": [{
                    "index": 0,
                    "delta": {
                        "role": "assistant",
                        "content": sentence
                    }
                    
                }]
            }
    
            yield chunk_response
            sentence = ""
        
    def moderation_input(mask):
        return {
            "choices": [{
                "index": 0,
                "message": {
                "role": "assistant",
                "content": mask
                }
            }]
        }
    
    def moderation_input_stream(mask):
        yield {
            "choices": [{
                "index": 0,
                "delta": {
                    "role": "assistant",
                    "content": mask
                }
                
            }]
        }
    
    def get_moderation_input_mask(detections):
        mask = ""
        if (detections[0]["detection_type"] == "pii"):
            mask = "[The input was rejected for containing personal information]."
        elif (detections[0]["detection_type"] == "hap"):
            mask = "[The input was rejected as inappropriate]."
        elif (detections[0]["detection_type"] == "risk"):
            mask = "[The input was rejected as harmful by granite guardian]."
        return mask
        

    def inference_model( messages, context, stream ):
        query = messages[-1].get("content")
        api_client = get_api_client(context)

        grounding_context = proximity_search(query, api_client)

        grounding = grounding_context
        messages.insert(0, {
            "role": "system",
            "content": f"""You are a CPL (Credit for Prior Learning) evaluation assistant for Northeastern University's Project Management program.

## YOUR TASK:
The user may ask about anything, mostly regarding a student based on NUID or student's name.
You MUST search the grounded documents below to find information about the student.

## FROM THE GROUNDED DOCUMENTS, EXTRACT:
- student's NUID
- student's name  
- request_type (experience-based OR credit transfer)
- target_course (the NU course they want credit for)
- uploaded documents (resume, transcript, syllabus, etc.)

---

## EVALUATION LOGIC:

### IF request_type == "experience" OR "job" OR "work" OR similar:
Analyze the RESUME document only.

**For RECOMMENDATION request (user asks for "recommendation", "evaluate", "assess", etc.):**
Evaluate based on these criteria:
1. Does the student have relevant DIRECT project management experience?
2. Does the student have MS Project experience?
3. Does the student have at least 3 years of full-time or part-time experience?

Provide:
- AI Recommendation (3-4 sentences)
- Decision: APPROVE or DENY
- Justification based on the 3 criteria above

**For SUMMARY request (user asks for "summary", "overview", "details", etc.):**
Provide a summary (3-4 sentences in your own words) based on the metadata of documents matching the NUID.

---

### IF request_type == "credit transfer" OR "course credit" OR similar:
Analyze the TRANSCRIPT and SYLLABUS documents.

**Transcript Analysis Requirements:**
- Course name must match the student's syllabus subject
- Grade must be B or 3.0/4.0 or equivalent
- Transcript must not be older than 5 academic years

**Syllabus Comparison:**
- Find the Northeastern University syllabus for the target_course
- Compare student's uploaded syllabus with NU's target_course syllabus
- Check for topic alignment, learning outcomes match

**For RECOMMENDATION request:**
- Compare student's syllabus with target_course syllabus
- Analyze transcript grades and dates
- Provide AI Recommendation (3-4 sentences)
- Decision: APPROVE or DENY with justification

**For SUMMARY request:**
Provide a summary (3-4 sentences) based on the metadata of documents matching the NUID.

---

## RESPONSE FORMAT:

Always structure your response clearly:

**Student Information:**
- Name: [extracted from documents]
- NUID: [extracted from documents]
- Request Type: [experience/credit transfer]
- Target Course: [course code]

**Analysis:**
[Your analysis based on the documents]

**Recommendation:** APPROVE / DENY
**Justification:** [3-4 sentences explaining why]

---

## IMPORTANT RULES:
1. ONLY use information from the grounded documents below
2. If you cannot find the student's documents, say "I could not find documents for this student"
3. If request_type is unclear, ask for clarification
4. Be specific about which documents you analyzed

---

## GROUNDED DOCUMENTS FROM KNOWLEDGE BASE:
{grounding_context}

[[[Do not mention the grounded documents in your response. Respond as a production-level chatbot would.]]]
[[[You may also answer other questions based on the documents, such as questions about a syllabus document that is not tied to a student (for example, a Northeastern University syllabus document). Such questions DO NOT NEED ANY METADATA.]]]
---


### Context:
{grounding}

"""
        })

        # moderate input
        system_prompt_content = "".join(map(lambda message: message.get("content"), list(filter(lambda message: message.get("role") == "system", messages))))
        
        detectors = {
                "hap": {
                    "enabled": True,
                    "threshold": 0.5
                }
            }
        detections = text_detection(context, f'{system_prompt_content}{query}', detectors)
        if (len(detections)):
            mask = get_moderation_input_mask(detections)
            if (stream):
                return moderation_input_stream(mask)
            else:
                return moderation_input(mask)
            

        model_id = "meta-llama/llama-3-3-70b-instruct"
        parameters =  {
            "frequency_penalty": 0,
            "max_tokens": 2000,
            "presence_penalty": 0,
            "temperature": 0,
            "top_p": 1
        }
        model = ModelInference(
            model_id = model_id,
            api_client = api_client,
            params = parameters
        )
        # Generate grounded response
        if (stream):
            generated_response = model.chat_stream(messages=messages)
        else:
            generated_response = model.chat(messages=messages)

        return generated_response


    def generate(context):
        payload = context.get_json()
        messages = payload.get("messages")
        
        # Grounded inferencing
        generated_response = inference_model(messages, context, False)

        execute_response = {
            "headers": {
                "Content-Type": "application/json"
            },
            "body": generated_response
        }

        return execute_response

    def generate_stream(context):
        payload = context.get_json()
        messages = payload.get("messages")

        # Grounded inferencing
        response_stream = inference_model(messages, context, True)

        moderated_stream = moderate_stream(response_stream)

        for chunk in moderated_stream:
            yield chunk

    return generate, generate_stream
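
The streaming moderation in `moderate_stream` buffers model output until the buffer looks like a complete sentence, checks that sentence, and masks any flagged text before yielding it. The sketch below reproduces that idea without calling the detection API. `buffer_sentences`, `mask_detections`, and the sample detection dictionary are hypothetical names and data made up for illustration, and the flush of leftover text at the end of the stream is an addition of this sketch.

```python
import re

# Same pattern as in moderate_stream: starts with a non-terminator and
# ends with ?, ., ! or a newline
SENTENCE_END = re.compile(r'^[^?.!\n].*[?.!\n]$')

HAP_MASK = "[Potentially harmful text removed]"
PII_MASK = "[Possibly personal information removed]"


def buffer_sentences(pieces):
    """Group streamed text pieces into sentence-sized strings."""
    sentence = ""
    for piece in pieces:
        sentence += piece
        if SENTENCE_END.match(sentence):
            yield sentence
            sentence = ""
    if sentence:
        # Flush whatever is left when the stream ends mid-sentence
        yield sentence


def mask_detections(sentence, detections):
    """Replace each detected span with the mask for its detection type."""
    for detection in detections:
        if detection["detection_type"] == "pii":
            sentence = sentence.replace(detection["text"], PII_MASK)
        elif detection["detection_type"] == "hap":
            sentence = sentence.replace(detection["text"], HAP_MASK)
    return sentence


sentences = list(buffer_sentences(["Hel", "lo wor", "ld.", " How", " are you?", " tail"]))
masked = mask_detections(
    "Call me at 555-0100.",
    [{"detection_type": "pii", "text": "555-0100"}],  # made-up detection
)
```

Buffering by sentence gives the detector enough surrounding text to judge, at the cost of delaying output until each sentence is complete.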


### 2.2 Test locally

In [24]:
# Initialize AI Service function locally
from ibm_watsonx_ai.deployments import RuntimeContext

context = RuntimeContext(api_client=client)

streaming = False
findex = 1 if streaming else 0
local_function = gen_ai_service(context, vector_index_id=vector_index_id, space_id=space_id)[findex]
messages = []

In [25]:
local_question = "Change this question to test your function"

messages.append({ "role" : "user", "content": local_question })

context = RuntimeContext(api_client=client, request_payload_json={"messages": messages})

response = local_function(context)

result = ''

if (streaming):
    for chunk in response:
        if (len(chunk["choices"])):
            print(chunk["choices"][0]["delta"]["content"], end="", flush=True)
else:
    print(response)
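
When `streaming` is set to `True`, the response is a generator of chunks rather than a single dictionary. The following is a minimal sketch of joining those chunks into one string, assuming each chunk follows the `choices[0].delta.content` layout used in this notebook. `collect_stream` and the sample chunks are hypothetical and exist only for this example.

```python
def collect_stream(chunks):
    """Concatenate the text carried by a sequence of streamed chunks."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue  # skip chunks that carry no choices
        delta = choices[0].get("delta") or {}
        parts.append(delta.get("content") or "")
    return "".join(parts)


sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "Hello"}}]},
    {"choices": []},
    {"choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ", world."}}]},
]
text = collect_stream(sample_chunks)
```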




## 3. Store and deploy the AI Service
Before you can deploy the AI Service, you must store it in your watsonx.ai repository.

In [26]:
# Look up software specification for the AI service
software_spec_id_in_project = "45f12dfe-aa78-5b8d-9f38-0ee223c47309"
software_spec_id = ""

try:
    software_spec_id = client.software_specifications.get_id_by_name("runtime-24.1-py3.11")
except Exception:
    # Fall back to promoting the project's software specification into the space
    software_spec_id = client.spaces.promote(software_spec_id_in_project, source_project_id, space_id)
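
The cell above follows a lookup-then-fallback pattern: try to find the software specification in the space by name, and promote the project's copy only if the lookup fails. The sketch below isolates that pattern with plain callables so it can run without a watsonx connection. `resolve_spec_id` and both stub functions are hypothetical; treating an empty result as a failed lookup is an assumption of this sketch, not documented SDK behavior.

```python
def resolve_spec_id(lookup, fallback):
    """Return lookup() if it yields an ID, otherwise the result of fallback()."""
    try:
        spec_id = lookup()
    except Exception:
        spec_id = None
    # Assumption: an empty or missing ID also counts as "not found"
    return spec_id if spec_id else fallback()


def failing_lookup():
    raise LookupError("specification not found in space")


found = resolve_spec_id(lambda: "spec-in-space", lambda: "promoted-spec")
promoted = resolve_spec_id(failing_lookup, lambda: "promoted-spec")
```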

In [27]:
# Define the request and response schemas for the AI service
request_schema = {
    "application/json": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "messages": {
                "title": "The messages for this chat session.",
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "role": {
                            "title": "The role of the message author.",
                            "type": "string",
                            "enum": ["user","assistant"]
                        },
                        "content": {
                            "title": "The contents of the message.",
                            "type": "string"
                        }
                    },
                    "required": ["role","content"]
                }
            }
        },
        "required": ["messages"]
    }
}

response_schema = {
    "application/json": {
        "oneOf": [{"$schema":"http://json-schema.org/draft-07/schema#","type":"object","description":"AI Service response for /ai_service_stream","properties":{"choices":{"description":"A list of chat completion choices.","type":"array","items":{"type":"object","properties":{"index":{"type":"integer","title":"The index of this result."},"delta":{"description":"A message result.","type":"object","properties":{"content":{"description":"The contents of the message.","type":"string"},"role":{"description":"The role of the author of this message.","type":"string"}},"required":["role"]}}}}},"required":["choices"]},{"$schema":"http://json-schema.org/draft-07/schema#","type":"object","description":"AI Service response for /ai_service","properties":{"choices":{"description":"A list of chat completion choices","type":"array","items":{"type":"object","properties":{"index":{"type":"integer","description":"The index of this result."},"message":{"description":"A message result.","type":"object","properties":{"role":{"description":"The role of the author of this message.","type":"string"},"content":{"title":"Message content.","type":"string"}},"required":["role"]}}}}},"required":["choices"]}]
    }
}
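
The request schema requires a `messages` array whose items each have a `role` (`user` or `assistant`) and a string `content`. A minimal hand-written check for those rules is sketched below. It is not a JSON Schema validator, and `validate_payload` is a hypothetical helper introduced for this example.

```python
ALLOWED_ROLES = ("user", "assistant")


def validate_payload(payload):
    """Return a list of problems found in a chat payload (empty if none)."""
    problems = []
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    messages = payload.get("messages")
    if not isinstance(messages, list):
        return ["'messages' is required and must be an array"]
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role must be 'user' or 'assistant'")
        if not isinstance(message.get("content"), str):
            problems.append(f"messages[{i}].content must be a string")
    return problems


ok = validate_payload({"messages": [{"role": "user", "content": "Hi"}]})
bad = validate_payload({"messages": [{"role": "system", "content": 42}]})
```

Running a check like this on the client side can surface malformed payloads before they reach the deployment.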

In [28]:
# Store the AI service in the repository
ai_service_metadata = {
    client.repository.AIServiceMetaNames.NAME: "CPL Rag Notebook",
    client.repository.AIServiceMetaNames.DESCRIPTION: "",
    client.repository.AIServiceMetaNames.SOFTWARE_SPEC_ID: software_spec_id,
    client.repository.AIServiceMetaNames.CUSTOM: {},
    client.repository.AIServiceMetaNames.REQUEST_DOCUMENTATION: request_schema,
    client.repository.AIServiceMetaNames.RESPONSE_DOCUMENTATION: response_schema,
    client.repository.AIServiceMetaNames.TAGS: ["wx-vector-index"]
}

ai_service_details = client.repository.store_ai_service(meta_props=ai_service_metadata, ai_service=gen_ai_service)

In [29]:
# Get the AI Service ID

ai_service_id = client.repository.get_ai_service_id(ai_service_details)

In [30]:
# Deploy the stored AI Service
deployment_custom = {}
deployment_metadata = {
    client.deployments.ConfigurationMetaNames.NAME: "CPL Rag Notebook",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
    client.deployments.ConfigurationMetaNames.CUSTOM: deployment_custom,
    client.deployments.ConfigurationMetaNames.DESCRIPTION: "",
    client.deployments.ConfigurationMetaNames.TAGS: ["wx-vector-index"]
}

function_deployment_details = client.deployments.create(ai_service_id, meta_props=deployment_metadata, space_id=space_id)




######################################################################################

Synchronous deployment creation for id: 'ab4fe104-a8d3-4f8e-9242-0a4ab17186e4' started

######################################################################################


initializing
Note: online_url and serving_urls are deprecated and will be removed in a future release. Use inference instead.
...
ready


-----------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_id='a89ff2b2-73b6-475d-adc9-83fcaa612085'
-----------------------------------------------------------------------------------------------




## 4. Test AI Service

In [31]:
# Get the ID of the AI Service deployment just created

deployment_id = client.deployments.get_id(function_deployment_details)
print(deployment_id)

a89ff2b2-73b6-475d-adc9-83fcaa612085


In [36]:
messages = []
remote_question = "Tell me about John Smith's experience"
messages.append({ "role" : "user", "content": remote_question })
payload = { "messages": messages }

In [37]:
result = client.deployments.run_ai_service(deployment_id, payload)
if "error" in result:
    print(result["error"])
else:
    print(result)



# Next steps
You successfully deployed and tested the AI Service! You can now view
your deployment and test it as a REST API endpoint.
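
As a rough sketch of the REST call, the request below is built but not sent. The endpoint path (`/ml/v4/deployments/<deployment_id>/ai_service`) and the `version` query value are assumptions; confirm both against the API reference shown for your deployment before relying on them. `build_ai_service_request` and the placeholder token are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_ai_service_request(base_url, deployment_id, token, messages,
                             version="2021-05-01"):
    """Build (but do not send) a POST request for a deployed AI service.

    The path and version value are assumptions; check your deployment page.
    """
    query = urllib.parse.urlencode({"version": version})
    url = f"{base_url}/ml/v4/deployments/{deployment_id}/ai_service?{query}"
    body = json.dumps({"messages": messages}).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_ai_service_request(
    "https://au-syd.ml.cloud.ibm.com",
    "a89ff2b2-73b6-475d-adc9-83fcaa612085",
    "<your-bearer-token>",  # placeholder, not a real token
    [{"role": "user", "content": "Tell me about John Smith's experience"}],
)
# Sending is left out on purpose: urllib.request.urlopen(request)
```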

<a id="copyrights"></a>
### Copyrights

Licensed Materials - Copyright © 2024 IBM. This notebook and its source code are released under the terms of the ILAN License.
Use, duplication disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

**Note:** The auto-generated notebooks are subject to the International License Agreement for Non-Warranted Programs (or equivalent) and License Information document for watsonx.ai Auto-generated Notebook (License Terms), such agreements located in the link below. Specifically, the Source Components and Sample Materials clause included in the License Information document for watsonx.ai Studio Auto-generated Notebook applies to the auto-generated notebooks.  

By downloading, copying, accessing, or otherwise using the materials, you agree to the <a href="https://www14.software.ibm.com/cgi-bin/weblap/lap.pl?li_formnum=L-AMCU-BYC7LF" target="_blank">License Terms</a>  