## Example: querying Caikit-TGIS with Llama 2 using LangChain and a custom prompt

### Set the inference server URL (replace it with your own address) and the model ID

```python
# Replace these placeholders with your own endpoint (host:port) and model ID
inference_server_url = "your_server_address:port"
model_id = "Llama-2-7b-chat-hf"

# Example values for a KServe deployment; uncomment them to use them instead.
# flan-t5-small-caikit is not a Llama 2 model, so the Llama 2 prompt below is not tailored to it.
# inference_server_url = "caikit-tgis-example-isvc-predictor.kserve-demo.svc.cluster.local:80"
# model_id = "flan-t5-small-caikit"

# If your endpoint uses a self-signed certificate, export the certificate chain as a .pem file
# and provide its path here. Pass it to the LLM instance below only if it is needed.
certificate_chain_file = "certificate.pem"
```

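The values above pass the endpoint as a bare `host:port` string with no `http://` or `https://` scheme. As a quick sanity check before creating the LLM instance, a hypothetical helper such as `parse_endpoint` (not part of `caikit_tgis_langchain`) could split and validate it. This is a sketch that assumes, as in the examples above, that the client expects exactly that form.

```python
def parse_endpoint(url: str) -> tuple[str, int]:
    """Split a 'host:port' endpoint string, rejecting URL schemes and bad ports.

    Hypothetical helper for illustration; not part of caikit_tgis_langchain.
    """
    if "://" in url:
        raise ValueError(f"expected host:port without a scheme, got {url!r}")
    host, sep, port_text = url.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {url!r}")
    # Check digits first so int() is only called on numeric text
    if not port_text.isdigit() or not 0 < int(port_text) < 65536:
        raise ValueError(f"invalid port in {url!r}")
    return host, int(port_text)


host, port = parse_endpoint(
    "caikit-tgis-example-isvc-predictor.kserve-demo.svc.cluster.local:80"
)
print(host, port)
```

Calling it on the unedited placeholder `"your_server_address:port"` raises `ValueError`, which catches a forgotten replacement before any connection is attempted.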
```python
# Optional: install the requirements if they are not already present
# !pip -q install grpcio grpcio-reflection
```

## Preparation

```python
# Imports
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
import caikit_tgis_langchain
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.prompts import PromptTemplate
```

#### Create the LLM instance

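```python
# LLM definition
# Remove the certificate_chain argument if your endpoint does not use a self-signed certificate
llm = caikit_tgis_langchain.CaikitLLM(
    inference_server_url=inference_server_url,
    model_id=model_id,
    certificate_chain=certificate_chain_file,
    streaming=True,  # stream tokens as they are generated so callbacks can print them live
)
```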

#### Create the Prompt

```python
template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""
PROMPT = PromptTemplate(input_variables=["history", "input"], template=template)
```

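`PromptTemplate` uses Python `str.format`-style placeholders by default, so the text sent to the model is the template with `{history}` and `{input}` substituted. The sketch below reproduces that substitution with plain `str.format` on a shortened copy of the template (the system prompt is abbreviated, and the name `short_template` is chosen so it does not overwrite `template`). It is a convenient way to check the Llama 2 `[INST]` / `<<SYS>>` layout without a running server.

```python
# Shortened copy of the notebook's template; the system prompt is abbreviated here.
short_template = """<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>

Current conversation:
{history}
Human: {input}
AI:
[/INST]
"""

# On the first turn the buffer memory is empty, so history is an empty string.
first_prompt = short_template.format(
    history="", input="Can you describe Paris in 100 words?"
)
print(first_prompt)
```

With an empty history, the line after "Current conversation:" is blank, which matches the verbose output of the first example.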
#### Finally, some memory for the conversation

```python
memory = ConversationBufferMemory()
```

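`ConversationBufferMemory` keeps every exchange and, by default, renders the history as lines prefixed with `Human:` and `AI:`, which is what appears under "Current conversation:" in the verbose output. The class below is a simplified, hypothetical stand-in written only to illustrate that behavior; it is not LangChain's implementation, and its `save_context` takes two strings rather than LangChain's input and output dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class BufferMemorySketch:
    """Hypothetical, simplified stand-in for ConversationBufferMemory."""

    human_prefix: str = "Human"
    ai_prefix: str = "AI"
    turns: list[tuple[str, str]] = field(default_factory=list)

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Append one completed exchange to the buffer.
        self.turns.append((user_input, ai_output))

    def load_history(self) -> str:
        # Render every stored exchange as "Human: ..." / "AI: ..." lines.
        lines = []
        for user_input, ai_output in self.turns:
            lines.append(f"{self.human_prefix}: {user_input}")
            lines.append(f"{self.ai_prefix}: {ai_output}")
        return "\n".join(lines)

    def clear(self) -> None:
        # Forget the whole conversation, like memory.clear() in the notebook.
        self.turns.clear()


sketch = BufferMemorySketch()
sketch.save_context("Can you describe Paris in 100 words?", "(answer about Paris)")
print(sketch.load_history())
sketch.clear()
```

Emptying the buffer with `clear()` is what the second example relies on before starting the London conversation with the same memory object.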
## First example, fully verbose mode

#### Create the chain from these components

```python
# Verbose mode is intentionally set to True so you can see the prompt and the history from the buffer memory
llm_kwargs = {"min_new_tokens": 10, "max_new_tokens": 200, "preserve_input_text": False}
conversation = ConversationChain(
    llm=llm,
    llm_kwargs=llm_kwargs,
    prompt=PROMPT,
    verbose=True,
    memory=memory,
)
```

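Each call to `conversation.predict` roughly loads the history from memory, fills the prompt with it and the new input, calls the LLM (forwarding `llm_kwargs`), and saves the new exchange back to memory. The self-contained sketch below mimics that loop with a hypothetical `echo_llm` in place of the Caikit-TGIS endpoint and a plain list in place of the memory object. It shows why a follow-up such as "Are there any restaurants there?" reaches the model together with the first exchange. It illustrates the idea only and is not LangChain's `ConversationChain` code.

```python
from typing import Callable

TEMPLATE = "Current conversation:\n{history}\nHuman: {input}\nAI:"
seen_prompts: list[str] = []


def echo_llm(prompt: str, **llm_kwargs) -> str:
    # Hypothetical stand-in for the Caikit-TGIS endpoint: records the prompt it
    # receives and returns a canned reply; the generation kwargs are ignored.
    seen_prompts.append(prompt)
    return f"(reply #{len(seen_prompts)})"


def predict(turns: list[tuple[str, str]], user_input: str,
            llm: Callable[..., str], **llm_kwargs) -> str:
    # 1. Render the stored turns as "Human: ..." / "AI: ..." lines.
    history = "\n".join(f"Human: {q}\nAI: {a}" for q, a in turns)
    # 2. Fill the prompt with the history and the new input.
    prompt = TEMPLATE.format(history=history, input=user_input)
    # 3. Call the model, forwarding the generation parameters.
    answer = llm(prompt, **llm_kwargs)
    # 4. Save the new exchange so the next call sees it.
    turns.append((user_input, answer))
    return answer


turns: list[tuple[str, str]] = []
kwargs = {"min_new_tokens": 10, "max_new_tokens": 200}
predict(turns, "Can you describe Paris in 100 words?", echo_llm, **kwargs)
predict(turns, "Are there any restaurants there?", echo_llm, **kwargs)
```

After the second call, `seen_prompts[1]` contains the Paris question and its reply before the restaurant question, which is how the model can resolve "there" without further context.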
#### Let's talk...

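```python
first_input = "Can you describe Paris in 100 words?"
# The trailing semicolon hides the cell's return value; the answer is printed by the streaming callback
conversation.predict(input=first_input, callbacks=[StreamingStdOutCallbackHandler()]);
```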

```text
> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:

H: Can you describe Paris in 100 words?
AI:
[/INST]

Of course, I'd be happy to help! Paris is a beautiful and historic city located in the northern central part of France. It is known for its stunning architecture, art museums, and romantic atmosphere. The Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum are some of the most famous landmarks in Paris.
```

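With `streaming=True` and `StreamingStdOutCallbackHandler`, the answer is printed token by token as it is generated: LangChain calls the handler's `on_llm_new_token` method for each new token, and the stdout handler writes it out immediately. The class below is a hypothetical, dependency-free imitation of that callback pattern (it does not subclass LangChain's `BaseCallbackHandler`), and the token list is made up to stand in for what the server would stream.

```python
import io
import sys
from typing import TextIO


class StdOutTokenPrinter:
    """Hypothetical imitation of a streaming stdout callback handler."""

    def __init__(self, stream: TextIO = sys.stdout) -> None:
        self.stream = stream

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Write each token as soon as it arrives and flush so it appears immediately.
        self.stream.write(token)
        self.stream.flush()


def fake_stream(tokens: list[str], handlers: list[StdOutTokenPrinter]) -> str:
    # Stand-in for the LLM: notify every handler of every token, then return the full text.
    for token in tokens:
        for handler in handlers:
            handler.on_llm_new_token(token)
    return "".join(tokens)


buffer = io.StringIO()
text = fake_stream(["Of", " course", ","], [StdOutTokenPrinter(), StdOutTokenPrinter(buffer)])
```

In the notebook, the real handler is passed per call through `callbacks=[StreamingStdOutCallbackHandler()]`, so it applies only to that `predict` call.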
```python
# This is a follow-up question without context to showcase the conversation memory
second_input = "Are there any restaurants there?"
conversation.predict(input=second_input, callbacks=[StreamingStdOutCallbackHandler()]);
```

```text
> Entering new ConversationChain chain...
Prompt after formatting:
<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always be as helpful as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

Current conversation:
Human: Can you describe Paris in 100 words?
AI: Of course, I'd be happy to help! Paris is a beautiful and historic city located in the northern central part of France. It is known for its stunning architecture, art museums, and romantic atmosphere. The Eiffel Tower, Notre-Dame Cathedral, and the Louvre Museum are some of the most famous landmarks in Paris.
```

## Second example, non-verbose mode

```python
# Verbose mode is set to False this time: only the streamed answer is printed, not the prompt
llm_kwargs = {"min_new_tokens": 10, "max_new_tokens": 200, "preserve_input_text": False}
conversation2 = ConversationChain(
    llm=llm,
    llm_kwargs=llm_kwargs,
    prompt=PROMPT,
    verbose=False,
    memory=memory,  # same memory object as in the first example
)
```

```python
# Let's clear the previous conversation first
memory.clear()
```

```python
first_input = "Can you describe London in 100 words?"
conversation2.predict(input=first_input, callbacks=[StreamingStdOutCallbackHandler()]);
```

```text
Of course, I'd be happy to help! London is a vibrant and diverse city located in the United Kingdom. It is known for its rich history, cultural landmarks, and modern infrastructure. From the iconic Tower of London to the bustling streets of Camden, London has something to offer for everyone. Its world-class museums, theaters, and restaurants make it a popular destination for tourists and locals alike.
```

```python
# This is a follow-up question without context to showcase the conversation memory
second_input = "Are there any restaurants there?"
conversation2.predict(input=second_input, callbacks=[StreamingStdOutCallbackHandler()]);
```

```text
AI: Yes, there are many restaurants in London that offer a wide range of cuisines and dining experiences. From traditional British pubs to international restaurants, London has something to satisfy every taste and budget. Some popular dining areas include Covent Garden, Soho, and Camden, which are known for their vibrant atmosphere and diverse food options. Additionally, London is home to many world-renowned restaurants, such as The Ritz, The Savoy, and The Dorchester, which offer fine dining experiences.
```
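In this second turn the model starts its answer with `AI:`, echoing the speaker label that ends the prompt template. If that is unwanted, a small post-processing step can remove it. The helper below is a hypothetical sketch: with `StreamingStdOutCallbackHandler` the tokens are already printed before `predict` returns, so it can only clean the returned string, and it does not change what the memory has already stored.

```python
import re

# Matches one leading "AI:" speaker label plus the whitespace around it.
_AI_PREFIX = re.compile(r"^\s*AI:\s*")


def strip_ai_prefix(text: str) -> str:
    """Remove a single leading 'AI:' label from a model reply (hypothetical helper)."""
    return _AI_PREFIX.sub("", text, count=1)


reply = "AI: Yes, there are many restaurants in London."
print(strip_ai_prefix(reply))
```

In the notebook, this would be applied to the value returned by `conversation2.predict(...)` instead of discarding it with the trailing semicolon.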