# Application Deployment

Finally, we put all the pieces together and deploy the LLM-powered chatbot application we have built throughout the lab.

## Infrastructure as Code: CloudFormation and SAM

In line with AWS and DevOps best practices, we will deploy the majority of the application stack as Infrastructure as Code. To do so, we will use the [AWS Serverless Application Model (SAM)](https://aws.amazon.com/serverless/sam/).

## Deploy stack with SAM

Before we deploy the AWS SAM stack, we need to adjust the Lambda function's environment variable that points to the Kendra index.

**Please overwrite the placeholder \*\*\*KENDRA_INDEX_ID\*\*\* in the file ```template.yml``` (you can search with CTRL/CMD+F) with the index ID of the Kendra index we created.**

Further, we need to adjust the Lambda function's environment variable pointing to the LLM we've deployed. 

**Please overwrite the placeholder \*\*\*SM_ENDPOINT_NAME\*\*\* in the file ```template.yml``` (you can search with CTRL/CMD+F) with the endpoint name of the model we've deployed.**
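
If you prefer to make both replacements from a notebook cell instead of editing the file by hand, the following is a minimal sketch. It assumes that the placeholders appear in ```template.yml``` literally as `***KENDRA_INDEX_ID***` and `***SM_ENDPOINT_NAME***` (including the asterisks); the helper names `fill_placeholders` and `patch_template` are made up for this example and are not part of the lab code.

```python
from pathlib import Path

# Placeholder strings, assumed to appear literally like this in template.yml.
KENDRA_PLACEHOLDER = "***KENDRA_INDEX_ID***"
ENDPOINT_PLACEHOLDER = "***SM_ENDPOINT_NAME***"


def fill_placeholders(template_text: str, kendra_index_id: str, endpoint_name: str) -> str:
    """Return the template text with both placeholders replaced."""
    for placeholder in (KENDRA_PLACEHOLDER, ENDPOINT_PLACEHOLDER):
        # Fail loudly instead of silently deploying an unpatched template.
        if placeholder not in template_text:
            raise ValueError(f"placeholder {placeholder} not found in template")
    return (
        template_text
        .replace(KENDRA_PLACEHOLDER, kendra_index_id)
        .replace(ENDPOINT_PLACEHOLDER, endpoint_name)
    )


def patch_template(path: str, kendra_index_id: str, endpoint_name: str) -> None:
    """Rewrite the template file in place (hypothetical helper)."""
    template = Path(path)
    patched = fill_placeholders(template.read_text(), kendra_index_id, endpoint_name)
    template.write_text(patched)


# Example call; replace both values with the ones from your own account.
# patch_template("template.yml", "<your-kendra-index-id>", "<your-endpoint-name>")
```

Running the helper a second time raises a `ValueError`, because the placeholders are already gone. That is intentional: it signals that the template has been patched before.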


Now we are ready for deployment. To deploy, we follow these steps:

![get-kendra-index](../img/get-kendra-index.gif)


In [None]:
# Building the code artifacts
!sam build

In [None]:
# Deploying the stack
!sam deploy --stack-name rag-stack --resolve-s3 --capabilities CAPABILITY_IAM

Once the deployment is done, go to the CloudFormation service and select the "Resources" tab of the stack "rag-stack". Click on the "Physical ID" of the LoadBalancer and copy the DNS name from the page you are forwarded to. You can now reach the web application in a browser by using this DNS name as the URL.
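
The same lookup can also be done programmatically. The sketch below rests on several assumptions that are not confirmed by the lab material: that the load balancer's logical ID in the template is exactly `LoadBalancer`, that it is an Application or Network Load Balancer (so its physical ID is an ARN that the `elbv2` API accepts), and that the application is served over plain HTTP. The default stack name is the one passed to `sam deploy`. The function names are made up for this example.

```python
def find_physical_id(stack_resources, logical_id="LoadBalancer"):
    """Pick the physical ID of one resource from a list of stack resources.

    `stack_resources` has the shape of the "StackResources" list returned by
    CloudFormation's describe_stack_resources call.
    """
    for resource in stack_resources:
        if resource["LogicalResourceId"] == logical_id:
            return resource["PhysicalResourceId"]
    raise KeyError(f"no resource with logical ID {logical_id!r} in stack")


def get_app_url(stack_name="rag-stack", logical_id="LoadBalancer"):
    """Resolve the DNS name of the stack's load balancer (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3

    cfn = boto3.client("cloudformation")
    resources = cfn.describe_stack_resources(StackName=stack_name)["StackResources"]
    lb_arn = find_physical_id(resources, logical_id)

    elbv2 = boto3.client("elbv2")
    load_balancers = elbv2.describe_load_balancers(LoadBalancerArns=[lb_arn])
    dns_name = load_balancers["LoadBalancers"][0]["DNSName"]
    # Assumption: the listener serves plain HTTP.
    return f"http://{dns_name}"


# Example call from a notebook cell with valid AWS credentials:
# print(get_app_url())
```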

![get-url](../img/get-url.gif)

# Application testing

Now that we are in the chat, let us try out a few questions for our chatbot, while keeping in mind the resource constraints that we have in the demo accounts.

Let's ask about Amazon EC2: what it is, how we can create an instance, and some more information about it.
Take a look at the conversation below and think about why the answers are structured the way they are.

<p align="center">
  <img src="../img/ChatEC2.png" alt="A chat with the model about EC2">
</p>

First of all, we can see that the LLM remembers the previous conversation turn, since we reference EC2 only implicitly via "Okay. How can I create one?"

Secondly, we see a shortcoming caused by the low number of characters retrieved on the Kendra side. This can be solved by increasing this limit in your own account.

#### Discussions about the patents that we uploaded 
Patents are among the hardest documents to find and read, and the claims made in them are hard to investigate. After all, the claims of a patent describe exactly what has been protected. An easier way to interact with patents would therefore be valuable.
Let's see how far we get if we add a patent database to our system.
<p align="center">
  <img src="../img/PatentChat.png" alt="A chat with the model about one of the patents we downloaded">
</p>

### Conclusion:
We have two main drivers for the quality of the interaction. 
- The retrieval quality of our retriever. For Kendra, there are plenty of options to optimise the retrieval quality through human feedback, metadata, query optimisation and tuning search relevance to name only a few. However, this is out of scope for this workshop. We would like to point the interested reader to the [docs](https://docs.aws.amazon.com/kendra/latest/dg/tuning.html) as well as the [Kendra workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/df64824d-abbe-4b0d-8b31-8752bceabade/en-US). 
- The LLM that we are using for the chat interaction. Here, especially models with larger context windows can be helpful to get wider context. 

To conclude, RAG can be a very helpful approach to augment your company-internal and external search. The retrieval and LLM quality are of high importance to this approach, and the generated load on the systems can be substantial. For that reason in particular, the costs of a token-based and an infrastructure-based pricing model should be compared carefully.
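
To make the cost comparison concrete, here is a minimal sketch of a break-even calculation between the two pricing models. All prices and traffic figures in the example call are hypothetical placeholders, not actual AWS prices; the function names and the default of 730 hours per month are likewise assumptions made for this illustration.

```python
def token_based_monthly_cost(requests_per_month, tokens_per_request, price_per_1k_tokens):
    """Monthly cost when paying per processed token."""
    return requests_per_month * tokens_per_request / 1000 * price_per_1k_tokens


def endpoint_monthly_cost(hourly_price, instance_count=1, hours_per_month=730):
    """Monthly cost of an always-on endpoint billed per instance hour."""
    return hourly_price * instance_count * hours_per_month


def break_even_requests(tokens_per_request, price_per_1k_tokens, hourly_price,
                        instance_count=1, hours_per_month=730):
    """Requests per month above which the always-on endpoint becomes cheaper."""
    cost_per_request = tokens_per_request / 1000 * price_per_1k_tokens
    fixed_cost = endpoint_monthly_cost(hourly_price, instance_count, hours_per_month)
    return fixed_cost / cost_per_request


# Hypothetical numbers only: 2,000 tokens per request (prompt plus retrieved
# context plus answer), 0.01 per 1,000 tokens, 1.50 per instance hour.
threshold = break_even_requests(
    tokens_per_request=2000, price_per_1k_tokens=0.01, hourly_price=1.50
)
print(f"Break-even at roughly {threshold:,.0f} requests per month")
```

This sketch ignores factors such as autoscaling, idle time, and the extra tokens that a larger context window consumes, all of which shift the break-even point in practice.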
