By default, an LLM call returns only after the entire response has been generated.

In this lab we will explore streaming, where the response is returned to the user in chunks as it is generated. This typically gives a better user experience, because the user starts reading before the full answer is ready.
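The difference can be sketched in plain Python with a generator. This is only an illustration: `fake_llm_tokens`, `blocking_response`, and `streaming_response` are hypothetical names standing in for a real model, not part of LangChain or Bedrock.

```python
import time
from typing import Iterator


def fake_llm_tokens(delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a model that produces one token at a time."""
    for token in ["George ", "Washington ", "was ", "the ", "first ", "president."]:
        time.sleep(delay)  # simulate per-token generation latency
        yield token


def blocking_response() -> str:
    """Non-streaming: the caller sees nothing until every token is ready."""
    return "".join(fake_llm_tokens())


def streaming_response() -> Iterator[str]:
    """Streaming: each token is handed to the caller as soon as it exists."""
    yield from fake_llm_tokens()


# Both approaches produce the same text; only the delivery differs.
full_text = blocking_response()
streamed_chunks = list(streaming_response())
```

With a non-zero `delay`, the blocking version would stay silent for the whole generation time, while the streaming version would deliver the first token after a single delay.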

In [None]:
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.llms.bedrock import Bedrock
import boto3

We will be using Anthropic Claude in this lab.

In [None]:
REGION_NAME = "us-east-1"  # change to your region
PROFILE_NAME = "lza-comm-gss"  # change to your desired AWS credential profile
# Ensure Anthropic Claude v2 is enabled in your AWS account.
# Pass the region explicitly so the client targets REGION_NAME rather than the profile default.
named_profile = boto3.session.Session(profile_name=PROFILE_NAME, region_name=REGION_NAME)
bedrock_client = named_profile.client('bedrock-runtime')
print('Initializing Anthropic Claude v2')
model = Bedrock(
    client=bedrock_client,
    model_id="anthropic.claude-v2",
    endpoint_url="https://bedrock-runtime." + REGION_NAME + ".amazonaws.com",
    model_kwargs={"temperature": 0}
)


Notice we are using a template in this example.  

Templates parameterize the input so you can inject context. This template uses a chat structure: a system message tells the model to act as a knowledgeable historian, and a human message carries the question, with the user input injected at the `{question}` placeholder.

Also notice that the chat template takes a dictionary as input, with one key for each placeholder.
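As a rough analogy for how a dictionary fills a template, here is a plain-Python sketch. It does not use LangChain: `template` and `fill_template` are hypothetical names, and the real `ChatPromptTemplate` produces message objects rather than tuples.

```python
# Hypothetical, simplified stand-in for a chat template: a list of (role, text) pairs.
template = [
    ("system", "You're a very knowledgeable historian."),
    ("human", "{question}"),
]


def fill_template(template, inputs: dict):
    """Substitute each {placeholder} with the matching key from `inputs`."""
    return [(role, text.format(**inputs)) for role, text in template]


messages = fill_template(
    template,
    {"question": "Who was the first president of the United States?"},
)
```

In this sketch, a missing key raises a `KeyError`, which is why the input dictionary needs an entry for every placeholder.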

Also notice that we are using the runnable construct. This is LCEL (LangChain Expression Language), which simplifies defining and composing components.
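To give a feel for how pipe-style composition can work, here is a toy sketch built on Python's `__or__` operator. It is not LangChain's implementation: `Step`, `to_prompt`, `fake_model`, and `to_text` are hypothetical names used only for illustration.

```python
class Step:
    """Hypothetical minimal 'runnable': wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


to_prompt = Step(lambda inputs: f"Question: {inputs['question']}")
fake_model = Step(lambda prompt: prompt.upper())  # stand-in for an LLM call
to_text = Step(lambda output: output.strip())  # stand-in for an output parser

chain = to_prompt | fake_model | to_text
result = chain.invoke({"question": "who was first? "})
```

Each `|` returns a new `Step`, so a chain of any length is still a single object with one `invoke` method.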

Also notice the output parser at the end. Output parsers set the type and structure of your output. In this example the StrOutputParser is redundant, since the model already returns a stream of strings, but feel free to experiment with different parsers and see the results.

### Further Reading
[Output Parsers](https://python.langchain.com/docs/modules/model_io/output_parsers/quick_start)

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a very knowledgeable historian who provides accurate and eloquent answers to historical questions.",
        ),
        ("human", "{question}"),
    ]
)
runnable = prompt | model | StrOutputParser()
question = {"question": "Who was the first president of the United States?"}

Now we invoke our model. Notice that we loop through the stream and print each chunk as it arrives.

In [None]:

for chunk in runnable.stream(question):
    print(chunk, end="", flush=True)
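If you also need the complete response after streaming (for logging or further processing, say), one option is to collect the chunks while printing them. The sketch below uses a hypothetical `fake_stream` generator in place of `runnable.stream(question)` so that it runs without AWS credentials; `print_and_collect` is likewise an illustrative helper, not a LangChain function.

```python
from typing import Iterable, Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for runnable.stream(question); yields text chunks."""
    yield from ["George ", "Washington ", "was ", "the ", "first ", "president."]


def print_and_collect(chunks: Iterable[str]) -> str:
    """Print each chunk as it arrives and return the full response text."""
    parts = []
    for chunk in chunks:
        print(chunk, end="", flush=True)
        parts.append(chunk)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_response = print_and_collect(fake_stream())
```

Assuming the chain yields string chunks, as it does with the string output parser above, the same helper should work as `print_and_collect(runnable.stream(question))`.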