# LangChain Async

One of the biggest pain points developers face when building useful LLM applications is latency: these applications often make multiple calls to LLM APIs, each taking a few seconds. Staring at a loading spinner for more than a couple of seconds is a frustrating user experience. Streaming reduces this perceived latency by returning the LLM's output token by token instead of all at once.
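
To make the idea concrete, here is a minimal, library-free sketch that simulates a streaming model with an async generator. `fake_llm_stream` and `consume` are hypothetical names, and each word stands in for a token. The first token arrives after a single simulated delay, while the full answer takes one delay per token. That gap is the latency streaming hides from the user.

```python
import asyncio
import time


async def fake_llm_stream(text: str, delay: float = 0.02):
    """Hypothetical stand-in for a streaming LLM: yields one word ("token") at a time."""
    for word in text.split():
        await asyncio.sleep(delay)  # simulate per-token generation time
        yield word + " "


async def consume(text: str):
    start = time.perf_counter()
    first_token_at = None
    tokens = []
    async for token in fake_llm_stream(text):
        if first_token_at is None:
            first_token_at = time.perf_counter() - start  # time to first token
        tokens.append(token)
        print(token, end="", flush=True)  # show partial output immediately
    total = time.perf_counter() - start  # time until the full answer is ready
    print()
    return "".join(tokens), first_token_at, total


full_text, ttft, total = asyncio.run(
    consume("Streaming shows partial output while the rest is generated")
)
print(f"first token after {ttft:.2f}s, full answer after {total:.2f}s")
```

In a notebook, where an event loop is already running, replace `asyncio.run(...)` with `await consume(...)`.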

This notebook demonstrates how to monitor a LangChain streaming app with TruLens.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/quickstart/langchain_async.ipynb)

### Import from LangChain and TruLens

In [None]:
# ! pip install trulens_eval==0.18.1 "langchain>=0.0.342"

In [None]:
import asyncio

from langchain.prompts import PromptTemplate
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models.openai import ChatOpenAI
from langchain.llms.openai import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

from trulens_eval import Feedback
from trulens_eval import feedback
from trulens_eval import Tru
import trulens_eval.utils.python  # makes sure asyncio gets instrumented

## Setup
### Add API keys
For this example, you will need Hugging Face and OpenAI API keys.

In [None]:
import os
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."
os.environ["OPENAI_API_KEY"] = "sk-..."
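
Some of the later cells only fail once a request reaches Hugging Face or OpenAI. A small helper can check up front that the keys are set. `require_env` below is a hypothetical helper, not part of TruLens or LangChain. It assumes that unfilled placeholders end in `...`, as they do in the cell above.

```python
import os


def require_env(*names: str) -> None:
    """Raise if any named environment variable is unset, empty, or a '...' placeholder."""
    missing = [
        name
        for name in names
        if not os.environ.get(name) or os.environ[name].endswith("...")
    ]
    if missing:
        raise RuntimeError("Set a real value for: " + ", ".join(missing))
```

For example, call `require_env("HUGGINGFACE_API_KEY", "OPENAI_API_KEY")` before creating the application.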

### Create Async Application

In [None]:
# Set up an async callback.
callback = AsyncIteratorCallbackHandler()

chatllm = ChatOpenAI(
    temperature=0.0,
    streaming=True,  # important: send tokens to callbacks as they are generated
)
llm = OpenAI(
    temperature=0.0,
)

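# Keeps recent turns verbatim and uses the (non-streaming) OpenAI LLM to
# summarize older ones once the history exceeds max_token_limit tokens.
memory = ConversationSummaryBufferMemory(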
    memory_key="chat_history",
    input_key="human_input",
    llm=llm,
    max_token_limit=50
)

# Set up a simple question/answer chain that uses the streaming ChatOpenAI model.
prompt = PromptTemplate(
    input_variables=["human_input", "chat_history"],
    template='''
    You are having a conversation with a person. Make small talk.
    {chat_history}
        Human: {human_input}
        AI:'''
)

chain = LLMChain(llm=chatllm, prompt=prompt, memory=memory)
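
`AsyncIteratorCallbackHandler` lets the chain run in one task while another coroutine reads tokens as they are produced. The sketch below shows that producer/consumer idea using an `asyncio.Queue`. It is not LangChain's implementation: it substitutes a simple end-of-stream sentinel, and `QueueTokenHandler`, `fake_llm`, and `_END` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, List

_END = object()  # hypothetical end-of-stream sentinel


class QueueTokenHandler:
    """Simplified stand-in for an async streaming callback (not LangChain's class)."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()

    async def on_llm_new_token(self, token: str) -> None:
        # Producer side: called once per generated token.
        self.queue.put_nowait(token)

    async def on_llm_end(self) -> None:
        # Tell the consumer that no more tokens are coming.
        self.queue.put_nowait(_END)

    async def aiter(self) -> AsyncIterator[str]:
        # Consumer side: yield tokens until the sentinel arrives.
        while True:
            item = await self.queue.get()
            if item is _END:
                break
            yield item


async def fake_llm(handler: QueueTokenHandler, tokens: List[str]) -> str:
    """Hypothetical model call that reports each token to the handler."""
    for tok in tokens:
        await asyncio.sleep(0)  # let the consumer run, as network I/O would
        await handler.on_llm_new_token(tok)
    await handler.on_llm_end()
    return "".join(tokens)


async def main():
    handler = QueueTokenHandler()
    task = asyncio.create_task(fake_llm(handler, ["Hi", "!", " I'm", " fine."]))
    streamed = [tok async for tok in handler.aiter()]  # consume while the task runs
    result = await task  # the return value is only available once the task finishes
    return streamed, result


streamed, result = asyncio.run(main())
print(streamed, result)
```

The same two steps appear in the "Use the application" cell: iterate over the callback, then await the task.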

### Set up a language-match feedback function

In [None]:
tru = Tru()
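# Hugging Face-backed feedback provider (uses the HUGGINGFACE_API_KEY set above).
hugs = feedback.Huggingface()
# Score whether the app's output is in the same language as the user's input.
f_lang_match = Feedback(hugs.language_match).on_input_output()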
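
`language_match` compares the language of the input with the language of the output. Assume a language detector that returns a probability for each language, for each text. One natural way to turn two such distributions into a score in [0, 1] is one minus their total variation distance, which is half the L1 distance between them. The sketch below is an assumption about the approach, not TruLens's code, so check the TruLens source for the exact formula. `language_match_score` is a hypothetical name, and the probabilities are made-up illustrative values.

```python
from typing import Dict


def language_match_score(p_in: Dict[str, float], p_out: Dict[str, float]) -> float:
    """Return 1 - total variation distance between two language distributions.

    1.0 means both texts get identical language probabilities; 0.0 means they
    put all of their probability on different languages.
    """
    langs = set(p_in) | set(p_out)
    l1 = sum(abs(p_in.get(lang, 0.0) - p_out.get(lang, 0.0)) for lang in langs)
    return 1.0 - l1 / 2.0


# Made-up detector outputs, for illustration only.
question = {"en": 0.9, "fr": 0.1}
english_reply = {"en": 0.8, "fr": 0.2}
french_reply = {"en": 0.05, "fr": 0.95}

print(round(language_match_score(question, english_reply), 2))  # 0.9
print(round(language_match_score(question, french_reply), 2))   # 0.15
```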

### Set up evaluation and tracking with TruLens

In [None]:
# Optionally instrument PromptTemplate.format so the filled-in prompts also appear in the timeline:
from trulens_eval.instruments import instrument
instrument.method(PromptTemplate, "format")

tc = tru.Chain(
    chain,
    feedbacks=[f_lang_match],
    app_id="chat_with_memory"
)
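
The call `instrument.method(PromptTemplate, "format")` asks TruLens to also record calls to `PromptTemplate.format`. The core idea of method instrumentation is to wrap a method so that each call's inputs and outputs are logged, and this can be sketched in plain Python. This is a simplified illustration, not TruLens's implementation, which also has to track call stacks and async contexts. `instrument_method`, `calls`, and `Template` are hypothetical stand-ins.

```python
import functools
from typing import Any, Dict, List

calls: List[Dict[str, Any]] = []  # hypothetical in-memory call log ("timeline")


def instrument_method(cls: type, name: str) -> None:
    """Wrap cls.<name> so every call's arguments and return value are recorded."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        result = original(self, *args, **kwargs)
        calls.append(
            {"method": f"{cls.__name__}.{name}", "args": args, "kwargs": kwargs, "result": result}
        )
        return result

    setattr(cls, name, wrapper)


class Template:
    """Hypothetical stand-in for PromptTemplate."""

    def __init__(self, template: str) -> None:
        self.template = template

    def format(self, **kwargs: Any) -> str:
        return self.template.format(**kwargs)


instrument_method(Template, "format")
filled = Template("Human: {human_input}\nAI:").format(human_input="Hi")
print(calls[-1]["method"], repr(calls[-1]["result"]))
```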

In [None]:
tc.print_instrumented()

### Start the TruLens dashboard

In [None]:
tru.run_dashboard()

### Use the application

In [None]:
message = "Hi. How are you?"

with tc as recording:
    task = asyncio.create_task(
        chain.acall(
            inputs=dict(human_input=message, chat_history=[]),
            callbacks=[callback]
        )
    )

# Note: the record is only available after you either consume all of the
# callback's tokens or await the task.

async for token in callback.aiter():
    print(token, end="")

# Make sure the task has completed before reading the record:
await task
record = recording.get()
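
If the stream never finishes, for example because the callback was not passed to the call, `async for token in callback.aiter()` can block the notebook indefinitely. One option is to bound both the streaming loop and the final `await` with `asyncio.wait_for`, as sketched below. `stream_then_await` is a hypothetical helper, and the demo uses a fake stream and a fake chain call in place of the real ones.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Tuple


async def stream_then_await(
    tokens: AsyncIterator[str], task: Awaitable[Any], timeout: float = 30.0
) -> Tuple[str, Any]:
    """Print tokens as they arrive, then await the task; each step gets its own timeout."""

    async def drain() -> str:
        parts = []
        async for token in tokens:
            print(token, end="", flush=True)
            parts.append(token)
        print()
        return "".join(parts)

    text = await asyncio.wait_for(drain(), timeout)  # raises asyncio.TimeoutError if stuck
    result = await asyncio.wait_for(task, timeout)  # wait_for cancels the task on timeout
    return text, result


async def demo() -> Tuple[str, Any]:
    async def fake_tokens():  # hypothetical stand-in for callback.aiter()
        for tok in ["I'm", " doing", " well."]:
            await asyncio.sleep(0.01)
            yield tok

    async def fake_chain_call():  # hypothetical stand-in for chain.acall(...)
        await asyncio.sleep(0.05)
        return {"text": "I'm doing well."}

    task = asyncio.create_task(fake_chain_call())
    return await stream_then_await(fake_tokens(), task, timeout=5.0)


text, result = asyncio.run(demo())
```

In the notebook, the equivalent would be `text, result = await stream_then_await(callback.aiter(), task)` followed by `record = recording.get()`.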