# LangChain Async

One of the biggest pain points developers raise when building useful LLM applications is latency: these applications often make multiple calls to LLM APIs, each taking a few seconds. Staring at a loading spinner for more than a couple of seconds is a frustrating user experience. Streaming reduces this perceived latency by returning the LLM's output token by token instead of all at once.
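
To make the difference between total latency and perceived latency concrete, here is a minimal standard-library sketch. It does not call any LLM: `fake_llm_stream` and `measure` are hypothetical helpers, and the per-token delay is an arbitrary stand-in for generation time. It only illustrates that a streaming consumer sees the first token well before the full response is complete.

```python
import asyncio
import time


async def fake_llm_stream(tokens, delay=0.01):
    """Hypothetical stand-in for a streaming LLM: yields one token at a time."""
    for token in tokens:
        await asyncio.sleep(delay)  # simulated per-token generation time
        yield token


async def measure(tokens, delay=0.01):
    """Return (seconds until first token, seconds until full response, full text)."""
    start = time.perf_counter()
    first_token_at = None
    pieces = []
    async for token in fake_llm_stream(tokens, delay):
        if first_token_at is None:
            # With streaming, the user can start reading at this point.
            first_token_at = time.perf_counter() - start
        pieces.append(token)
    total = time.perf_counter() - start
    return first_token_at, total, "".join(pieces)


first, total, text = asyncio.run(measure(["Hello", ",", " world", "!"]))
print(f"first token after {first:.3f}s, full response after {total:.3f}s: {text!r}")
```

Inside a notebook, where an event loop is already running, replace `asyncio.run(measure(...))` with `await measure(...)`.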

This notebook demonstrates how to monitor a LangChain streaming app with TruLens.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/trulens_eval/examples/frameworks/langchain/langchain_async.ipynb)

### Import from LangChain and TruLens

In [None]:
# ! pip install trulens_eval==0.9.0 langchain==0.0.263

In [None]:
import asyncio

from IPython.display import display
from ipywidgets import interact
from ipywidgets import widgets
from langchain import PromptTemplate
from langchain.callbacks import AsyncIteratorCallbackHandler
from langchain.chains import LLMChain
from langchain.chat_models.openai import ChatOpenAI
from langchain.llms.openai import OpenAI
from langchain.memory import ConversationSummaryBufferMemory

from trulens_eval import Feedback
from trulens_eval import feedback
from trulens_eval import Tru
from trulens_eval.keys import check_keys
import trulens_eval.utils.python  # makes sure asyncio gets instrumented

import openai

## Setup
### Add API keys
For this example you will need Hugging Face and OpenAI API keys.

In [None]:
import os
os.environ["HUGGINGFACE_API_KEY"] = "..."
os.environ["OPENAI_API_KEY"] = "..."
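
Before running the remaining cells, it can help to confirm that both keys are actually set. The following is a minimal sketch using only the standard library; `missing_keys` is a hypothetical helper, not part of TruLens or LangChain, and it assumes that the literal placeholder `"..."` used above should count as "not set".

```python
import os


def missing_keys(names, env=None):
    """Return the names that are unset, empty, or still the "..." placeholder.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    return [name for name in names if env.get(name, "") in ("", "...")]


required = ["HUGGINGFACE_API_KEY", "OPENAI_API_KEY"]
not_set = missing_keys(required)
if not_set:
    print("Missing API keys:", ", ".join(not_set))
else:
    print("All required API keys are set.")
```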

### Create Async Application

In [None]:
# Set up an async callback.
callback = AsyncIteratorCallbackHandler()

chatllm = ChatOpenAI(
    temperature=0.0,
    streaming=True,  # important
    # callbacks=[callback]
)
llm = OpenAI(
    temperature=0.0,
)

memory = ConversationSummaryBufferMemory(
    memory_key="chat_history",
    input_key="human_input",
    llm=llm,
    max_token_limit=50
)

# Set up a simple question/answer chain with streaming ChatOpenAI.
prompt = PromptTemplate(
    input_variables=["human_input", "chat_history"],
    template='''
    You are having a conversation with a person. Make small talk.
    {chat_history}
        Human: {human_input}
        AI:'''
)

chain = LLMChain(llm=chatllm, prompt=prompt, memory=memory)

### Set up a language match feedback function

In [None]:
tru = Tru()
hugs = feedback.Huggingface()
f_lang_match = Feedback(hugs.language_match).on_input_output()

### Set up evaluation and tracking with TruLens

In [None]:
tc = tru.Chain(chain, feedbacks=[f_lang_match], app_id="chat_with_memory")

### Start the TruLens dashboard

In [None]:
tru.run_dashboard()

### Use the application

In [None]:
message = "Hi. How are you?"

# Create a task with the call to the chain, but don't wait for it yet.
f_res_record = asyncio.create_task(
    tc.acall_with_record(
        inputs=dict(human_input=message),
        callbacks=[callback]
    )
)

# Instead, iterate over the callback's async generator, which yields each token as it arrives.
async for token in callback.aiter():
    print(token, end='')

# By now the acall_with_record results should be ready.
res, record = await f_res_record
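
The cell above follows a general pattern: start the call as a background task, consume tokens from an async iterator while it runs, and only then await the task for its final result. The sketch below reproduces that pattern with the standard library alone. `QueueTokenHandler` and `fake_chain_call` are hypothetical, simplified stand-ins written for illustration; they are not LangChain's or TruLens's actual implementation.

```python
import asyncio


class QueueTokenHandler:
    """Hypothetical, simplified stand-in for an async iterator callback handler."""

    def __init__(self):
        self.queue = asyncio.Queue()
        self._done = object()  # sentinel marking the end of the stream

    async def on_new_token(self, token):
        await self.queue.put(token)

    async def on_end(self):
        await self.queue.put(self._done)

    async def aiter(self):
        # Yield tokens as they arrive until the end-of-stream sentinel shows up.
        while True:
            item = await self.queue.get()
            if item is self._done:
                break
            yield item


async def fake_chain_call(handler, tokens):
    """Pretend to run a chain: push tokens to the handler, then return the full text."""
    for token in tokens:
        await asyncio.sleep(0.01)  # simulated generation time
        await handler.on_new_token(token)
    await handler.on_end()
    return "".join(tokens)


async def main():
    handler = QueueTokenHandler()
    # Start the call in the background, but don't wait for it yet.
    task = asyncio.create_task(fake_chain_call(handler, ["I'm", " fine", "."]))
    streamed = []
    # Consume tokens while the background task is still producing them.
    async for token in handler.aiter():
        streamed.append(token)
    # The stream has ended, so the task's result should be ready.
    result = await task
    return streamed, result


streamed, result = asyncio.run(main())
print(streamed, result)
```

As in the notebook cell, the order matters: awaiting the task before iterating would block until the whole response is finished, which defeats the purpose of streaming. In a notebook, use `await main()` instead of `asyncio.run(main())`.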