# Caching
LangChain provides an optional caching layer for chat models. This is useful for two reasons:

It can save you money by reducing the number of API calls you make to the LLM provider when you often request the same completion.

It can speed up your application, because a cached response is returned locally instead of waiting on a round trip to the LLM provider.
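To make the cost argument concrete, here is a minimal sketch of the idea behind a prompt cache. It is not LangChain's implementation: `fake_model_call`, `PromptCache`, and the `"toy-model"` name are hypothetical stand-ins, and the assumption is simply that a response is keyed on the prompt together with the model it was sent to.

```python
# Hypothetical stand-in for a billable provider call (not a LangChain API).
def fake_model_call(prompt: str) -> str:
    return f"response to: {prompt}"


class PromptCache:
    """Toy cache keyed on (prompt, model name)."""

    def __init__(self):
        self._store = {}
        self.provider_calls = 0  # how many times the "provider" was actually hit

    def complete(self, prompt: str, model: str = "toy-model") -> str:
        key = (prompt, model)
        if key not in self._store:
            # Cache miss: pay for one provider call and remember the result.
            self.provider_calls += 1
            self._store[key] = fake_model_call(prompt)
        # Cache hit (or freshly stored value): no further provider call.
        return self._store[key]


cache = PromptCache()
first = cache.complete("Tell me a joke")
second = cache.complete("Tell me a joke")
print(cache.provider_calls)
```

Two identical requests result in a single provider call; a different prompt or a different model name would be a separate cache entry.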

In [1]:
from dotenv import load_dotenv
load_dotenv()

True

In [3]:
from langchain.globals import set_llm_cache
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo")

## In Memory Cache

In [9]:
%%time
from langchain_core.caches import InMemoryCache

set_llm_cache(InMemoryCache())

# The first call is not yet in the cache, so it should take longer
llm.invoke("Tell me a joke").content

CPU times: user 15.2 ms, sys: 2.85 ms, total: 18 ms
Wall time: 956 ms


'Why did the scarecrow win an award? Because he was outstanding in his field!'

In [12]:
%%time
# The second call is served from the cache, so it returns much faster
llm.invoke("Tell me a joke").content

CPU times: user 2.25 ms, sys: 430 µs, total: 2.68 ms
Wall time: 2.59 ms


'Why did the scarecrow win an award? Because he was outstanding in his field!'
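`%%time` is a Jupyter cell magic, so it is not available in a plain Python script. The sketch below shows one way the same cold-versus-warm comparison could be measured with `time.perf_counter`. The `timed` helper and `cached_call` are hypothetical: `cached_call` stands in for a cached `llm.invoke` and simulates provider latency with a short sleep rather than calling a real model.

```python
import time


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical stand-in for a cached llm.invoke, memoized with a plain dict.
_responses = {}


def cached_call(prompt):
    if prompt not in _responses:
        time.sleep(0.05)  # simulated provider latency on a cache miss
        _responses[prompt] = f"response to: {prompt}"
    return _responses[prompt]


first, cold = timed(cached_call, "Tell me a joke")
second, warm = timed(cached_call, "Tell me a joke")
print(f"cold: {cold * 1000:.1f} ms, warm: {warm * 1000:.1f} ms")
```

Note that an in-memory cache like this lives only as long as the process does; once the interpreter exits, every entry is gone.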

## SQLite Cache

In [13]:
!rm .langchain.db

rm: .langchain.db: No such file or directory


In [22]:
# We can do the same thing with a SQLite cache, which persists to disk
from langchain_community.cache import SQLiteCache

set_llm_cache(SQLiteCache(database_path=".langchain.db"))

In [23]:
%%time
# The first call is not yet in the cache, so it should take longer
llm.invoke("Tell me a joke")

CPU times: user 4.74 ms, sys: 1.76 ms, total: 6.49 ms
Wall time: 39.1 ms


AIMessage(content='Why did the math book look sad?\n\nBecause it had too many problems.', response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 11, 'total_tokens': 26}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-8d174bab-edda-4d74-90d7-6cb51acd0aa3-0', usage_metadata={'input_tokens': 11, 'output_tokens': 15, 'total_tokens': 26})

In [24]:
%%time
# The second call is served from the cache, so it returns faster
llm.invoke("Tell me a joke")

CPU times: user 4.24 ms, sys: 1.18 ms, total: 5.41 ms
Wall time: 4.67 ms


AIMessage(content='Why did the math book look sad?\n\nBecause it had too many problems.', response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 11, 'total_tokens': 26}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-8d174bab-edda-4d74-90d7-6cb51acd0aa3-0', usage_metadata={'input_tokens': 11, 'output_tokens': 15, 'total_tokens': 26})
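The practical difference from the in-memory cache is persistence: entries written to a database file are still there after the process restarts. The following is a rough sketch of that behavior using only the standard-library `sqlite3` module. The table layout and the `open_cache`, `lookup`, and `update` helpers are assumptions made for illustration, not the schema or API that `SQLiteCache` actually uses.

```python
import os
import sqlite3
import tempfile


def open_cache(path):
    """Open (or create) a toy cache database at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "prompt TEXT, model TEXT, response TEXT, "
        "PRIMARY KEY (prompt, model))"
    )
    return conn


def lookup(conn, prompt, model):
    """Return the cached response, or None on a cache miss."""
    row = conn.execute(
        "SELECT response FROM cache WHERE prompt = ? AND model = ?",
        (prompt, model),
    ).fetchone()
    return row[0] if row else None


def update(conn, prompt, model, response):
    """Store or overwrite the response for (prompt, model)."""
    conn.execute(
        "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
        (prompt, model, response),
    )
    conn.commit()


# Use a temporary directory so the sketch does not touch .langchain.db
db_path = os.path.join(tempfile.mkdtemp(), "toy_cache.db")

conn = open_cache(db_path)
update(conn, "Tell me a joke", "toy-model", "a cached joke")
conn.close()

# A fresh connection stands in for a restarted process.
conn = open_cache(db_path)
restored = lookup(conn, "Tell me a joke", "toy-model")
missing = lookup(conn, "Unseen prompt", "toy-model")
conn.close()
print(restored, missing)
```

Because the lookup survives closing and reopening the connection, a repeated prompt can be answered from disk even in a new session, which is the property that makes a file-backed cache worth the extra I/O.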