# Semantic Memory

https://www.youtube.com/watch?v=3Yp-hIEcWXk&t=1s

In [1]:
# %pip install -U langmem langgraph

In [5]:
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore

from dotenv import load_dotenv, find_dotenv
import os

_ = load_dotenv(find_dotenv())

checkpointer = InMemorySaver()
store = InMemoryStore()

In [12]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="deepseek-chat",
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url=os.getenv("DEEPSEEK_BASE_URL"))

In [13]:
agent = create_react_agent(model, tools=[], store=store, checkpointer=checkpointer)

In [14]:
def chat(agent, txt, thread_id):
    # Send one user message on the given thread and return the final reply text.
    result_state = agent.invoke(
        {"messages": [{"role": "user", "content": txt}]},
        config={"configurable": {"thread_id": thread_id}},
    )
    return result_state["messages"][-1].content

In [15]:
thread_1 = "thread-1"
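chat(agent, "Can you tell me what OpenVINO is? In what scenarios is it used? Does it change the computation graph?", thread_1)

'Of course! **OpenVINO** (Open Visual Inference and Neural Network Optimization) is an open-source toolkit from Intel designed to optimize and accelerate deep learning inference, especially on Intel hardware (CPU, GPU, VPU, FPGA, and so on). It is mainly used to deploy trained deep learning models to edge or embedded devices for efficient inference.\n\n---\n\n### **Core features of OpenVINO**\n1. **Model optimization**:\n   - OpenVINO provides a **Model Optimizer** that converts trained deep learning models (in formats such as TensorFlow, PyTorch, Caffe, and ONNX) into the OpenVINO Intermediate Representation (IR).\n   - The optimized model requires less computation, uses less memory, and is tuned for Intel hardware.\n\n2. **Inference engine**:\n   - OpenVINO provides an **Inference Engine** for running optimized models efficiently on Intel hardware.\n   - It supports multiple hardware accelerators (CPU, GPU, VPU, FPGA, and so on) and exposes a unified API.\n\n3. **Hardware acceleration**:\n   - OpenVINO is deeply optimized for Intel hardware and can take full advantage of its compute capabilities (such as the AVX-512 instruction set, integrated graphics, and the Neural Compute Stick).\n\n4. **Cross-platform support**:\n   - OpenVINO supports multiple operating systems (Windows, Linux, macOS) and programming languages (Python, C++).\n\n---\n\n### **Where OpenVINO is used**\nOpenVINO is mainly used in the following scenarios:\n1. **Edge computing**:\n   - Deploying deep learning models on edge devices (cameras, drones, industrial equipment) for real-time inference.\n   - For example: face recognition, object detection, speech recognition.\n\n2. **Embedded devices**:\n   - Running deep learning models on resource-constrained devices (such as the Intel Movidius Neural Compute Stick).\n\n3. **Computer vision tasks**:\n   - Suited to computer vision tasks such as image classification, object detection, semantic segmentation, and pose estimation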

In [16]:
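chat(agent, "Can you recommend more software like this?", thread_1)

'Of course! Besides **OpenVINO**, there are many similar tools and frameworks for optimizing and accelerating deep learning inference. Here are several common ones, suited to different hardware platforms and scenarios:\n\n---\n\n### **1. TensorRT (NVIDIA)**\n- **Overview**: TensorRT is a high-performance deep learning inference library from NVIDIA, optimized for NVIDIA GPUs.\n- **Features**:\n  - Supports model quantization (FP16, INT8).\n  - Provides optimizations such as dynamic tensors and layer fusion.\n  - Supports TensorFlow, PyTorch, ONNX, and other model formats.\n- **Use cases**:\n  - High-performance inference on NVIDIA GPUs.\n  - Commonly used in autonomous driving, video analytics, recommender systems, and more.\n- **Website**: [TensorRT](https://developer.nvidia.com/tensorrt)\n\n---\n\n### **2. ONNX Runtime**\n- **Overview**: ONNX Runtime is a high-performance inference engine for models in the ONNX (Open Neural Network Exchange) format.\n- **Features**:\n  - Cross-platform support (CPU, GPU, FPGA, and so on).\n  - Provides several optimization options (such as operator fusion and quantization).\n  - Supports multiple hardware backends (such as CUDA, DirectML, TensorRT).\n- **Use cases**:\n  - Inference workloads that need cross-platform deployment.\n  - Commonly used for edge computing, cloud inference, and more.\n- **Website**: [ONNX Runtime](https://onnxruntime.ai/)\n\n---\n\n### **3. TVM (Apache TVM)**\n- **Overview**: TVM is an open-source deep learning compiler stack designed to optimize and accelerate deep learning inference.\n- **Features**:\n  - Supports multiple hardware backends (CPU, GPU, FPGA, ARM, and so on).\n  - Provides automatic tuning (AutoTVM).\n  - Supports TensorFlow, PyTorch, MXNet, and other model formats.\n- **Use cases**:\n  - Inference workloads that need deployment across hardware platforms.\n  - Commonly used on edge devices, embedded systems, and more.\n- **Website**: [Apache TVM](ht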

In [17]:
thread_2 = "thread-2"
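chat(agent, "Can you recommend more software like this?", thread_2)

'Of course! To recommend the most suitable software, I need to know what type of software or functionality you are looking for. Here are some common categories; let me know which one interests you:\n\n1. **Productivity tools** (note-taking, task management, calendars, etc.)\n2. **Design tools** (graphic design, UI/UX design, 3D modeling, etc.)\n3. **Development tools** (code editors, version control, database management, etc.)\n4. **Multimedia tools** (video editing, audio processing, image editing, etc.)\n5. **Social and communication tools** (instant messaging, video conferencing, team collaboration, etc.)\n6. **Learning and education tools** (online courses, language learning, knowledge management, etc.)\n7. **Entertainment and games** (gaming platforms, streaming, music players, etc.)\n\nIf you can give me more specifics, I can recommend software more precisely!'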
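
The two calls above send the same follow-up question but get different answers, because the checkpointer keeps conversation history per `thread_id`. The sketch below is a toy model of that behaviour in plain Python. `ToyCheckpointer` and `toy_chat` are hypothetical names used only for illustration; this is not how LangGraph implements checkpointing.

```python
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self):
        self._threads = defaultdict(list)

    def history(self, thread_id):
        return self._threads[thread_id]


def toy_chat(checkpointer, text, thread_id):
    """Append the user message to its thread and return how many messages the model would see."""
    history = checkpointer.history(thread_id)
    history.append({"role": "user", "content": text})
    # A real agent would call the model with the full thread history here.
    return len(history)


toy_checkpointer = ToyCheckpointer()
toy_chat(toy_checkpointer, "What is OpenVINO?", "thread-1")
seen_in_thread_1 = toy_chat(toy_checkpointer, "Recommend similar software?", "thread-1")
seen_in_thread_2 = toy_chat(toy_checkpointer, "Recommend similar software?", "thread-2")
print(seen_in_thread_1, seen_in_thread_2)
```

In the first thread the follow-up arrives with the earlier question as context; in the second thread it arrives alone, which is why the model has to ask what kind of software is meant.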

## Adding Memory

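We'll give the agent memory through ordinary tools: `create_manage_memory_tool` lets it create, update, and delete memories, and `create_search_memory_tool` lets it search them. Both tools read and write the `store`, which now has an embedding index so that searches are semantic.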

In [20]:
from langmem import create_manage_memory_tool, create_search_memory_tool

store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small"
    }
)

namespace = ("agent_memories",)
memory_tools = [
    create_manage_memory_tool(namespace),
    create_search_memory_tool(namespace)
]
checkpointer = InMemorySaver()
agent = create_react_agent(model, tools=memory_tools, store=store, checkpointer=checkpointer)

In [21]:
thread_1 = "thread-1"
chat(agent, "Can you explain the principle behind FlashAttention? How exactly does it work?", thread_1)

UnprocessableEntityError: Failed to deserialize the JSON body into the target type: messages[2]: invalid type: sequence, expected a string at line 1 column 411
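
The error above reports that `messages[2]` carried a sequence where the endpoint expected a string. One plausible reading, which is an assumption and not something this notebook confirms, is that a message with list-valued `content` reached an OpenAI-compatible endpoint that only accepts plain strings. Under that assumption, a helper such as the hypothetical `flatten_content` below could normalise message content before it is sent. How to wire it into the agent is left open.

```python
def flatten_content(content):
    """Collapse list-style message content into one string (assumed workaround)."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            # Text blocks of the form {"type": "text", "text": "..."}
            parts.append(block.get("text", ""))
        else:
            # Fall back to the string form for anything unrecognised.
            parts.append(str(block))
    return "\n".join(parts)


def flatten_messages(messages):
    """Return copies of dict-style messages whose content is always a string."""
    return [{**m, "content": flatten_content(m["content"])} for m in messages]


flattened = flatten_messages([
    {"role": "user", "content": "hello"},
    {"role": "tool", "content": [{"type": "text", "text": "memory saved"}, "ok"]},
])
print(flattened)
```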

In [None]:
print(chat(agent, "Nice! Wish me luck! Please note down the detailed memories for me :)", thread_1))

In [None]:
thread_2 = "thread-3"
chat(agent, "Remember what I'm supposed to do for my training this week? It's week 3...", thread_2)

In [None]:
chat(agent, "That may be tricky. I just sprained my ankle. Could you update my plan to include more cross training? Be sure to update the existing key of our plan", thread_2)
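
To make the role of the indexed store more concrete, here is a toy version of "save a memory under a key, then find it by meaning". It is a sketch under loud assumptions: `toy_embed` is a bag-of-words stand-in for the real embedding model, and `ToyMemoryStore` is a hypothetical class, not the `InMemoryStore` API.

```python
import math
from collections import Counter


def toy_embed(text):
    """Bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyMemoryStore:
    """Hypothetical key-value memory with similarity search."""

    def __init__(self):
        self._items = {}

    def put(self, key, text):
        # Writing to an existing key replaces the old memory.
        self._items[key] = (text, toy_embed(text))

    def search(self, query, limit=1):
        q = toy_embed(query)
        ranked = sorted(self._items.items(), key=lambda kv: cosine(q, kv[1][1]), reverse=True)
        return [(key, text) for key, (text, _) in ranked[:limit]]


toy_store = ToyMemoryStore()
toy_store.put("plan", "half marathon training plan week 3 long run")
toy_store.put("diet", "prefers vegetarian meals")
toy_store.put("plan", "half marathon training plan week 3 cross training")  # update the existing key
top = toy_store.search("what is my training plan for week 3")
print(top)
```

Updating the existing key, as the last prompt above asks the agent to do, keeps a single current version of the plan instead of accumulating conflicting copies.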

## Different User Memories

In [13]:
from langmem import create_manage_memory_tool, create_search_memory_tool

store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small"
    }
)

namespace = ("agent_memories", "{user_id}")
memory_tools = [
    create_manage_memory_tool(namespace),
    create_search_memory_tool(namespace)
]
checkpointer = InMemorySaver()

agent = create_react_agent("anthropic:claude-3-5-sonnet-latest", tools=memory_tools, store=store, checkpointer=checkpointer)

In [14]:
def chat(agent, txt, thread_id, user_id):
    result_state = agent.invoke({"messages": [{"role": "user", "content": txt}]}, 
                                config={"configurable": {"thread_id": thread_id, "user_id": user_id}})
    return result_state["messages"][-1].content

In [None]:
thread_1 = "thread-1"
user_id = "User-A"
chat(agent, 
     "Hi I'm Will, I'm training for a half marathon in 2 months - could you propose a daily training plan to prepare and help me stay honest??",
     thread_1,
     user_id)

In [None]:
thread_2 = "thread-2"
user_id2 = "User-B"
chat(agent,
     "Hi I'm John, I'm learning chess - could you help me become great??",
     thread_2,
     user_id2)

In [None]:
chat(agent,
     "Do you remember me liking any sports?",
     thread_2,
     user_id2)

In [None]:
items = store.search(("agent_memories",))
for item in items:
    print(item.namespace, item.value)
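
The `"{user_id}"` placeholder in the namespace is filled from the `configurable` values passed at call time, so each user gets a separate namespace, while a search on the shared prefix `("agent_memories",)` still sees every user's memories. The sketch below imitates that resolution in plain Python. `resolve_namespace` and `matches_prefix` are hypothetical helpers for illustration, not LangMem or LangGraph functions.

```python
def resolve_namespace(template, configurable):
    """Fill "{key}" placeholders in a namespace tuple from config values (illustrative only)."""
    return tuple(part.format(**configurable) for part in template)


def matches_prefix(namespace, prefix):
    """True when `namespace` starts with `prefix`, imitating a prefix search."""
    return namespace[: len(prefix)] == prefix


ns_template = ("agent_memories", "{user_id}")
ns_a = resolve_namespace(ns_template, {"thread_id": "thread-1", "user_id": "User-A"})
ns_b = resolve_namespace(ns_template, {"thread_id": "thread-2", "user_id": "User-B"})

print(ns_a)
print(ns_b)
# The shared prefix matches both users; one user's namespace does not match the other's.
print(matches_prefix(ns_a, ("agent_memories",)), matches_prefix(ns_a, ns_b))
```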

## "Eager" memory retrieval

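We can fetch memories before the first LLM call and inject them into the prompt, so the model answers with relevant context already in hand. Without this step the model does not know what the store contains, so for some classes of questions it will almost always spend a tool call on a search first.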

In [19]:
from langmem import create_manage_memory_tool, create_search_memory_tool
from langgraph.config import get_store

store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": "openai:text-embedding-3-small"
    }
)

namespace = ("agent_memories",)
memory_tools = [
    create_manage_memory_tool(namespace),
    create_search_memory_tool(namespace)
]
checkpointer = InMemorySaver()

def prompt(state):
    # Search over memories based on the messages
    store = get_store()
    items = store.search(namespace, query=state["messages"][-1].content)
    memories = "\n\n".join(str(item) for item in items)
    system_msg = {"role": "system", "content": f"## Memories:\n\n{memories}"}
    return [system_msg] + state["messages"]
    
agent = create_react_agent("anthropic:claude-3-5-sonnet-latest", prompt=prompt, tools=memory_tools, store=store, checkpointer=checkpointer)

In [None]:
thread_1 = "thread-1"
chat(agent, "Hi there, I'm training for a half marathon in 2 months - could you propose a daily training plan to prepare?", thread_1, None)

In [None]:
print(chat(agent, "Nice! Wish me luck! Please note down the detailed memories for me :)", thread_1, None))

In [None]:
thread_2 = "thread-2"
chat(agent, "What am I supposed to do for my training this week? It's week 3...", thread_2, None)
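
To see what the model receives under eager retrieval, the sketch below rebuilds the prompt step with plain dictionaries. `build_eager_prompt` and `fake_search` are hypothetical names, and the canned memory is made-up sample data; the real `prompt` function above queries the store instead.

```python
def build_eager_prompt(messages, search_fn):
    """Prepend a system message holding memories that match the latest user message."""
    query = messages[-1]["content"]
    memories = "\n\n".join(search_fn(query))
    system_msg = {"role": "system", "content": f"## Memories:\n\n{memories}"}
    return [system_msg] + messages


def fake_search(query):
    """Hypothetical search: returns canned sample memories instead of querying a store."""
    return ["Week 3: long run on Sunday"] if "week 3" in query.lower() else []


msgs = [{"role": "user", "content": "What am I supposed to do this week? It's week 3..."}]
eager = build_eager_prompt(msgs, fake_search)
print(eager[0]["content"])
```

Because the memories arrive in the system message, a fresh thread can answer without the model first deciding to call the search tool.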