## Usage

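Lagent is inspired by the design philosophy of PyTorch. We expect that the analogy with neural network layers will make the workflow clearer and more intuitive, so that users only need to focus on creating layers and defining the message passing between them in a Pythonic way. This is a simple tutorial to get you started quickly with building multi-agent applications.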

### Model as Agents 

Agents communicate with each other by exchanging `AgentMessage` objects.

In [1]:
from typing import Dict, List
from lagent.agents import Agent
from lagent.schema import AgentMessage
from lagent.llms import VllmModel, INTERNLM2_META

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "6"

llm = VllmModel(
    path="../models/Qwen2.5-1.5B-Instruct",
    meta_template=INTERNLM2_META,
    tp=1,
    top_k=1,
    temperature=1.0,
    stop_words=['<|im_end|>'],
    max_new_tokens=1024,
)

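# System prompt (Chinese): "Your answer must be exactly one of the three characters '典', '孝', or '急'."
system_prompt = '你的回答只能从“典”、“孝”、“急”三个字中选一个。'
agent = Agent(llm, system_prompt)

# User message (Chinese): "What is the weather like today?"
user_msg = AgentMessage(sender='user', content='今天天气情况')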
bot_msg = agent(user_msg)
print(bot_msg)

INFO 06-23 15:22:22 llm_engine.py:174] Initializing an LLM engine (v0.5.4) with config: model='../models/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='../models/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None), seed=0, served_model_name=../models/Qwen2.5-1.5B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)


INFO 06-23 15:22:23 model_runner.py:720] Starting to load model ../models/Qwen2.5-1.5B-Instruct...


INFO 06-23 15:22:24 model_runner.py:732] Loading model weights took 2.8875 GB
INFO 06-23 15:22:25 gpu_executor.py:102] # GPU blocks: 37295, # CPU blocks: 9362
INFO 06-23 15:22:27 model_runner.py:1024] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 06-23 15:22:27 model_runner.py:1028] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
INFO 06-23 15:22:36 model_runner.py:1225] Graph capturing finished in 9 secs.


Processed prompts: 100%|██████████| 1/1 [00:00<00:00, 29.44it/s, est. speed input: 1034.21 toks/s, output: 59.07 toks/s]


content='典' sender='Agent' formatted=None extra_info=None type=None receiver=None stream_state=<AgentStatusCode.END: 0>
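
To make the PyTorch analogy concrete, here is a minimal, library-free sketch of the call pattern used above. `ToyMessage`, `ToyAgent`, and `echo_llm` are hypothetical stand-ins, not Lagent classes, and the "model" is a plain function so the snippet runs without a GPU. The only assumption is that calling an agent runs a `forward` step and records both the input and the reply.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToyMessage:
    """Hypothetical stand-in for AgentMessage (only two of its fields)."""
    sender: str
    content: str


class ToyAgent:
    """Hypothetical stand-in for an agent; not Lagent's implementation."""

    def __init__(self, llm: Callable[[str, str], str], system_prompt: str, name: str = "Agent"):
        self.llm = llm
        self.system_prompt = system_prompt
        self.name = name
        self.memory: List[ToyMessage] = []

    def forward(self, message: ToyMessage) -> ToyMessage:
        # Like nn.Module.forward: the actual computation of this "layer".
        reply = self.llm(self.system_prompt, message.content)
        return ToyMessage(sender=self.name, content=reply)

    def __call__(self, message: ToyMessage) -> ToyMessage:
        # Like nn.Module.__call__: wrap forward() with bookkeeping.
        self.memory.append(message)
        response = self.forward(message)
        self.memory.append(response)
        return response


def echo_llm(system_prompt: str, user_text: str) -> str:
    # Fake model: ignores the system prompt and upper-cases the input.
    return user_text.upper()


toy_agent = ToyAgent(echo_llm, system_prompt="Reply in upper case.")
toy_reply = toy_agent(ToyMessage(sender="user", content="hello"))
print(toy_reply)
print([m.sender for m in toy_agent.memory])
```

Keeping the bookkeeping in `__call__` and the computation in `forward` means a subclass can change the behavior of the "layer" without touching how messages are recorded.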


### Memory as State 

Each call stores both the input message and the agent's reply in the agent's memory, which can be read back as message objects or dumped as plain dicts.

In [2]:
memory: List[AgentMessage] = agent.memory.get_memory()
print(memory)
print('-' * 120)
dumped_memory: Dict[str, List[dict]] = agent.state_dict()
print(dumped_memory['memory'])

[AgentMessage(content='今天天气情况', sender='user', formatted=None, extra_info=None, type=None, receiver=None, stream_state=<AgentStatusCode.END: 0>), AgentMessage(content='典', sender='Agent', formatted=None, extra_info=None, type=None, receiver=None, stream_state=<AgentStatusCode.END: 0>)]
------------------------------------------------------------------------------------------------------------------------
[{'content': '今天天气情况', 'sender': 'user', 'formatted': None, 'extra_info': None, 'type': None, 'receiver': None, 'stream_state': <AgentStatusCode.END: 0>}, {'content': '典', 'sender': 'Agent', 'formatted': None, 'extra_info': None, 'type': None, 'receiver': None, 'stream_state': <AgentStatusCode.END: 0>}]
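
The dumped memory is a list of plain dicts, but in the printed output `stream_state` still appears as an enum member, so writing the dump to disk may need a small encoding step. The sketch below shows one way to round-trip such a dump through JSON. `StatusCode` is a hypothetical stand-in for `AgentStatusCode` and is assumed here to be a plain `Enum` (if the real class is an `IntEnum`, `json` would serialize it as an integer without help). The sample data copies a few fields from the output above rather than coming from a live agent.

```python
import json
from enum import Enum
from typing import Any, Dict, List


class StatusCode(Enum):
    """Hypothetical stand-in for AgentStatusCode."""
    END = 0


def encode_enum(obj: Any) -> Any:
    # json.dumps calls this only for objects it cannot serialize itself.
    if isinstance(obj, Enum):
        return obj.name
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


# Sample shaped like dumped_memory['memory'] in the output above (fields trimmed).
dumped: List[Dict[str, Any]] = [
    {"content": "今天天气情况", "sender": "user", "stream_state": StatusCode.END},
    {"content": "典", "sender": "Agent", "stream_state": StatusCode.END},
]

# ensure_ascii=False keeps the Chinese text readable in the saved JSON.
text = json.dumps(dumped, default=encode_enum, ensure_ascii=False)

restored = json.loads(text)
for item in restored:
    # Map the stored name back to the enum member.
    item["stream_state"] = StatusCode[item["stream_state"]]

print(restored == dumped)
```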


### Custom Message Aggregation

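`DefaultAggregator` is called under the hood to assemble `AgentMessage` objects and convert them to the OpenAI message format. Below we implement a simple aggregator that accepts few-shot examples.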
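
As a rough picture of the target format, the following library-free sketch turns sender/content pairs into OpenAI-style `role`/`content` dicts and places few-shot examples between the system prompt and the conversation. `to_openai_messages` and its arguments are hypothetical and independent of Lagent. The mapping assumes that messages sent by the agent itself become `assistant` turns and everything else becomes `user` turns.

```python
from typing import Dict, List, Optional, Tuple


def to_openai_messages(
    history: List[Tuple[str, str]],
    agent_name: str,
    system_prompt: Optional[str] = None,
    few_shot: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    """Build an OpenAI-style message list from (sender, content) pairs."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    # Few-shot examples go after the system prompt, before the real history.
    messages.extend(few_shot or [])
    for sender, content in history:
        role = "assistant" if sender == agent_name else "user"
        messages.append({"role": role, "content": content})
    return messages


shots = [
    {"role": "user", "content": "2 + 2"},
    {"role": "assistant", "content": "4"},
]
result = to_openai_messages(
    history=[("user", "3 + 5")],
    agent_name="Agent",
    system_prompt="Answer with a number only.",
    few_shot=shots,
)
for m in result:
    print(m)
```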

In [None]:
from typing import List, Optional, Union
from lagent.memory import Memory
from lagent.prompts import StrParser
from lagent.agents.aggregator import DefaultAggregator

class FewshotAggregator(DefaultAggregator):
    def __init__(self, few_shot: Optional[List[dict]] = None):
        # Few-shot example messages to include in the prompt.
        self.few_shot = few_shot or []

    def aggregate(
        self,
        messages: Memory,
        name: str,
        parser: StrParser,
        system_instruction: Optional[Union[str, dict, List[dict]]] = None,
    ) -> List[dict]:
        