
feat: 😽 Add a retry mechanism to handle errors from unstable models #1344


Open · wants to merge 5 commits into base: master

Conversation

anka-afk
Member

@anka-afk anka-afk commented Apr 20, 2025

Fixes #1300

Motivation

  1. Resolve frequent errors caused by providers/models that are themselves unstable
  2. Resolve errors caused by network instability and other factors
  3. When an error occurs, the error message should not be sent back to the user directly; it should be kept in the logs instead, to avoid flooding the chat with errors

Modifications

  1. Add two provider configuration options: maximum retry count (max_retries) and retry delay (retry_delay); a config sketch follows below
  2. Add a retry mechanism for errors encountered during LLM requests
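
For illustration only, a hedged sketch of what the two new provider-settings entries might look like in astrbot/core/config/default.py; the surrounding dict name and keys are assumptions, only max_retries and retry_delay come from this PR:

# Hypothetical excerpt; only max_retries and retry_delay are introduced by this PR.
DEFAULT_CONFIG = {
    "provider_settings": {
        "streaming_response": False,  # existing option, shown for context
        "max_retries": 3,             # new: maximum retry attempts per LLM request
        "retry_delay": 1.0,           # new: seconds to wait between retry attempts
    },
}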

Check

  • My commit messages follow good conventions
  • The features I added/fixed/optimized have been properly tested

Summary by Sourcery

Add retry mechanism for LLM requests to improve stability and handle transient errors

New Features:

  • Implement configurable retry mechanism for LLM provider requests
  • Add configurable maximum retry attempts and retry delay for API calls

Enhancements:

  • Improve error handling for unstable model providers
  • Add selective retry for specific types of network and server errors

Chores:

  • Update default configuration to include retry settings

Contributor

sourcery-ai bot commented Apr 20, 2025

Reviewer's Guide by Sourcery

This pull request introduces a retry mechanism for LLM requests to address instability issues with certain providers/models and network-related errors. It adds configuration options for maximum retry attempts and retry delay, and it ensures that error messages are logged instead of being directly returned to the user.

Updated class diagram for provider settings

classDiagram
  class ProviderSettings {
    streaming_response: bool
    max_retries: int
    retry_delay: float
  }
  note for ProviderSettings "Added max_retries and retry_delay attributes"

File-Level Changes

Change: Implemented a retry mechanism for LLM requests to handle potential failures due to provider instability or network issues.
Details:
  • Added a retry loop around the LLM request logic.
  • Implemented a retry counter; the loop exits when the maximum number of retries is reached.
  • Added a delay between retry attempts.
  • Added a check for specific error types (timeout, connection, rate limit, server error, 500, 503) to determine whether a retry should be attempted.
  • Logged error messages with traceback information for debugging purposes.
  • Uploaded metrics after a successful request.
  • Saved the request to history after a successful request.
Files: astrbot/core/pipeline/process_stage/method/llm_request.py

Change: Added configuration options for maximum retry attempts and retry delay for LLM requests.
Details:
  • Added max_retries and retry_delay to the provider settings in the configuration.
  • Default values are set for max_retries (3) and retry_delay (1.0 seconds).
  • The configuration values are retrieved during the initialization of the LLMRequest class; a sketch of this follows below.
Files: astrbot/core/pipeline/process_stage/method/llm_request.py, astrbot/core/config/default.py
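
As noted above, the values are read when the LLM request stage is initialized; a minimal sketch under the assumption that provider settings are exposed as a plain dict (the class and parameter names below are hypothetical):

class LLMRequestStage:
    """Hypothetical stand-in for the LLM request stage; shows only the config reads."""

    def __init__(self, config: dict):
        provider_settings = config.get("provider_settings", {})
        # Fall back to the PR's documented defaults when the keys are missing.
        self.max_retries: int = provider_settings.get("max_retries", 3)
        self.retry_delay: float = provider_settings.get("retry_delay", 1.0)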

Possibly linked issues


Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!
  • Generate a plan of action for an issue: Comment @sourcery-ai plan on
    an issue to generate a plan of action for it.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @anka-afk - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider using a backoff strategy that increases the delay with each retry.
  • It might be helpful to define a custom exception for retryable errors to avoid relying on string matching.
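
Neither suggestion is part of this PR; purely as an illustration, a minimal sketch of what exponential backoff plus a dedicated retryable-error type could look like (all names below are hypothetical):

import asyncio


class RetryableLLMError(Exception):
    """Hypothetical marker exception for transient provider failures worth retrying."""


async def call_with_backoff(coro_factory, max_retries: int = 3, base_delay: float = 1.0):
    """Retry an async call, doubling the wait after each failed attempt."""
    for attempt in range(max_retries):
        try:
            return await coro_factory()
        except RetryableLLMError:
            if attempt == max_retries - 1:
                raise  # retries exhausted, surface the error to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

Providers (or a thin wrapper around them) would raise RetryableLLMError for timeouts, rate limits and 5xx responses, so the retry site no longer needs to match substrings of the error message.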
Here's what I looked at during the review
  • 🟢 General issues: all looks good
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@@ -158,94 +169,132 @@ async def process(
req.session_id = event.unified_msg_origin

async def requesting(req: ProviderRequest):
Contributor

issue (complexity): Consider refactoring the requesting function by extracting the request execution and retry logic into separate helper functions to improve readability and reduce nesting complexity without altering the functionality.

Consider extracting the retry logic and inner handling into separate helper functions. This would flatten the nested loops and try/except blocks to improve readability without changing behavior. For example, move the logic that executes a single request into its own function and then wrap that with the retry loop:

async def _execute_request(self, req: ProviderRequest, event, provider):  # async generator: yields MessageChain chunks and handler results
    logger.debug(f"提供商请求 Payload: {req}")
    final_llm_response = None
    if self.streaming_response:
        stream = provider.text_chat_stream(**req.__dict__)
        async for llm_response in stream:
            if llm_response.is_chunk:
                if llm_response.result_chain:
                    yield llm_response.result_chain  # MessageChain
                else:
                    yield MessageChain().message(llm_response.completion_text)
            else:
                final_llm_response = llm_response
    else:
        final_llm_response = await provider.text_chat(**req.__dict__)
    if not final_llm_response:
        raise Exception("LLM response is None.")
    # Execute post-response event hooks
    await self._handle_event_hooks(event, final_llm_response)
    # Handle functions/streaming responses
    if self.streaming_response:
        async for result in self._handle_llm_stream_response(event, req, final_llm_response):
            yield result
    else:
        async for result in self._handle_llm_response(event, req, final_llm_response):
            yield result

Then wrap this execution with retry logic:

async def requesting(self, req: ProviderRequest, event, provider):
    retry_count = 0
    while True:
        try:
            async for result in self._execute_request(req, event, provider):
                if isinstance(result, ProviderRequest):
                    req = result  # new LLM request
                    break  # re-enter execution with modified req
                else:
                    yield result
            else:
                # Only break out if no inner loop reset happened.
                break
            retry_count = 0  # Reset retry_count if a new req was processed successfully.
        except Exception as e:
            retry_count += 1
            logger.error(f"LLM请求失败 (尝试 {retry_count}/{self.max_retries}): {type(e).__name__} : {str(e)}")
            logger.error(traceback.format_exc())
            if retry_count < self.max_retries and any(err in str(e).lower() for err in ["timeout", "connection", "rate limit", "server error", "500", "503"]):
                logger.info(f"将在 {self.retry_delay} 秒后重试 LLM 请求 >﹏<")
                await asyncio.sleep(self.retry_delay)
            else:
                logger.error(f"LLM 请求失败, 重试次数({retry_count - 1})用尽: {type(e).__name__} : {str(e)}")
                break

Finally, update your call sites to use this refactored requesting function, keeping all functionality intact while reducing nesting.
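
For context, a hypothetical call site inside process() once requesting becomes a method of the stage, as the suggestion implies; req, event and provider are assumed to already be in scope as in the original code:

# Hypothetical call site; assumes requesting() is now a method on the stage class.
async for result in self.requesting(req, event, provider):
    yield result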

Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a configurable retry mechanism for LLM requests to improve stability and prevent excessive user-facing error messages in case of transient failures. The changes include the addition of new configuration options, implementation of retry logic in the LLM request flow, and updates to the default configuration settings.

  • Added "max_retries" and "retry_delay" options in provider settings.
  • Implemented a retry loop for LLM requests with error logging and conditional delays.
  • Updated the default configuration file to include new retry parameters.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

  • astrbot/core/pipeline/process_stage/method/llm_request.py: Implemented retry mechanism for LLM requests with detailed logging
  • astrbot/core/config/default.py: Added default values for "max_retries" and "retry_delay"
Comments suppressed due to low confidence (2)

astrbot/core/pipeline/process_stage/method/llm_request.py:295

  • The log message subtracts 1 from retry_count, which may confuse readers about the actual number of attempts. Consider logging the actual retry count to improve clarity.
logger.error(f"LLM 请求失败, 重试次数({retry_count - 1})用尽: {type(e).__name__} : {str(e)}")

astrbot/core/pipeline/process_stage/method/llm_request.py:288

  • [nitpick] The log message contains an informal emoticon, which might be inappropriate for production logs. Consider using a more neutral tone.
logger.info(f"将在 {self.retry_delay} 秒后重试 LLM 请求 >﹏<")
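
Purely as an illustration of both suppressed comments (report the real attempt count, keep a neutral tone), a hypothetical rewording of those two log lines using the same variables as the PR:

logger.info(f"Retrying LLM request in {self.retry_delay} seconds (attempt {retry_count}/{self.max_retries})")
logger.error(f"LLM request failed after {retry_count} attempt(s): {type(e).__name__}: {str(e)}")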

@Soulter
Member

Soulter commented Apr 22, 2025

I feel the process() method has become overly complex now; I'll find some time to look it over carefully. The retry mechanism probably needs to be wrapped in a decorator.
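
A minimal sketch of that decorator idea, assuming it wraps a plain coroutine such as provider.text_chat; wrapping the streaming async-generator path would need extra care because partial output may already have been yielded. All names here (retry_on_transient_errors, TRANSIENT_MARKERS) are hypothetical:

import asyncio
import functools
import logging

logger = logging.getLogger(__name__)

# Error-message substrings this PR already treats as transient.
TRANSIENT_MARKERS = ("timeout", "connection", "rate limit", "server error", "500", "503")


def retry_on_transient_errors(max_retries: int = 3, retry_delay: float = 1.0):
    """Retry a coroutine function when the raised error looks transient."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(1, max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except Exception as e:
                    transient = any(m in str(e).lower() for m in TRANSIENT_MARKERS)
                    if not transient or attempt == max_retries:
                        logger.error("LLM request failed after %d attempt(s): %s", attempt, e)
                        raise
                    logger.warning(
                        "Transient LLM error (attempt %d/%d), retrying in %.1fs: %s",
                        attempt, max_retries, retry_delay, e,
                    )
                    await asyncio.sleep(retry_delay)

        return wrapper

    return decorator

It could then be applied as, for example, @retry_on_transient_errors(max_retries=3, retry_delay=1.0) on a small wrapper around provider.text_chat, keeping process() itself free of retry bookkeeping.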

Labels: None yet
Projects: None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Add retry handling for LLM requests · [Feature] Add a retry system
3 participants