fix(tmq): remove epset that is useless by wangmm0220 · Pull Request #35126 · taosdata/TDengine

wangmm0220 · 2026-04-14T02:43:34Z

Description

Issue(s)

Close/close/Fix/fix/Resolve/resolve: Issue Link
https://project.feishu.cn/taosdata_td/feature/detail/6923036499

Checklist

Please check the items in the checklist if applicable.

Is the user manual updated?
Are the test cases passed and automated?
Is there no significant decrease in test coverage?


gemini-code-assist · 2026-04-14T02:43:40Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copilot

Pull request overview

This PR primarily updates TMQ subscription/consumer state handling by removing persisted endpoint sets (epset) where they’re no longer needed, and adjusts related audit behavior/tests and documentation.

Changes:

TMQ: remove persisted SEpSet/SMqVgEp from subscription/consumer SDB state, bumping related SDB soft versions and updating rebalance logic to track vgId only.
Audit: adjust audit user/token handling and ensure audit STB is created for audit DBs regardless of auditSaveInSelf, adding CI test coverage for “no token” and “self audit switch off”.
Misc: update client callback to fully clear “table not exist as empty” error state; update TDgpt anomaly-detection docs for multi-column input.
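The core TMQ change, persisting only `vgId` and resolving endpoints when a response is built, can be illustrated with a minimal sketch. All names here (`DemoEp`, `DemoVgroup`, `DemoConsumerEp`, `demoLookupVgroup`, `demoBuildEps`) are hypothetical stand-ins. They only assume that some in-memory catalog maps a vgId to its current leader endpoint, in the role the PR gives to `mndAcquireVgroup` + `mndGetVgroupEpset`. They do not reproduce TDengine's real structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not TDengine's SEpSet / SVgObj. */
typedef struct {
  char     fqdn[64];
  uint16_t port;
} DemoEp;

typedef struct {
  int32_t vgId;
  DemoEp  leader; /* current leader endpoint, owned by the catalog */
} DemoVgroup;

/* Persisted subscription state: vgIds only, no endpoints. */
typedef struct {
  int32_t vgIds[8];
  int32_t numOfVgs;
} DemoConsumerEp;

/* Find the live vgroup for a vgId; NULL if the catalog does not know it. */
static const DemoVgroup *demoLookupVgroup(const DemoVgroup *catalog, int32_t n, int32_t vgId) {
  for (int32_t i = 0; i < n; i++) {
    if (catalog[i].vgId == vgId) return &catalog[i];
  }
  return NULL;
}

/* Build response endpoints from the current catalog at request time.
 * Returns the number of endpoints written to out (at most cap). */
static int32_t demoBuildEps(const DemoConsumerEp *sub, const DemoVgroup *catalog, int32_t n,
                            DemoEp *out, int32_t cap) {
  assert(sub != NULL && out != NULL);
  int32_t written = 0;
  for (int32_t i = 0; i < sub->numOfVgs && written < cap; i++) {
    const DemoVgroup *vg = demoLookupVgroup(catalog, n, sub->vgIds[i]);
    if (vg == NULL) continue; /* unresolved vg is skipped here; the review discusses this case */
    out[written++] = vg->leader;
  }
  return written;
}
```

Because nothing endpoint-related is persisted, a dnode fqdn/port change only updates the catalog. The stored vgId list stays valid, and the next response picks up the new endpoint without any `tmsgUpdateDnodeEpSet`-style rewrite of SDB objects.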

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.


File	Description
`source/dnode/mnode/impl/src/mndSubscribe.c`	Switch unassigned/assigned vgroup tracking to `int32_t vgId` only; bump subscribe SDB version; remove epset update logic.
`source/dnode/mnode/impl/src/mndConsumer.c`	Switch consumer ep tracking to `vgId` only and build response epsets dynamically; bump consumer SDB version.
`source/dnode/mnode/impl/src/mndDef.c`	Update encode/decode for subscribe/consumer objects to drop epset fields while keeping backward decode compatibility.
`source/dnode/mnode/impl/inc/mndDef.h`	Remove `SMqVgEp` and consumer `ep` field; update structures to store only `vgId`.
`source/dnode/mnode/impl/src/mndDb.c`	Always create audit STB when creating an audit DB (no longer gated by `tsAuditSaveInSelf`).
`source/dnode/mnode/impl/src/mndDnode.c`	Fetch active token for the configured audit user (not hardcoded `"audit"`).
`source/libs/new-stream/src/streamTriggerTask.c`	Adjust trigger history logic (count/session/meta-related behavior).
`source/client/src/clientImpl.c`	Clear request-level error code/message when treating “table not exist” as empty result.
`test/ci/cases.task`	Add the two new component test cases to CI task list.
`test/cases/80-Components/01-Taosd/test_com_taosd_audit.py`	Update audit test to use `cus_audit` user and token creation steps.
`test/cases/80-Components/01-Taosd/test_com_taosd_audit_no_token.py`	New test for audit telemetry with `auditUseToken=0`.
`test/cases/80-Components/01-Taosd/test_com_taosd_self_audit_switch_off.py`	New test for switching `auditSaveInSelf` behavior.
`docs/zh/.../03-ad.md`, `docs/en/.../03-ad.md`, `docs/en/.../index.md`	Document multi-column anomaly detection input behavior and new `self.input_data_lists`.

Comments suppressed due to low confidence (2)

source/libs/new-stream/src/streamTriggerTask.c:11755

readAllData is now hardcoded to true and never modified, so the metadata-accelerated branch (if (!readAllData)) becomes dead code and count-window history checks will always scan full data blocks. Either restore the previous conditional so the optimized path can be used when safe, or delete the unreachable branch to avoid misleading future changes (and consider the performance impact of always scanning).

  SStreamTriggerTask      *pTask = pContext->pTask;
  bool                     readAllData = true;
  bool                     allTableProcessed = false;
  bool                     needFetchData = false;

#define ALIGN_UP(x, b) (((x) + (b) - 1) / (b) * (b))
  while (!allTableProcessed && !needFetchData) {
    if (!readAllData) {
      // use table metadatas to accelerate the count window check
      if (IS_TRIGGER_TIMESTAMP_SORTER_EMPTY(pContext->pSorter)) {
        stTimestampSorterReset(pContext->pSorter);
        pContext->pCurTableMeta = tSimpleHashIterate(pGroup->pTableMetas, pContext->pCurTableMeta, &pContext->tbIter);
        if (pContext->pCurTableMeta == NULL) {
          // actually, it has only one table

source/libs/new-stream/src/streamTriggerTask.c:2600

This change drops STREAM_TRIGGER_COUNT from the histTrigTsIndex initialization special-case. If count-window history blocks still place the trigger timestamp in slot 0 (as before), switching to trigTsIndex can make history logic read the wrong column as the timestamp. Please confirm the slot layout for count-trigger history blocks and keep STREAM_TRIGGER_COUNT in the special-case if its history scan always uses ts at index 0.

  if (pTask->triggerType == STREAM_TRIGGER_SESSION || pTask->triggerType == STREAM_TRIGGER_SLIDING) {
    pTask->histTrigTsIndex = 0;
  } else {
    pTask->histTrigTsIndex = pTask->trigTsIndex;
  }
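The open question in this comment is whether count-window history blocks still carry the trigger timestamp in slot 0. A small sketch can make the two possible outcomes explicit. The enum values and `demoHistTrigTsIndex` are hypothetical. The `countHistTsAtSlot0` flag stands for the slot-layout fact that the reviewer asks the author to confirm, not for anything known about the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the trigger types named in the comment;
 * the real STREAM_TRIGGER_* values may differ. */
typedef enum {
  DEMO_TRIGGER_SESSION,
  DEMO_TRIGGER_SLIDING,
  DEMO_TRIGGER_COUNT,
  DEMO_TRIGGER_STATE,
} DemoTriggerType;

/* Choose the history-block column holding the trigger timestamp.
 * countHistTsAtSlot0 encodes the unconfirmed layout question for count windows. */
static int32_t demoHistTrigTsIndex(DemoTriggerType type, int32_t trigTsIndex, bool countHistTsAtSlot0) {
  assert(trigTsIndex >= 0);
  if (type == DEMO_TRIGGER_SESSION || type == DEMO_TRIGGER_SLIDING) return 0;
  if (type == DEMO_TRIGGER_COUNT && countHistTsAtSlot0) return 0; /* keep the old special case */
  return trigTsIndex;
}
```

If the layout check shows that count-window history blocks use slot 0, the PR's version behaves like `countHistTsAtSlot0 == false` and would read the wrong column.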


Verify that, after changing taosd's epset (fqdn + port) and force-killing taosd, WAL replay of old rebalance transactions fails until ep.json provides the epset mapping needed to resolve the pending transactions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

JinqingKuang · 2026-04-15T02:55:49Z

Code Review Results

Scope of this review: the Change Summarizer, Broad Scanner, Performance Reviewer, Maintainability Reviewer, and Rule Reviewer passes have all completed.

Change Summary

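This PR replaces SArray<SMqVgEp> in the TMQ subscription/consumer data model with SArray<int32_t> (bare vgIds), completely removing the design that persisted epsets. Epsets are now obtained on demand through mndAcquireVgroup + mndGetVgroupEpset, and the tmsgUpdateDnodeEpSet refresh logic in mndSubActionDecode/mndConsumerActionDecode has been deleted. MND_CONSUMER_VER_NUMBER and MND_SUBSCRIBE_VER_NUMBER are both bumped from 3 to 4, and the backward-compatibility logic (sver <= 3) correctly skips the epSet bytes of the old format.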
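The backward-compatible decode (read the vgId, and for `sver <= 3` read past the old epset without keeping it) can be sketched with a bounds-checked cursor. The wire layout below is a hypothetical one chosen for illustration (int32 vgId; then, for old versions, int8 inUse, int8 numOfEps, and per endpoint a uint16 port plus a length-prefixed fqdn, all in host byte order). It is not TDengine's actual `tEncodeSEpSet` format, and `DemoReader`, `demoSkipEpSet`, and `demoDecodeVgEntry` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bounds-checked read cursor over an encoded buffer (pos never exceeds len). */
typedef struct {
  const uint8_t *buf;
  size_t         len;
  size_t         pos;
} DemoReader;

static bool demoRead(DemoReader *r, void *dst, size_t n) {
  if (r->len - r->pos < n) return false;
  memcpy(dst, r->buf + r->pos, n);
  r->pos += n;
  return true;
}

static bool demoSkip(DemoReader *r, size_t n) {
  if (r->len - r->pos < n) return false;
  r->pos += n;
  return true;
}

/* Read past an old-format epset and discard it (hypothetical layout). */
static bool demoSkipEpSet(DemoReader *r) {
  int8_t inUse, numOfEps;
  if (!demoRead(r, &inUse, 1) || !demoRead(r, &numOfEps, 1) || numOfEps < 0) return false;
  for (int32_t i = 0; i < numOfEps; i++) {
    uint8_t fqdnLen;
    if (!demoSkip(r, sizeof(uint16_t)) || !demoRead(r, &fqdnLen, 1) || !demoSkip(r, fqdnLen)) return false;
  }
  return true;
}

/* Decode one vg entry: versions <= 3 carry an epset after the vgId, version 4 does not. */
static bool demoDecodeVgEntry(DemoReader *r, int32_t sver, int32_t *vgId) {
  assert(r != NULL && vgId != NULL);
  if (!demoRead(r, vgId, sizeof(*vgId))) return false;
  if (sver <= 3 && !demoSkipEpSet(r)) return false;
  return true;
}
```

The skip has to walk the variable-length endpoint entries. Skipping a fixed byte count would desynchronize the cursor for every field that follows.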

3 verified issues found:

Issue 1 - [Medium/high confidence] Correctness: the consumer loses its subscription to a vg when acquiring that vgroup fails

Location: source/dnode/mnode/impl/src/mndConsumer.c, function processEachTopicEp

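Problem: When epoch != -1, the old code added the vg to the consumer's topic endpoint list (using the cached epset) regardless of whether the vgroup could be acquired. The new code acquires the vgroup every time; if mndAcquireVgroup returns NULL, the vg is simply skipped with continue, logging only an mWarn and triggering no rebalance or retry. During the transition window of a vnode split or vgroup migration, mndAcquireVgroup may fail briefly, so the consumer receives an incomplete endpoint list, leaving a gap in message consumption.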

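Suggested fix: if the acquire fails, after the mWarn, add the vg to a "degraded list" and trigger a rebalance check; or state explicitly in the log that the vg will be filled in at the next heartbeat, instead of skipping it silently.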
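The "degraded list" suggestion can be sketched as follows: a failed lookup is recorded and raises a flag for a follow-up rebalance check, instead of vanishing behind `continue`. `DemoTopicEps`, `DemoAcquireFn`, `demoProcessTopic`, and the example `demoAcquireOddOnly` are hypothetical. The acquire callback only stands in for `mndAcquireVgroup` succeeding or failing, and how the real mnode would act on the flag is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_VGS 16

/* Hypothetical result of building one topic's endpoint list. */
typedef struct {
  int32_t readyVgIds[DEMO_MAX_VGS];
  int32_t numReady;
  int32_t pendingVgIds[DEMO_MAX_VGS]; /* vgroups whose lookup failed this round */
  int32_t numPending;
  bool    needRebalanceCheck;
} DemoTopicEps;

/* Stand-in for mndAcquireVgroup: true if the vgroup is resolvable right now. */
typedef bool (*DemoAcquireFn)(int32_t vgId);

/* Example acquire used for illustration: pretends even vgIds are mid-migration. */
static bool demoAcquireOddOnly(int32_t vgId) { return vgId % 2 != 0; }

static void demoProcessTopic(const int32_t *vgIds, int32_t n, DemoAcquireFn acquire, DemoTopicEps *out) {
  assert(n >= 0 && n <= DEMO_MAX_VGS);
  out->numReady = 0;
  out->numPending = 0;
  out->needRebalanceCheck = false;
  for (int32_t i = 0; i < n; i++) {
    if (acquire(vgIds[i])) {
      out->readyVgIds[out->numReady++] = vgIds[i];
    } else {
      /* Do not drop the vg silently: remember it and request a follow-up check. */
      out->pendingVgIds[out->numPending++] = vgIds[i];
      out->needRebalanceCheck = true;
    }
  }
}
```

Keeping the pending vgIds separate from the ready ones lets the caller either retry them on the next heartbeat or log exactly which vgroups were withheld, which addresses the "silent skip" concern.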

Issue 2 - [Low-Medium/medium confidence] Performance: every consumer subscription response triggers an acquire of all assigned vgroups

Location: source/dnode/mnode/impl/src/mndConsumer.c, function processEachTopicEp

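Problem: The old code acquired vgroups only when epoch == -1 (the consumer's first connection); the new code calls mndAcquireVgroup (an SDB read lock) on every assigned vg each time processEachTopicEp runs. In clusters with many consumers and many vgroups, every heartbeat response incurs N SDB read-lock acquisitions. Although a single acquire is an in-memory operation, the cumulative effect at high frequency deserves attention.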

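Suggested fix: consider restoring a lightweight path similar to the epoch check: skip the full acquire for consumers with an established, stable connection, and refresh epsets only when the epoch changes.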

Issue 3 - [Low/high confidence] Testing: the new test is commented out in CI, and taos.cfg is not restored after being modified

Location A: test/ci/cases.task, the newly added line (#,,y,.,./ci/pytest.sh pytest cases/17-DataSubscription/02-Consume/test_tmq_change_epset.py)

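Problem A: The test covering this PR's core fix scenario (WAL replay getting stuck after an epset change) is commented out and will not run automatically in CI, so this PR effectively has no regression protection.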

Location B: test/cases/17-DataSubscription/02-Consume/test_tmq_change_epset.py, method modifyTaosCfg

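Problem B: The test reads and writes the taos.cfg file directly, but when the test ends (whether it passes or fails) it cleans up only the /etc/hosts entries; the fqdn/port changes to taos.cfg are not reverted. If the test framework reuses the same dnode directory, subsequent tests will inherit the polluted configuration.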

Suggested fixes:

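Explain why the test is commented out in CI (it needs sudo to modify /etc/hosts) and give a plan for enabling it later, or refactor the test to remove the sudo dependency;
Restore taos.cfg in teardown or a finally block, or back up the original cfg before buildSubscription so it can be restored after the test.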

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.


wangmm0220 and others added 3 commits April 13, 2026 18:20

fix(tmq): remove epset that is useless

ada5451

fix(tmq): reviewed by ci

ebad056

fix(tmq): add missing closing braces in tDecodeSMqConsumerEp and tDec…

b05bde1

…odeSubscribeObj Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 14, 2026 02:43

wangmm0220 requested review from a team, dapan1121, guanshengliang and zitsen as code owners April 14, 2026 02:43

taosdata-bot added the internal label Apr 14, 2026

Copilot started reviewing on behalf of wangmm0220 April 14, 2026 02:44

Copilot AI reviewed Apr 14, 2026


Comment thread source/libs/new-stream/src/streamTriggerTask.c

Comment thread test/cases/80-Components/01-Taosd/test_com_taosd_audit_no_token.py

Comment thread test/cases/80-Components/01-Taosd/test_com_taosd_audit_no_token.py

wangmm0220 added 2 commits April 14, 2026 13:39

Merge branch 'main' into feat/6923036499

ceca4bf

Merge branch '3.0' into feat/6923036499

bfb36e6

wangmm0220 closed this Apr 14, 2026

wangmm0220 and others added 2 commits April 14, 2026 15:42

fix(tmq): add test case

113f6ec

wangmm0220 reopened this Apr 14, 2026

JinqingKuang reviewed Apr 15, 2026


Comment thread source/dnode/mnode/impl/src/mndConsumer.c Outdated

JinqingKuang reviewed Apr 15, 2026


Comment thread test/ci/cases.task

JinqingKuang reviewed Apr 15, 2026


Comment thread test/cases/17-DataSubscription/02-Consume/test_tmq_change_epset.py

fix(tmq): reviewed by ai

e25117b

Copilot AI review requested due to automatic review settings April 15, 2026 06:05

Copilot started reviewing on behalf of wangmm0220 April 15, 2026 06:06

Copilot AI reviewed Apr 15, 2026


JinqingKuang approved these changes Apr 16, 2026


guanshengliang approved these changes Apr 16, 2026


guanshengliang merged commit 891eac1 into 3.0 Apr 16, 2026
13 of 14 checks passed

guanshengliang merged 8 commits into 3.0 from feat/6923036499


4 participants
