Skip to content

fix(tmq): optimize poll logic & vtable meta error & dead lock in sometimes#35086

Merged
guanshengliang merged 7 commits intomainfrom
fix/6948264296
Apr 9, 2026
Merged

fix(tmq): optimize poll logic & vtable meta error & dead lock in sometimes#35086
guanshengliang merged 7 commits intomainfrom
fix/6948264296

Conversation

@wangmm0220
Copy link
Copy Markdown
Contributor

Description

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings April 8, 2026 09:59
@taosdata-bot taosdata-bot Bot added the internal label Apr 8, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces lock wrapper functions in clientTmq.c to incorporate trace logging and replaces direct latch calls. It also adds a getElapsedTime helper and refactors tmq_consumer_poll. In catalog.c, it updates ctgGetTbMeta to correctly manage schemaExt and tagRef during metadata reallocation. Feedback suggests renaming the new lock functions to snake_case to maintain consistency with the existing codebase.

Comment thread source/client/src/clientTmq.c Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets TMQ stability and metadata correctness by tightening virtual table meta reconstruction in the catalog layer and adjusting TMQ client poll/response handling to reduce intermittent deadlocks.

Changes:

  • Fix virtual-table meta rebuild to correctly account for schemaExt and tag references when reallocating/copying meta buffers.
  • Add TMQ lock wrapper helpers with TRACE logging and use them across multiple TMQ code paths.
  • Simplify processMqRsp control flow by returning early for EP responses / decode errors to avoid incorrect unlock paths.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
source/libs/catalog/src/catalog.c Rebuilds vtable meta with schemaExt + colRef + tagRef layout and sets pointers consistently.
source/client/src/clientTmq.c Introduces lock wrappers + TRACE macro, replaces direct latch calls, and tweaks poll/response flow to avoid deadlock-prone paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/client/src/clientTmq.c Outdated
Comment thread source/libs/catalog/src/catalog.c
@JinqingKuang
Copy link
Copy Markdown
Contributor

Code Review

本次审查覆盖 source/client/src/clientTmq.csource/libs/catalog/src/catalog.c,共发现 2 个问题,同时确认了 PR 中 2 项关键 bug 修复的正确性。


✅ 确认正确的修复

1. processMqRsp 死锁修复
旧代码在 taosWLockLatch 调用之前存在两处 goto END,而 END: 标签处无条件执行 taosWUnLockLatch,构成 release-without-acquire,会腐败 RWLatch 内部计数器,最终触发死锁。PR 将这两处改为 return code,修复正确。这是 PR 标题中"dead lock in sometimes"的根源。

2. catalog.c vtable meta 内存布局修复
CTG_IS_META_VCTABLE 路径存在三处缺陷:

  • realloc 未计入 tagRefSize,buffer 偏小
  • colRef 指针放在 metaSize 处,与 realloc 保留的 schemaExt 内联数据区域重叠,写入 colRef 数据时会破坏 schemaExt
  • tagRef 完全未初始化

PR 对这三点均已正确修复,并在 realloc 移动地址后补充了 schemaExt 指针更新。


⚠️ 需要关注的问题

详见下方 inline comments。


🤖 Reviewed by Claude Code (claude-sonnet-4-6)

Comment thread source/client/src/clientTmq.c Outdated
Comment thread source/client/src/clientTmq.c Outdated
Copilot AI review requested due to automatic review settings April 9, 2026 06:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

source/libs/catalog/src/catalog.c:251

  • Potential use-after-free/out-of-bounds read: output->vctbMeta is produced via ctgCloneMetaOutput in ctgRefreshTbMeta, but that clone routine only allocates/copies colRef and does not deep-copy the tagRef array. As a result, output->vctbMeta->tagRef may point outside the cloned allocation or to memory owned by the enqueued original output, and dereferencing it here can race with cache-update/free. Fix by updating the clone logic to allocate/copy tagRef (and set tagRef=NULL + numOfTagRefs=0 when absent) so output->vctbMeta is self-contained before this memcpy.
        TAOS_MEMCPY(output->tbMeta->colRef, output->vctbMeta->colRef, colRefSize);
        if (output->vctbMeta->tagRef && tagRefSize > 0) {
          output->tbMeta->tagRef = (SColRef *)((char *)output->tbMeta + metaSize + schemaExtSize + colRefSize);
          TAOS_MEMCPY(output->tbMeta->tagRef, output->vctbMeta->tagRef, tagRefSize);
        } else {
          output->tbMeta->tagRef = NULL;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/libs/catalog/src/catalog.c
Comment thread source/client/src/clientTmq.c
@guanshengliang guanshengliang merged commit 0ea4a3b into main Apr 9, 2026
15 of 16 checks passed
@tomchon tomchon deleted the fix/6948264296 branch April 17, 2026 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants