enh(tmq): optimize logic in altering tag with tmq #34711

Merged
guanshengliang merged 1 commit into main from fix/6765905149-main
Mar 11, 2026

@wangmm0220 wangmm0220 commented Mar 9, 2026

…tags in tmq

Description

Issue(s)

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copilot AI review requested due to automatic review settings March 9, 2026 01:11
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on optimizing the performance and stability of updating table lists within the TDengine Message Queue (TMQ) when tags are altered. It achieves this by introducing a defensive copy of table UID lists in the core C logic, preventing unintended modifications. Concurrently, the associated system test has been substantially upgraded to simulate high-frequency tag alteration scenarios with a significantly larger dataset and multi-threaded operations, ensuring robust validation of the implemented optimizations.

Highlights

  • Core Logic Optimization: Implemented a defensive copy of the table UID list (tbUidList) in the TMQ read path (tqRead.c). Filtering and adding tables now operate on the copy, so the caller's list is not modified while tags are being altered.
  • Enhanced TMQ Alter Tag Test: Significantly scaled up the tmq_alter_tag_frequency.py system test by increasing the number of child tables and insert rows, introducing multi-threaded tag alteration, and adding signal-based control for test execution. This aims to thoroughly test the performance and stability of tag alteration under heavy load.

Changelog
  • source/dnode/vnode/src/tq/tqRead.c
    • Introduced copyUidList function to create a deep copy of tbUidList.
    • Modified addTableListForStableTmq to use the copied list for qFilterTableList and tqReaderAddTbUidList.
    • Ensured proper cleanup of the copied list using taosArrayDestroy via a goto END pattern.
  • tests/system-test/7-tmq/tmq_alter_tag_frequency.py
    • Increased childtable_count to 1,000,000 and insert_rows to 10,000 in test configuration.
    • Added signal module and handlers (SIGUSR1, SIGUSR2, SIGINT) for dynamic control of alter threads (pause, resume, stop).
    • Implemented stopAlter boolean and pauseAlter threading.Event for thread synchronization.
    • Refactored consume method to accept a group parameter, allowing multiple consumer groups, and added progress logging.
    • Modified alterTag to run in a continuous loop, using a dedicated tdSqlTmp instance per thread, and to restart altering from the beginning once all tables in its range are processed.
    • Updated run method to launch multiple alter threads, two consumer threads, and an insert thread, with proper join calls for synchronization.
    • Adjusted tableNum to 1,000,000 to match test configuration.
    • Changed auto_create_table to "no", batch_create_tbl_num to 1000, and insert_mode to "taosc" in the test configuration.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to optimize the speed of updating table lists in TMQ when altering tags. The core change in the C code involves creating a copy of the table UID list before processing, which prevents side effects on the original list and improves correctness, especially in concurrent scenarios. The Python test has been substantially updated to a multi-threaded stress test, which is a great addition for verifying the fix under load. I've identified a few areas for improvement in the test script to enhance its correctness and reliability.

        continue
    for block in val:
        print(block.fetchall(),len(block.fetchall()))
        cnt += len(block.fetchall())

medium

The fetchall() method on a result block is stateful. It consumes the rows from the block and moves an internal cursor. Subsequent calls to fetchall() on the same block will likely return an empty list.

Using len(block.fetchall()) directly is not robust. It's better practice to store the result of fetchall() in a variable and then get its length to avoid potential issues.

For example:

rows = block.fetchall()
cnt += len(rows)
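
The difference can be checked in isolation. The sketch below uses a hypothetical FakeBlock stand-in whose fetchall() drains its rows on the first call, which is the behavior this comment assumes; the real taos block type may behave differently, so treat this as an illustration rather than a description of the driver.

```python
class FakeBlock:
    """Hypothetical stand-in for a TMQ result block (not the taos API)."""

    def __init__(self, rows):
        self._rows = list(rows)

    def fetchall(self):
        # Hand back all remaining rows and leave the block empty.
        rows, self._rows = self._rows, []
        return rows


def count_rows(blocks):
    """Count rows by calling fetchall() exactly once per block."""
    cnt = 0
    for block in blocks:
        rows = block.fetchall()
        cnt += len(rows)
    return cnt


def count_rows_buggy(blocks):
    """Mirror the reviewed code: one fetchall() for print, another for len()."""
    cnt = 0
    for block in blocks:
        block.fetchall()              # first call drains the block
        cnt += len(block.fetchall())  # second call sees no rows
    return cnt
```

With a draining fetchall(), count_rows_buggy always reports zero rows, while count_rows reports the true total.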

Comment on lines +203 to 206
if tableIndex >= endTable:
    tableIndex = startTable
    print(f"[Thread {threadId}] Restart alter table tag from {startTable}")
    alterSql = ''

medium

In the alterTag method, when tableIndex >= endTable, the loop restarts and alterSql is reset. If the number of tables in the range (endTable - startTable) is not a multiple of 10, the last few ALTER statements that have been concatenated into alterSql but not yet executed will be lost. You should execute the remaining statements in alterSql before resetting it and restarting the loop.

Suggested change
- if tableIndex >= endTable:
-     tableIndex = startTable
-     print(f"[Thread {threadId}] Restart alter table tag from {startTable}")
-     alterSql = ''
+ if tableIndex >= endTable:
+     if alterSql:
+         try:
+             tdSqlTmp.execute(alterSql)
+         except Exception as e:
+             tdLog.warning(f"[Thread {threadId}] Alter failed: {e}")
+     tableIndex = startTable
+     print(f"[Thread {threadId}] Restart alter table tag from {startTable}")
+     alterSql = ''
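
The same flush-before-restart idea can be sketched as a standalone helper. This is a minimal illustration, not the test's code: alter_in_batches, the execute callable, and the ";" separator are all assumptions, since the review does not show how the test concatenates its statements.

```python
def alter_in_batches(statements, execute, batch_size=10):
    """Run statements in batches and flush the final partial batch.

    `execute` is a hypothetical callable that takes one SQL string.
    Returns the number of statements that were handed to `execute`.
    """
    pending = []
    executed = 0
    for stmt in statements:
        pending.append(stmt)
        if len(pending) == batch_size:
            execute(";".join(pending))
            executed += len(pending)
            pending = []
    # Without this final flush, a range that is not a multiple of
    # batch_size would silently drop its last few statements.
    if pending:
        execute(";".join(pending))
        executed += len(pending)
    return executed
```

For 23 statements and a batch size of 10, the helper issues three calls: two full batches and one batch of three.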

Comment on lines 243 to +254
  for i in range(alterCnt):
-     alterThread.append(threading.Thread(target=self.alterTag, kwargs={"start": i * batchAlterNum, "end": (i + 1) * batchAlterNum}))
-     alterThread[i].start()
+     tdSqlTmp = TDSql()
+     tdSqlTmp.init(self.conn.cursor())
+     self.tdSqlList.append(tdSqlTmp)
+     thread = threading.Thread(
+         target=self.alterTag,
+         kwargs={"start": i * batchAlterNum, "end": (i + 1) * batchAlterNum, "tdSql": tdSqlTmp},
+         name=f"AlterThread-{i}"
+     )
+     self.alterThread.append(thread)
+     thread.start()
+     tdLog.info(f"Started AlterThread-{i}")

medium

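In the run method, you are creating multiple alterTag threads. Each thread gets a TDSql object initialized with self.conn.cursor(), so all threads share one connection. The Python DB-API 2.0 specification (PEP 249) does not guarantee that this is safe: each driver declares its own level through the module-level threadsafety attribute, and only level 2 or higher permits sharing a connection between threads. Without that guarantee, using cursors from the same connection across multiple threads can lead to race conditions.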

While the taos-python driver might have internal locking to handle this, it's safer and better practice for each thread to have its own database connection. Consider creating a new taos.connect(...) within each alterTag thread to ensure thread safety.
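
A connection-per-thread layout could look like the sketch below. It is deliberately driver-agnostic: FakeConnection, run_workers, and the connect and work parameters are hypothetical names for illustration, and nothing here calls the taos API.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a driver connection (not the taos API)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_workers(connect, work, thread_count):
    """Start thread_count threads; each opens, uses, and closes its own connection."""
    results = [None] * thread_count

    def worker(index):
        conn = connect()  # opened inside the thread, never shared
        try:
            results[index] = work(conn, index)
        finally:
            conn.close()

    threads = [
        threading.Thread(target=worker, args=(i,), name=f"AlterThread-{i}")
        for i in range(thread_count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In the test itself, connect would be whatever call opens a TDengine connection, such as the taos.connect(...) mentioned above, and work would wrap the alter loop.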

Copilot AI left a comment

Pull request overview

This PR aims to improve TMQ performance when tag-alter operations cause frequent updates to the internal “qualified table list” for stable subscriptions, and updates the related system test to stress/observe the behavior under heavier load and concurrency.

Changes:

  • In vnode TQ read path, avoid in-place mutation of the shared table UID list by filtering a copied list per consumer handle before adding to the reader.
  • Expand the TMQ alter-tag frequency system test to larger scale and multi-consumer + multi-alter-thread execution, with optional signal-based pause/stop controls.
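
The copy-then-filter idea in the first bullet can be illustrated outside the C code. The sketch below is written in Python to match the test snippets quoted in this review; filter_in_place and add_tables_for_handles are hypothetical names, and the code only models the pattern, not the actual logic in tqRead.c.

```python
def filter_in_place(uids, keep):
    """Hypothetical filter that mutates its argument, like an in-place C filter."""
    uids[:] = [uid for uid in uids if keep(uid)]


def add_tables_for_handles(shared_uids, handles):
    """Give every handle its own filtered UID list without touching shared_uids.

    `handles` maps a handle name to a predicate that decides which UIDs qualify.
    """
    per_handle = {}
    for name, keep in handles.items():
        local = list(shared_uids)  # defensive copy for this handle
        filter_in_place(local, keep)
        per_handle[name] = local
    return per_handle
```

Without the copy, the second handle would only see the UIDs that survived the first handle's filter, which is the cross-handle side effect the change is meant to avoid.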

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
tests/system-test/7-tmq/tmq_alter_tag_frequency.py | Scales up workload, adds multiple consumers and per-thread SQL executors, and adds signal-driven control of alter threads.
source/dnode/vnode/src/tq/tqRead.c | Copies UID list before qFilterTableList() to prevent side effects across handles and then adds filtered UIDs to the reader.

self.stopAlter = True
self.pauseAlter.set()
for tdSqlTmp in self.tdSqlList:
    tdSqlTmp.close()

Copilot AI Mar 9, 2026

stop() no longer closes the global tdSql cursor created in init() (it used to call tdSql.close()). This can leak a cursor/statement handle across system tests; consider closing tdSql in addition to closing the per-thread tdSqlTmp instances.

Suggested change
-     tdSqlTmp.close()
+     tdSqlTmp.close()
+ tdSql.close()

Copilot uses AI. Check for mistakes.
Comment on lines 141 to 144
try:
    consumer.subscribe(["t0"])
except TmqError:
    tdLog.exit(f"subscribe error")

Copilot AI Mar 9, 2026

consume() catches TmqError, but this file only imports Consumer from taos.tmq. Unless TmqError is injected by another wildcard import, this will raise NameError the first time subscribe fails. Import TmqError explicitly (e.g., from taos.tmq import Consumer, TmqError) or catch the concrete exception type you intend.

Comment on lines +101 to +104
# Register signal handlers
signal.signal(signal.SIGUSR1, self.signalHandler) # pause/resume the alter threads
signal.signal(signal.SIGUSR2, self.signalHandler) # stop the alter threads
signal.signal(signal.SIGINT, self.signalHandler) # Ctrl+C stops everything

Copilot AI Mar 9, 2026

The test is registered for Windows, but SIGUSR1/SIGUSR2 are not available on Windows' signal module, so signal.signal(signal.SIGUSR1/2, ...) will raise at runtime. Please guard these registrations (e.g., only on os.name == 'posix' or hasattr(signal, 'SIGUSR1')), and keep SIGINT handling cross-platform.
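
One way to apply the hasattr guard is sketched below. The helper names portable_signal_names and register_handlers are hypothetical; only signal.signal and the standard signal constants come from the Python standard library.

```python
import signal


def portable_signal_names(wanted=("SIGUSR1", "SIGUSR2", "SIGINT")):
    """Return the names from `wanted` that this platform's signal module defines."""
    return [name for name in wanted if hasattr(signal, name)]


def register_handlers(handler, wanted=("SIGUSR1", "SIGUSR2", "SIGINT")):
    """Register `handler` for every available signal.

    signal.signal() may only be called from the main thread of the main
    interpreter, so call this before starting any worker threads.
    """
    registered = []
    for name in portable_signal_names(wanted):
        signal.signal(getattr(signal, name), handler)
        registered.append(name)
    return registered
```

On Windows this skips SIGUSR1 and SIGUSR2 and still installs the SIGINT handler, so Ctrl+C handling stays cross-platform.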

@wangmm0220 wangmm0220 changed the title from "fix(taosd): optimize the speed of updating table lists when altering tags in tmq" to "enh(tmq): optimize logic in altering tag with tmq" Mar 9, 2026
@JinqingKuang

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.

🤖 Generated with Claude Code

- If this code review was useful, please react with 👍. Otherwise, react with 👎.

@guanshengliang guanshengliang merged commit 30e3a46 into main Mar 11, 2026
23 of 26 checks passed
@guanshengliang guanshengliang deleted the fix/6765905149-main branch March 11, 2026 06:08