Skip to content

POC reworking how we handle tags within mysql integration #20390

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

eric-weaver
Copy link
Contributor

What does this PR do?

Putting this up as a draft example of how I feel we can rework the logic on how we handle tags on the mysql integration. The goal here is to allow for more efficient handling of our tag lists, currently split between MysqlCheck.tags and MysqlCheck._non_internal_tags.

TagManager Class Summary

The TagManager class manages tags that can be either key-value pairs (e.g., env:prod) or keyless (e.g., linux). It supports both regular and internal tags and uses caching for efficient access.


Internal Data Structures

  • self._tags: Stores regular tags as a dictionary: {key: [value1, value2, ...]}.
  • self._internal_tags: Stores internal-only tags with the same structure.
  • self._cached_tag_list: Cached list of formatted regular tag strings.
  • self._cached_internal_tag_list: Cached list of formatted internal tag strings.

Core Methods

  • set_tag(key, value, replace=False, internal=False)
    Adds a tag.

    • If key is None or empty, stores it as a keyless tag.
    • If replace=True, replaces existing values.
    • If internal=True, stores it in the internal tag dictionary.
    • On tag write we wipe the cached representation
  • set_tags_from_list(tag_list, replace=False)
    Adds multiple tags from a list of strings, internally calling set_tag on each.
    Strings can be in "key:value" or "value" (keyless) format.

  • delete_tag(key, value=None, internal=False)
    Deletes a tag or a specific value.

    • If value is None, removes all values for the key.
    • Keyless tags use None or "" as the key.
  • get_tags(include_internal=True)
    Returns a sorted list of all tag strings.

    • Key-value tags are formatted as "key:value".
    • Keyless tags are included as raw values.
    • Includes internal tags if include_internal=True.

Data Relationships

  • Tags are stored separately as regular (_tags) and internal (_internal_tags).
  • Keyless tags are stored using the TagType.KEYLESS enum as the key.
  • Caching is used to avoid regenerating the tag string list unnecessarily.

Motivation

Currently it's quite difficult to work with our tags. We keep track of two lists of tags MysqlCheck.tags and MysqlCheck._non_internal_tags. The latter is only used to avoid adding dd.internal* tags to database_instance events. Since we only store tags in a list today we have a mix of complex logic to search the list and delete a tag when updating mutable values between check runs as well as bugs that often result in tags not updating or removing properly. This lays a foundation to better handle these growing use case patterns. Storing tags in a hashmap structure allows for easily accessing single and multi value tags. The larger goal of this would be to open a new PR adding TagManager into our base utility class and have it used across all of our dbms integrations to better encapsulate tags and their life cycle

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Add the qa/skip-qa label if the PR doesn't need to be tested during QA.
  • If you need to backport this PR to another branch, you can add the backport/<branch-name> label to the PR and it will automatically open a backport PR once this one is merged

@@ -803,6 +803,7 @@ def test_database_instance_metadata(aggregator, dd_run_check, instance_complex,
expected_tags = tags.METRIC_TAGS + [
"database_hostname:{}".format(expected_database_hostname),
"database_instance:{}".format(expected_database_instance),
"dbms_flavor:{}".format(MYSQL_FLAVOR.lower()),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an actual bug in our current logic. We set dbms_flavor tag on all of our metrics/events except for database_instance event. So then we also miss it on any cloud provider dynamic metric linking as well. This would fix that

# sort tags to ensure consistent ordering
tags.sort()
# Get sorted tags from tag manager and make a copy to avoid mutation
tags = sorted(self.tag_manager.get_tags())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should get_tags just always return a sorted list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! It actually already is here. Good catch. Had left this initially as sorted() returns a new copy. But will circle back and verify what sort of copy we need to perform to avoid mutation. Perhaps in the get_tags returned value in general

@@ -384,7 +375,7 @@ def check(self, _):
self.check_userstat_enabled(db)

# Metric collection
tags = copy.deepcopy(self.tags)
tags = copy.deepcopy(self.tag_manager.get_tags())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these need to be copied still? I expect we're doing it to avoid mutations but if tag manager always returns a copy then we don't have to copy here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully agree, good eye. This is on my list to iron out after the approach is validated as a direction we want to go

Copy link

codecov bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 92.89617% with 13 lines in your changes missing coverage. Please review.

Project coverage is 90.13%. Comparing base (97108bc) to head (2cfb2cd).
Report is 51 commits behind head on master.

Additional details and impacted files
Flag Coverage Δ
activemq ?
cassandra ?
confluent_platform ?
hive ?
hivemq ?
hudi ?
ignite ?
jboss_wildfly ?
kafka ?
mysql 89.41% <92.89%> (+0.25%) ⬆️
presto ?
solr ?
tomcat ?
weblogic ?

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@eric-weaver eric-weaver deleted the eric.weaver/DBMON-5356 branch June 3, 2025 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants