feat(webui): Add dataset name validation and timestamp key docs link in compression form (resolves #1703). #1707

Merged
davemarco merged 9 commits into y-scope:main from davemarco:dataset_validation
Dec 1, 2025

Conversation

Contributor

@davemarco davemarco commented Dec 1, 2025

Description

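Implements the suggestions in #1703: adds validation of the dataset name in the compression form so that it matches the package's rules, and adds a link to the docs on escaped characters for the timestamp key.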

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Validation works as expected. The link opens in a new browser tab.

Summary by CodeRabbit

  • New Features

    • Centralized SQL table suffixes and a default table prefix exposed for consistent behavior across the UI.
    • Added and exported dataset-name schema, pattern and max-length computation to align client validation with backend rules.
  • Bug Fixes / UX

    • Added inline validation for dataset names with enforced character and length rules to prevent invalid submissions.
    • Improved TIMESTAMP KEY helper text with a direct link to JSON search syntax documentation.

Contributor

coderabbitai bot commented Dec 1, 2025

Walkthrough

Removed a local SqlTableSuffix and centralized it in common config; added CLP_DEFAULT_TABLE_PREFIX, dataset-name schema/constants and client-side validator; integrated the validator into the compression form and updated helper text and import sites.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Client-local config removed / centralised: components/webui/client/src/config/sql-table-suffix.ts, components/webui/common/src/config.ts | Deleted the client-local SqlTableSuffix enum; added SqlTableSuffix and CLP_DEFAULT_TABLE_PREFIX = "clp_" to components/webui/common/src/config.ts and exported them. |
| Validation & schema: components/webui/client/src/pages/IngestPage/Compress/validation.ts, components/webui/common/src/schemas/compression.ts | Added `validateDatasetName(datasetName: string): string |
| Form UI integration: components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx | Integrated validateDatasetName as a custom antd Form.Item validator for Dataset; imported Typography and replaced the TIMESTAMP KEY helper text with a JSX fragment including an inline link to JSON search syntax docs. |
| SQL import updates: components/webui/client/src/pages/IngestPage/Details/sql.ts, components/webui/client/src/pages/IngestPage/SpaceSavings/sql.ts, components/webui/client/src/pages/SearchPage/SearchControls/TimeRangeInput/sql.ts, components/webui/client/src/pages/SearchPage/SearchState/Presto/useTimestampKeyInit/sql.ts | Replaced relative/local imports of SqlTableSuffix with imports from @webui/common/config; usages remain unchanged. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Verify SqlTableSuffix members and CLP_DEFAULT_TABLE_PREFIX match database/table naming expectations.
  • Confirm DATASET_NAME_MAX_LEN calculation correctly accounts for prefix, separator and the longest suffix.
  • Review DatasetNameSchema regex and validateDatasetName() for allowed characters, empty handling and clear messages.
  • Check antd Form.Item validator integration in ClpSFormItems.tsx for UX/error behaviour.
  • Ensure all updated imports compile and no stale relative imports remain.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding dataset name validation and a timestamp key documentation link in the compression form. |

@davemarco davemarco requested a review from junhaoliao December 1, 2025 13:51
@davemarco davemarco changed the title from "fear(webui): Document/validate nuances of dataset name and timestamp key in compression job submission form (resolves #1703)." to "feat(webui): Document/validate nuances of dataset name and timestamp key in compression job submission form (resolves #1703)." on Dec 1, 2025
@davemarco davemarco marked this pull request as ready for review December 1, 2025 15:09
@davemarco davemarco requested a review from a team as a code owner December 1, 2025 15:09
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6d7a58 and 4cea1e7.

📒 Files selected for processing (3)
  • components/webui/client/src/config/sql-table-suffix.ts (1 hunks)
  • components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx (2 hunks)
  • components/webui/client/src/pages/IngestPage/Compress/validation.ts (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/webui/client/src/pages/IngestPage/Compress/validation.ts
  • components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx
  • components/webui/client/src/config/sql-table-suffix.ts
🧠 Learnings (1)
📓 Common learnings
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:188-212
Timestamp: 2025-07-03T12:59:46.638Z
Learning: In CLP schedulers (compression and query), runtime dataset validation is required; the helper validate_and_cache_dataset keeps a local cache and only queries the DB on cache misses because dataset additions are rare.
🔇 Additional comments (2)
components/webui/client/src/config/sql-table-suffix.ts (1)

1-12: Enum extension for metadata table suffixes looks consistent

The additional suffixes (ARCHIVE_TAGS, DATASETS, FILES, TAGS) and the doc comment tying this to clp_metadata_db_utils.py keep the WebUI in sync with the backend naming. This works well with the later use of Object.values(SqlTableSuffix) in the validation utilities.

components/webui/client/src/pages/IngestPage/Compress/validation.ts (1)

43-59: Align validateDatasetName with code style guidelines; review type safety needs

Two points here:

  1. Coding style guideline compliance
    The file matches the TypeScript pattern requiring false == <expression> instead of !<expression>. Current code uses !datasetName and !(/^\w+$/).test(datasetName). Replace these with the preferred pattern:

```diff
-    if (!datasetName) {
+    if (false == datasetName || 0 === datasetName.length) {
         return null;
     }

-    if (!(/^\w+$/).test(datasetName)) {
+    const isAlphanumeric = (/^\w+$/).test(datasetName);
+    if (false == isAlphanumeric) {
         return "Dataset name can only contain alphanumeric characters and underscores.";
     }
```

  2. Type signature review
    Verify whether callers pass potentially undefined/null values. If the form validation passes unknown types, consider accepting string | null | undefined to make the function safer and more self-documenting. Otherwise, the current string type is appropriate.
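
To make the two review points concrete, here is a minimal sketch of a validator that uses the `false == <expression>` style and accepts `string | null | undefined`. It is an illustration only, not the PR's actual `validation.ts`: the `MaybeString` alias and the constant name are hypothetical, and the length check is left out.

```typescript
// Illustrative sketch only; not the code merged in this PR.
// `MaybeString` and `INVALID_CHARS_MESSAGE` are hypothetical names.
type MaybeString = string | null | undefined;

const INVALID_CHARS_MESSAGE =
    "Dataset name can only contain alphanumeric characters and underscores.";

/**
 * Returns an error message for an invalid dataset name, or null when the name
 * is acceptable. A missing or empty name is treated as "not provided".
 */
const validateDatasetName = (datasetName: MaybeString): string | null => {
    if ("string" !== typeof datasetName || 0 === datasetName.length) {
        return null;
    }

    // `\w` matches ASCII letters, digits and the underscore.
    const hasOnlyWordChars = (/^\w+$/).test(datasetName);
    if (false == hasOnlyWordChars) {
        return INVALID_CHARS_MESSAGE;
    }

    return null;
};

console.log(validateDatasetName("my_dataset_1")); // null
console.log(validateDatasetName("bad-name"));     // the error message
```

Widening the parameter type moves the null/undefined handling into the function, so callers that receive `unknown` form values do not need an unchecked cast.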

Member

@junhaoliao junhaoliao Dec 1, 2025

instead of doing this only in the client, can we update the Typebox schema directly so validations can be performed in the backend as well? e.g.,

/**
 * Matching the `MYSQL_TABLE_NAME_MAX_LEN` in `clp_py_utils.clp_metadata_db_utils`.
 */
const MYSQL_TABLE_NAME_MAX_LEN = 64;

/**
 * Maximum length among all table suffixes.
 */
const TABLE_SUFFIX_MAX_LEN = Math.max(
    ...Object.values(SqlTableSuffix).map((suffix) => suffix.length)
);

/**
 * Matching the default `clp_table_prefix` in `clp_py_utils.clp_config`.
 */
const CLP_DEFAULT_TABLE_PREFIX = "clp_";

/**
 * Dataset name validation constants matching `clp_package_utils.general.validate_dataset_name`.
 * - Pattern: only alphanumeric characters and underscores.
 * - Max length: computed using the default table prefix.
 */
const DATASET_NAME_PATTERN = "^\\w+$";
const DATASET_NAME_SEPARATOR_LEN = 1;
const DATASET_NAME_MAX_LEN =
    MYSQL_TABLE_NAME_MAX_LEN -
    CLP_DEFAULT_TABLE_PREFIX.length -
    DATASET_NAME_SEPARATOR_LEN -
    TABLE_SUFFIX_MAX_LEN;

/**
 * Schema for request to create a new compression job.
 */
const CompressionJobCreationSchema = Type.Object({
    paths: Type.Array(Type.String()),
    dataset: Type.Optional(Type.String({
        pattern: DATASET_NAME_PATTERN,
        maxLength: DATASET_NAME_MAX_LEN,
    })),
    timestampKey: Type.Optional(Type.String()),
});
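
As a worked example of the max-length arithmetic in the snippet above, the sketch below plugs in concrete numbers. The suffix strings are assumptions made for illustration (the real members of `SqlTableSuffix` live in the shared config and may differ), as is the layout of a table name as prefix + dataset name + one separator character + suffix.

```typescript
// Assumed suffix values, for illustration only; the real SqlTableSuffix enum may differ.
const ASSUMED_SUFFIXES: readonly string[] = [
    "archives",
    "archive_tags",
    "column_metadata",
    "datasets",
    "files",
    "tags",
];

const MYSQL_TABLE_NAME_MAX_LEN = 64;
const CLP_DEFAULT_TABLE_PREFIX = "clp_";
const DATASET_NAME_SEPARATOR_LEN = 1;

// Longest assumed suffix is "column_metadata" (15 characters).
const TABLE_SUFFIX_MAX_LEN = Math.max(
    ...ASSUMED_SUFFIXES.map((suffix) => suffix.length)
);

// 64 - 4 ("clp_") - 1 (separator) - 15 (longest assumed suffix) = 44
const DATASET_NAME_MAX_LEN =
    MYSQL_TABLE_NAME_MAX_LEN -
    CLP_DEFAULT_TABLE_PREFIX.length -
    DATASET_NAME_SEPARATOR_LEN -
    TABLE_SUFFIX_MAX_LEN;

console.log(TABLE_SUFFIX_MAX_LEN); // 15
console.log(DATASET_NAME_MAX_LEN); // 44
```

Under these assumptions a dataset name of up to 44 characters keeps every derived table name within MySQL's 64-character limit.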

Contributor Author

@davemarco davemarco Dec 1, 2025

i thought of this, but the table prefix (CLP_DEFAULT_TABLE_PREFIX here) is actually configurable, which could complicate the implementation.

Contributor Author

maybe just the pattern and not max length?

Contributor Author

okay i made the change

davemarco and others added 2 commits December 1, 2025 14:35
…Items.tsx

Co-authored-by: Junhao Liao <junhao@junhao.ca>
…Items.tsx

Co-authored-by: Junhao Liao <junhao@junhao.ca>
@davemarco davemarco requested a review from junhaoliao December 1, 2025 19:38
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4cea1e7 and e2ed73b.

📒 Files selected for processing (1)
  • components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx
🧠 Learnings (2)
📓 Common learnings
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:188-212
Timestamp: 2025-07-03T12:59:46.638Z
Learning: In CLP schedulers (compression and query), runtime dataset validation is required; the helper validate_and_cache_dataset keeps a local cache and only queries the DB on cache misses because dataset additions are rare.
📚 Learning: 2024-11-21T15:51:33.203Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.

Applied to files:

  • components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (2)
components/webui/client/src/pages/IngestPage/Compress/ClpSFormItems.tsx (2)

1-9: Imports are correct and necessary.

The added Typography import supports the enhanced helper text with an external documentation link, and validateDatasetName enables inline form validation.


42-53: Validator logic is sound.

The validation correctly delegates to validateDatasetName and surfaces errors to the form. The existing review comment about simplifying the async pattern and improving type safety remains valid.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2ed73b and e692da5.

📒 Files selected for processing (8)
  • components/webui/client/src/config/sql-table-suffix.ts (0 hunks)
  • components/webui/client/src/pages/IngestPage/Compress/validation.ts (1 hunks)
  • components/webui/client/src/pages/IngestPage/Details/sql.ts (1 hunks)
  • components/webui/client/src/pages/IngestPage/SpaceSavings/sql.ts (1 hunks)
  • components/webui/client/src/pages/SearchPage/SearchControls/TimeRangeInput/sql.ts (1 hunks)
  • components/webui/client/src/pages/SearchPage/SearchState/Presto/useTimestampKeyInit/sql.ts (1 hunks)
  • components/webui/common/src/config.ts (1 hunks)
  • components/webui/common/src/schemas/compression.ts (2 hunks)
💤 Files with no reviewable changes (1)
  • components/webui/client/src/config/sql-table-suffix.ts
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/webui/client/src/pages/IngestPage/Compress/validation.ts
  • components/webui/client/src/pages/IngestPage/Details/sql.ts
  • components/webui/common/src/config.ts
  • components/webui/client/src/pages/SearchPage/SearchControls/TimeRangeInput/sql.ts
  • components/webui/common/src/schemas/compression.ts
  • components/webui/client/src/pages/IngestPage/SpaceSavings/sql.ts
  • components/webui/client/src/pages/SearchPage/SearchState/Presto/useTimestampKeyInit/sql.ts
🧠 Learnings (3)
📓 Common learnings
Learnt from: haiqi96
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-07-29T14:04:13.769Z
Learning: User haiqi96 requested creating a GitHub issue to document a bug fix from PR #1136, which addressed MySQL compatibility issues with invalid SQL CAST operations in the WebUI component.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:188-212
Timestamp: 2025-07-03T12:59:46.638Z
Learning: In CLP schedulers (compression and query), runtime dataset validation is required; the helper validate_and_cache_dataset keeps a local cache and only queries the DB on cache misses because dataset additions are rare.
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1036
File: components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py:204-211
Timestamp: 2025-07-03T12:58:18.407Z
Learning: In the CLP codebase, the validate_and_cache_dataset function in components/clp-py-utils/clp_py_utils/clp_metadata_db_utils.py uses in-place updates of the existing_datasets set parameter rather than returning a new set, as preferred by the development team.
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:90-96
Timestamp: 2024-11-15T16:22:33.635Z
Learning: In the CLP codebase, `table_prefix` is a trusted configuration value and can be safely embedded into SQL queries.
📚 Learning: 2024-11-21T15:51:33.203Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 596
File: components/log-viewer-webui/client/src/api/query.js:16-23
Timestamp: 2024-11-21T15:51:33.203Z
Learning: In `components/log-viewer-webui/client/src/api/query.js`, the `ExtractJsonResp` type definition is accurate as-is and does not require modification. When suggesting changes to type definitions, ensure they align with the server-side definitions, referencing the source code if necessary.

Applied to files:

  • components/webui/client/src/pages/IngestPage/Compress/validation.ts
📚 Learning: 2024-11-15T16:22:33.635Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 594
File: components/clp-package-utils/clp_package_utils/scripts/native/del_archives.py:90-96
Timestamp: 2024-11-15T16:22:33.635Z
Learning: In the CLP codebase, `table_prefix` is a trusted configuration value and can be safely embedded into SQL queries.

Applied to files:

  • components/webui/client/src/pages/IngestPage/Compress/validation.ts
  • components/webui/client/src/pages/IngestPage/Details/sql.ts
  • components/webui/common/src/config.ts
  • components/webui/client/src/pages/IngestPage/SpaceSavings/sql.ts
  • components/webui/client/src/pages/SearchPage/SearchState/Presto/useTimestampKeyInit/sql.ts
🧬 Code graph analysis (1)
components/webui/client/src/pages/IngestPage/Compress/validation.ts (1)
components/webui/common/src/schemas/compression.ts (2)
  • DatasetNameSchema (70-70)
  • DATASET_NAME_MAX_LEN (68-68)
🔇 Additional comments (6)
components/webui/common/src/config.ts (1)

31-47: SqlTableSuffix and table-prefix centralisation looks sound

Enum values and the default prefix are clearly documented to mirror the Python config/metadata utilities, and exporting them here gives a single source of truth for downstream SQL builders. No issues from a correctness or maintainability perspective.

Also applies to: 50-56

components/webui/client/src/pages/IngestPage/Details/sql.ts (1)

1-1: Using shared SqlTableSuffix from common config is appropriate

Switching to the central @webui/common/config export keeps table-suffix usage consistent across the UI without changing the query logic.

components/webui/client/src/pages/SearchPage/SearchState/Presto/useTimestampKeyInit/sql.ts (1)

1-1: Central SqlTableSuffix import keeps Presto timestamp-key SQL aligned

Refactoring to import SqlTableSuffix from the common config removes the local dependency while preserving existing query behaviour.

components/webui/client/src/pages/SearchPage/SearchControls/TimeRangeInput/sql.ts (1)

1-4: Shared config imports for CLP storage engine and suffixes look correct

Pulling CLP_STORAGE_ENGINES and SqlTableSuffix from @webui/common/config makes this time-range query builder consistent with other modules and keeps engine/suffix definitions centralised.

components/webui/client/src/pages/IngestPage/SpaceSavings/sql.ts (1)

1-1: SqlTableSuffix import refactor maintains existing space-savings behaviour

Using SqlTableSuffix from the shared config keeps the archive-table naming consistent with other parts of the UI while leaving the query logic unchanged.

components/webui/common/src/schemas/compression.ts (1)

6-43: Verify dataset-name schema alignment with backend validators for non-default table prefixes

Defining DATASET_NAME_PATTERN, DATASET_NAME_MAX_LEN, and DatasetNameSchema here provides a clear single source for dataset-name rules. Using SqlTableSuffix and CLP_DEFAULT_TABLE_PREFIX to derive max length from MySQL's table-name limit is sensible.

However, this computation uses the hardcoded default table prefix, while clp_table_prefix is configurable at runtime in Python. Backend validators (clp_package_utils.general.validate_dataset_name and schedulers' validate_and_cache_dataset) may validate based on the effective runtime prefix. If non-default prefixes are deployed, a dataset name could pass this schema but fail backend validation or cause table-name issues.

Confirm that either:

  • Deployed environments always use the default prefix, or
  • Backend validators use the same default-prefix-based calculation, or
  • The effective prefix will be configurable in this module

Backend schedulers already perform runtime validation (learnings confirm), so keeping rules aligned will prevent UI/scheduler validation mismatches.
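
The mismatch described above can be sketched by treating the limit as a function of the table prefix. This is a hypothetical illustration, not code from the PR: `computeDatasetNameMaxLen`, `isWithinLimit`, the example prefix `my_company_clp_` and the assumed longest suffix `column_metadata` are all made up for the example.

```typescript
const MYSQL_TABLE_NAME_MAX_LEN = 64;
const SEPARATOR_LEN = 1;

// Assumption for illustration: the longest table suffix is "column_metadata".
const ASSUMED_LONGEST_SUFFIX = "column_metadata";

/** Hypothetical helper: the longest dataset name that fits for a given table prefix. */
const computeDatasetNameMaxLen = (tablePrefix: string): number => (
    MYSQL_TABLE_NAME_MAX_LEN -
    tablePrefix.length -
    SEPARATOR_LEN -
    ASSUMED_LONGEST_SUFFIX.length
);

/** Hypothetical helper: whether a dataset name fits under the given prefix. */
const isWithinLimit = (datasetName: string, tablePrefix: string): boolean => (
    datasetName.length <= computeDatasetNameMaxLen(tablePrefix)
);

// A 40-character name fits with the default prefix but not with a longer one.
const longDatasetName = "a".repeat(40);
console.log(isWithinLimit(longDatasetName, "clp_"));            // true  (40 <= 44)
console.log(isWithinLimit(longDatasetName, "my_company_clp_")); // false (40 > 33)
```

A client-side schema built from the default prefix would accept the 40-character name in both cases, so a deployment with a longer prefix would still depend on the backend check.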

Comment on lines +44 to +51
validator: async (_, value: unknown) => {
    const error = validateDatasetName(value as string);
    if (error) {
        return Promise.reject(new Error(error));
    }

    return Promise.resolve();
},
Member

@junhaoliao junhaoliao Dec 1, 2025

as discussed in the other PR, i think the explicit usages of Promise here are "redundantly correct" - they work, but the async keyword already handles the Promise wrapping

maybe it's better to remove?

Contributor Author

okay i changed
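
For reference, a minimal sketch of what the simplified validator might look like once the explicit Promise calls are dropped. It is not the merged code: `validateDatasetName` here is a hypothetical stand-in, and the validator is written as a standalone function rather than inside an antd rule object so that the snippet runs on its own.

```typescript
// Hypothetical stand-in for the PR's validateDatasetName; pattern check only.
const validateDatasetName = (datasetName: string): string | null => {
    if (0 === datasetName.length || (/^\w+$/).test(datasetName)) {
        return null;
    }

    return "Dataset name can only contain alphanumeric characters and underscores.";
};

// An async function always returns a Promise: throwing rejects it, and
// returning normally resolves it, so no explicit Promise.reject/resolve is needed.
const datasetValidator = async (_rule: unknown, value: unknown): Promise<void> => {
    const name = "string" === typeof value ? value : "";
    const error = validateDatasetName(name);
    if (null !== error) {
        throw new Error(error);
    }
};

datasetValidator(null, "bad name").catch((e: unknown) => {
    console.log(e instanceof Error ? e.message : e);
});
```

The behaviour seen by the form is the same as before: a rejected promise carrying the error message for invalid names, and a resolved promise otherwise.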

@davemarco davemarco requested a review from junhaoliao December 1, 2025 23:15
Member

@junhaoliao junhaoliao left a comment

for the title, how about

feat(webui): Add dataset name validation and timestamp key docs link in compression form (resolves #1703).

@davemarco davemarco changed the title from "feat(webui): Document/validate nuances of dataset name and timestamp key in compression job submission form (resolves #1703)." to "feat(webui): Add dataset name validation and timestamp key docs link in compression form (resolves #1703)." on Dec 1, 2025
@davemarco davemarco merged commit a8a11dc into y-scope:main Dec 1, 2025
21 checks passed

2 participants