feat(clp-package): Add clp-config.yaml JSON schema generation and package the config schema.#1654
Conversation
WalkthroughChanges introduce automated JSON Schema generation for CLP configuration files. A new Python script generates schemas from the Pydantic CLP config model, integrated into the build system as a new task. The generated schema is referenced in the configuration file via a language server directive, and packaged as an artifact. Changes
Sequence Diagram(s)sequenceDiagram
participant Build as Build Task
participant Generate as generate-config-schemas
participant Script as generate-config-schemas.py
participant Schema as JSON Schema
participant Package as package Task
Build->>Generate: execute task
Generate->>Generate: create OUTPUT_DIR
Generate->>Script: invoke with OUTPUT_DIR
Script->>Script: load ClpConfig model
Script->>Script: generate schema via GenerateJsonSchemaWithSchemaKey
Script->>Schema: inject $schema field
Script->>Schema: write clp-config.schema.json
Build->>Package: dependency ready
Package->>Package: copy schema to usr/share/config-schemas/
Package->>Package: include in artifact
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
clp-config.yaml.clp-config.yaml JSON schema generation and package the config schema.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/package-template/src/etc/clp-config.yaml(1 hunks)taskfile.yaml(5 hunks)tools/scripts/generate-config-schemas.py(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Learnt from: gibber9809
Repo: y-scope/clp PR: 955
File: components/core/src/clp_s/search/sql/CMakeLists.txt:8-26
Timestamp: 2025-06-02T18:22:24.060Z
Learning: In the clp project, ANTLR code generation at build time is being removed by another PR. When reviewing CMake files, be aware that some temporary suboptimal configurations may exist to reduce merge conflicts between concurrent PRs, especially around ANTLR_TARGET calls.
📚 Learning: 2025-05-28T18:33:30.155Z
Learnt from: anlowee
Repo: y-scope/clp PR: 925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
Applied to files:
taskfile.yaml
📚 Learning: 2025-09-25T19:26:32.436Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.
Applied to files:
taskfile.yaml
📚 Learning: 2025-07-25T21:29:48.947Z
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1126
File: .gitignore:5-5
Timestamp: 2025-07-25T21:29:48.947Z
Learning: In the CLP project, the .clang-format file is maintained in the yscope-dev-utils submodule and copied over to the main CLP repository, so it should be ignored in .gitignore to prevent accidental commits of the copied file and maintain the single source of truth in the submodule.
Applied to files:
components/package-template/src/etc/clp-config.yaml
📚 Learning: 2025-11-03T16:17:40.223Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Applied to files:
components/package-template/src/etc/clp-config.yaml
🧬 Code graph analysis (1)
tools/scripts/generate-config-schemas.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
ClpConfig(623-861)
🪛 Ruff (0.14.5)
tools/scripts/generate-config-schemas.py
26-26: Dynamically typed expressions (typing.Any) are disallowed in *args
(ANN401)
26-26: Dynamically typed expressions (typing.Any) are disallowed in **kwargs
(ANN401)
26-26: Dynamically typed expressions (typing.Any) are disallowed in generate
(ANN401)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: rust-checks
🔇 Additional comments (6)
tools/scripts/generate-config-schemas.py (3)
1-10: LGTM! Inline script metadata is well-structured.The uv inline script metadata correctly specifies dependencies and uses an editable install for the local
clp_py_utilspackage. This approach ensures the script can be executed directly with proper dependency management.
23-34: LGTM! Clean extension of Pydantic's schema generator.The custom generator correctly injects the
$schemafield by accessingself.schema_dialectfrom the parent class. The static analysis warnings aboutAnytypes are false positives—they're necessary when overriding parent methods with dynamic signatures.
37-47: Consider adding basic error handling.The script logic is correct but lacks error handling for potential failures during schema generation or file writing. While the script will naturally exit with an error if something goes wrong, explicit error handling with informative messages would improve debuggability.
Consider wrapping the schema generation and file writing in a try-except block:
output_dir = args.output_dir output_dir.mkdir(exist_ok=True) - schema = ClpConfig.model_json_schema(schema_generator=GenerateJsonSchemaWithSchemaKey) - with Path.open(output_dir / "clp-config.schema.json", "w") as f: - json.dump(schema, f, indent=2) + try: + schema = ClpConfig.model_json_schema(schema_generator=GenerateJsonSchemaWithSchemaKey) + with Path.open(output_dir / "clp-config.schema.json", "w") as f: + json.dump(schema, f, indent=2) + except Exception as e: + print(f"Error generating schema: {e}", file=sys.stderr) + sys.exit(1)⛔ Skipped due to learnings
Learnt from: haiqi96 Repo: y-scope/clp PR: 594 File: components/clp-package-utils/clp_package_utils/scripts/del_archives.py:56-65 Timestamp: 2024-11-18T16:49:20.248Z Learning: When reviewing wrapper scripts in `components/clp-package-utils/clp_package_utils/scripts/`, note that it's preferred to keep error handling simple without adding extra complexity.Learnt from: junhaoliao Repo: y-scope/clp PR: 1152 File: components/clp-package-utils/clp_package_utils/general.py:0-0 Timestamp: 2025-08-19T14:41:28.901Z Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.Learnt from: quinntaylormitchell Repo: y-scope/clp PR: 1170 File: tools/deployment/presto-clp/scripts/generate-metadata-filter-file.py:0-0 Timestamp: 2025-09-04T12:26:54.788Z Learning: The `set-up-config.sh` script in tools/deployment/presto-clp/scripts/ ensures that the output directory for metadata-filter.json already exists before calling generate-metadata-filter-file.py, so directory creation in the Python script is not strictly necessary.components/package-template/src/etc/clp-config.yaml (1)
1-1: LGTM! Schema directive is correctly configured.The yaml-language-server directive properly references the generated schema file using the correct relative path. This will enable IDE/LSP features like autocompletion and validation for developers editing the configuration file.
taskfile.yaml (2)
25-25: LGTM! Build directory variable follows project conventions.The
G_CONFIG_SCHEMA_BUILD_DIRvariable is properly defined and follows the project's naming conventions for build directory paths.
113-164: LGTM! Package task correctly integrates schema generation.The modifications properly integrate the config schema into the packaging workflow:
- The generated schema file is listed as a source dependency
- The
generate-config-schemastask is a dependency, ensuring schemas are generated before packaging- The rsync command correctly copies the schema to the package output directory
The trailing slash on the rsync source path correctly copies the directory contents rather than the directory itself.
| generate-config-schemas: | ||
| vars: | ||
| OUTPUT_DIR: "{{.G_CONFIG_SCHEMA_BUILD_DIR}}" | ||
| cmds: | ||
| - |- | ||
| rm -rf "{{.OUTPUT_DIR}}" | ||
| mkdir "{{.OUTPUT_DIR}}" | ||
| ./tools/scripts/generate-config-schemas.py "{{.OUTPUT_DIR}}" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider adding sources/generates for task caching.
The task implementation is correct, but it currently runs unconditionally. Adding sources and generates fields would allow Task to skip schema generation when inputs haven't changed, improving build performance.
Consider adding sources and generates:
generate-config-schemas:
vars:
OUTPUT_DIR: "{{.G_CONFIG_SCHEMA_BUILD_DIR}}"
+ sources:
+ - "{{.TASKFILE}}"
+ - "components/clp-py-utils/clp_py_utils/clp_config.py"
+ - "tools/scripts/generate-config-schemas.py"
+ generates:
+ - "{{.OUTPUT_DIR}}/clp-config.schema.json"
cmds:
- |-
rm -rf "{{.OUTPUT_DIR}}"
mkdir "{{.OUTPUT_DIR}}"
./tools/scripts/generate-config-schemas.py "{{.OUTPUT_DIR}}"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| generate-config-schemas: | |
| vars: | |
| OUTPUT_DIR: "{{.G_CONFIG_SCHEMA_BUILD_DIR}}" | |
| cmds: | |
| - |- | |
| rm -rf "{{.OUTPUT_DIR}}" | |
| mkdir "{{.OUTPUT_DIR}}" | |
| ./tools/scripts/generate-config-schemas.py "{{.OUTPUT_DIR}}" | |
| generate-config-schemas: | |
| vars: | |
| OUTPUT_DIR: "{{.G_CONFIG_SCHEMA_BUILD_DIR}}" | |
| sources: | |
| - "{{.TASKFILE}}" | |
| - "components/clp-py-utils/clp_py_utils/clp_config.py" | |
| - "tools/scripts/generate-config-schemas.py" | |
| generates: | |
| - "{{.OUTPUT_DIR}}/clp-config.schema.json" | |
| cmds: | |
| - |- | |
| rm -rf "{{.OUTPUT_DIR}}" | |
| mkdir "{{.OUTPUT_DIR}}" | |
| ./tools/scripts/generate-config-schemas.py "{{.OUTPUT_DIR}}" |
🤖 Prompt for AI Agents
In taskfile.yaml around lines 231 to 238, the generate-config-schemas task
currently always runs; add Task caching by declaring a sources list (all inputs
that affect schema generation, e.g., the generator script path
tools/scripts/generate-config-schemas.py, any config schema source directories
or templates, and repo files that influence output) and a generates list
pointing to the produced artifacts (the OUTPUT_DIR and a representative file
pattern under it, e.g., {{.G_CONFIG_SCHEMA_BUILD_DIR}}/** or a specific
generated schema file) so the task runtime can skip execution when sources are
unchanged and the generates are present.
| cmds: | ||
| - |- | ||
| rm -rf "{{.OUTPUT_DIR}}" | ||
| mkdir "{{.OUTPUT_DIR}}" |
There was a problem hiding this comment.
we should use mkdir -p to create any missing parent paths because the build directory may not be available when the task is run
otherwise, mkdir may complain that the path does not exist
@coderabbitai create an issue to track
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the mkdir -p usage in generate-config-schemas task
gh issue create \
--title "Use 'mkdir -p' in generate-config-schemas task to create parent directories" \
--body "## Description
The \`generate-config-schemas\` task in \`taskfile.yaml\` currently uses \`mkdir\` without the \`-p\` flag, which may fail if the parent \`build\` directory does not exist when the task is run.
Other tasks in the taskfile consistently use \`mkdir -p\` to ensure parent directories are created (e.g., \`init\`, \`package\`, \`webui\`, \`python-libs\`, \`download-and-extract-tar\`).
## Context
Raised by @junhaoliao in PR #1654:
https://github.com/y-scope/clp/pull/1654#discussion_r2562824787
## Location
File: \`taskfile.yaml\`
Task: \`generate-config-schemas\`" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1669 to track this improvement: #1669
You're absolutely right that mkdir -p should be used to ensure parent directories are created if they don't exist. This aligns with the pattern used consistently throughout the rest of the taskfile in tasks like init, package, webui, python-libs, and download-and-extract-tar.
🧠 Learnings used
Learnt from: anlowee
Repo: y-scope/clp PR: 925
File: taskfiles/deps/main.yaml:97-106
Timestamp: 2025-05-28T18:33:30.155Z
Learning: In the taskfiles dependency system (taskfiles/deps/main.yaml), echo commands are used to generate .cmake settings files that are consumed by the main CMake build process. These files set variables like ANTLR_RUNTIME_HEADER to point to dependency locations for use during compilation.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1335
File: taskfiles/docker-images.yaml:15-15
Timestamp: 2025-09-25T19:26:32.436Z
Learning: In the CLP project's Taskfile, the `:package` task creates the G_PACKAGE_BUILD_DIR directory structure, so any task that depends on `:package` (like `docker-images:package`) can safely assume this directory exists without needing additional mkdir commands.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1152
File: components/clp-package-utils/clp_package_utils/general.py:0-0
Timestamp: 2025-08-19T14:41:28.901Z
Learning: In the CLP codebase, prefer explicit failures over automatic directory creation in utility functions like dump_config. The user junhaoliao prefers to let file operations fail when parent directories don't exist, as this helps catch implementation errors during development rather than masking setup issues with automatic directory creation.
Learnt from: hoophalab
Repo: y-scope/clp PR: 1293
File: taskfiles/codegen.yaml:26-33
Timestamp: 2025-09-10T22:13:51.907Z
Learning: ANTLR automatically creates the output directory specified by the -o flag, so explicit directory creation with mkdir -p is not needed before running ANTLR commands.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).
Description
This change introduces automated generation and packaging of the CLP configuration
JSON schema.
Key updates:
generate-config-schemas.pywhich:pydanticandClpConfigto generate JSON schema.$schemafield via a custom schema generator.yaml-language-server: $schemadirective toclp-config.yamlfor editor / LS support.generate-config-schemasTaskfile task that:tools/scripts/generate-config-schemas.pyto produceclp-config.schema.json.Checklist
breaking change.
Validation performed
build/clp-package/usr/share/config-schemas/.packageandlogs_inputproperties.Summary by CodeRabbit
Release Notes
New Features
Chores
✏️ Tip: You can customize this high-level summary in your review settings.