docs(helm): Add Kubernetes Helm user guides and dev guides; Restructure user documentation. #1827
Walkthrough
Restructures CLP documentation to support two orchestration paths: Docker Compose for single-host use and Kubernetes via Helm for cluster deployments. Adds a full Kubernetes guide, reworks quick-starts into per-flavour tabs, reorganizes the docs index, updates cross-references, and removes several legacy overview pages.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
…update orchestration links
…pose and Kubernetes orchestration details.
# Conflicts: # docs/src/user-docs/index.md
I think that splitting the guides section into Deployment, Input & storage, etc. could be good, but I don't think that we then should merge the content of guides-overview.md with the content of this file, and then delete guides-overview.md. (That would mean making a link card for each of the sections.) It doesn't make sense to have a Guides section whose sole purpose is to direct the reader to other sections.
See my comment on docs/src/user-docs/index.md re. what should be done with this file.
`guides-overview.md` was removed as a result.
How would you feel about standardizing the way we present names of programs/systems that don't start with a capital letter? I feel like it's hard when reading to see a program name like `kind` as plaintext and remember that it's a program name and not a word. Even for un-capitalized program names that aren't real words (e.g., `kubeadm`) it would be nice to monospace it. Programs/systems with capitalized names like Docker and Kubernetes are fine as plaintext imo.
Good catch. Let me try enforcing these as you proposed:
- Use backticks for lowercase tool names: `kind`, `kubeadm`, `kubectl`, `helm` (when we specifically refer to the CLI)
- Keep capitalized names as plaintext: Docker, Kubernetes, Helm (when referring to the project/company rather than the CLI command)
…ing guides, core, and reference sections into a unified overview.
…eployment, core usage, resources, and reference sections.
# Conflicts: # docs/src/user-docs/reference-overview.md
…ompose, Kubernetes, and multi-host setups.
Actionable comments posted: 14
🤖 Fix all issues with AI agents
In @docs/src/dev-docs/design-deployment-orchestration.md:
- Around line 245-258: The "Spider" entries in the "Deployment types" section
are ambiguous; update the table or adjacent note under the "Deployment types"
header to explicitly state that Spider deployments (Spider Base and Spider Full)
are supported only for Docker Compose and are not supported for Helm; modify the
note that currently reads "Spider support is not yet available for Helm." to
something like "Spider deployments are supported for Docker Compose only; Helm
support for Spider is not yet available," and ensure the table rows for "Spider
Base" and "Spider Full" remain but align with this clarified note.
- Around line 265-271: Update the "Troubleshooting" section to add one or two
succinct sentences before the links that describe typical issues and when to use
each guide—mention that [docker-compose-debugging] is for local/service startup,
container logs, networking, and dependency issues in single-host setups, and
that [kubernetes-debugging] is for cluster-level problems like pod crashes,
scheduling, ingress, and Helm releases; reference the "Troubleshooting" heading
and the link tokens [docker-compose-debugging] and [kubernetes-debugging] so the
added context sits directly above those links.
- Around line 192-218: Clarify that today Kubernetes deployments are configured
directly via the Helm chart using values.yaml and Kubernetes Secrets/ConfigMaps
(users should place container-adapted settings and credentials into values.yaml
or create Secrets rather than relying on etc/clp-config.yaml), and update the
note to state that in the future a planned KubernetesController (like
DockerComposeController) will read clp-config.yaml and credentials.yaml and
automatically generate the Helm values/ConfigMaps/Secrets to unify the workflow;
reference KubernetesController, DockerComposeController, clp-config.yaml,
credentials.yaml, and values.yaml so readers know what to use now versus what
will be automated later.
In @docs/src/user-docs/guides-k8s-deployment.md:
- Around line 163-210: The docs currently show weak example credentials
(CLP_DB_PASS, CLP_DB_ROOT_PASS, CLP_QUEUE_PASS, CLP_REDIS_PASS) with only a
small “(change these for production)” note; add a prominent warning block
immediately before the credentials export block that calls out using strong,
unique passwords for production and suggests secure generation/storage options
(e.g., password manager or openssl rand -hex), and update the inline comment
near the credential exports to reinforce the warning so readers won’t copy the
example values.
- Around line 224-276: The YAML example has duplicate clpConfig.archive_output
keys which will cause the second to overwrite the first; merge the settings into
a single clpConfig.archive_output block containing target_archive_size,
compression_level, and retention_period (and remove the extra archive_output
section) so the example uses one archive_output under clpConfig with all three
fields present.
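To illustrate the duplicate-key item above, here is a minimal shell sketch that flags a repeated `archive_output` key. It assumes two-space indentation under `clpConfig`; the file contents and numeric values are placeholders for illustration, not settings taken from the guide.

```shell
# Write a sample values file containing the duplicate key described above.
# The numeric values are placeholders, not recommended settings.
values_file="$(mktemp)"
cat > "$values_file" <<'EOF'
clpConfig:
  archive_output:
    target_archive_size: 1
    compression_level: 1
  archive_output:
    retention_period: 1
EOF

# Count mapping keys named archive_output at two-space indentation.
dup_count="$(grep -c '^  archive_output:' "$values_file")"
if [ "$dup_count" -gt 1 ]; then
    echo "archive_output appears $dup_count times; merge into one block"
fi
rm -f "$values_file"
```

A text match like this is only a rough check; it does not parse YAML, so a differently indented file would need a different pattern.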
In @docs/src/user-docs/index.md:
- Around line 70-75: Update the grid-item-card title text "External database
setup" to a grammatically precise alternative such as "Setting up external
databases" (or "External database configuration") by editing the card content
where the string appears so it matches the style of other card titles and
pluralization.
In @docs/src/user-docs/quick-start/clp-json.md:
- Around line 78-80: Clarify that the hostPath/containerPath mount under the
"Mount for logs input" section is optional only when users supply logs via
object storage (e.g., S3), and update the text to either link to the object
storage guide or add a short note describing when to omit the mount (e.g., when
using S3/GCS inputs, no hostPath is required) and when to keep it (local file
ingestion or local testing); reference the "Mount for logs input" heading and
the hostPath/containerPath YAML keys so the update is applied adjacent to those
lines.
- Around line 117-120: Update the note about allowHostAccessForSbinScripts to
explain its security implications: change wording to advise keeping
allowHostAccessForSbinScripts set to true (the default) to enable sbin scripts,
and add a brief caution that this allows containers to access host
paths/resources and should be reviewed or restricted for production deployments.
Reference the configuration key allowHostAccessForSbinScripts and ensure the
note mentions both the behavior (allows host path access) and the recommended
review for production.
- Around line 54-58: Add a security note above the credential generation block
that warns these export commands (export CLP_DB_PASS, export CLP_DB_ROOT_PASS,
export CLP_QUEUE_PASS, export CLP_REDIS_PASS) will be stored in shell history
and provide brief mitigation steps: advise prefixing commands with a space if
HISTCONTROL=ignorespace, or temporarily disabling history with set +o history
(and re-enabling with set -o history); place the note immediately before the
existing "export CLP_DB_PASS=..." lines so readers see the warning before
generating credentials.
- Around line 146-153: Move the `etc/clp-config.yaml` configuration note so
users see it before any compression commands: place the existing "Configure
etc/clp-config.yaml to connect to the kind-deployed database" block immediately
above the "Compressing JSON logs" compress command (or add a clear workflow
callout labeled e.g. "Before running compression commands, configure
etc/clp-config.yaml" at the start of the "Compressing JSON logs" section), and
ensure the note appears in the Kubernetes tab as well so users following that
flow must configure the file before running compression.
- Around line 353-361: Replace the hardcoded port 30301 in the example curl
command with a reference to the CLP_API_SERVER_PORT variable or add a short note
clarifying the default; specifically update the curl URL
"http://localhost:30301/query/submit" to use
"http://localhost:${CLP_API_SERVER_PORT}/query/submit" (or add a one-line note
stating “The default API server port is 30301; if you changed
CLP_API_SERVER_PORT use that value”) so examples remain consistent with the
CLP_API_SERVER_PORT setting.
- Around line 378-385: The documentation repeats the same database port snippet
for etc/clp-config.yaml (the block showing "database: port: 30306" and
"results_cache: port: 30017") in multiple places; consolidate by keeping the
configuration only at the first mention and replace subsequent duplicates with a
short pointer or note, or add a tip immediately after the first "database: port:
30306" example stating that this setting applies globally (e.g., "This
configuration applies to all subsequent operations; you only need to set it
once") and update the later sections to reference that tip instead of repeating
the YAML.
- Around line 443-449: Update the two inconsistent reference links so they
include the .md extension: change the link target labeled [datasets] from
"../resources-datasets" to "../resources-datasets.md" and change the link target
labeled [object-storage] from "../guides-using-object-storage/index" to
"../guides-using-object-storage/index.md" to match the other reference link
formats in this block.
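As a rough sketch of the shell-history item above, the following generates the four credentials with history recording paused. `gen_secret` is a hypothetical helper, not part of CLP; the variable names are the ones quoted in the review comment, and the history toggle is guarded because `set +o history` is assumed to be available only under bash.

```shell
# Hypothetical helper (not part of CLP): print 32 random hex characters.
# Prefers `openssl rand -hex`, falling back to /dev/urandom if openssl is absent.
gen_secret() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 16
    else
        head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
    fi
}

# `set +o history` is bash-specific, so only toggle it when running under bash.
if [ -n "${BASH_VERSION:-}" ]; then
    set +o history   # pause history recording
fi

export CLP_DB_PASS="$(gen_secret)"
export CLP_DB_ROOT_PASS="$(gen_secret)"
export CLP_QUEUE_PASS="$(gen_secret)"
export CLP_REDIS_PASS="$(gen_secret)"

if [ -n "${BASH_VERSION:-}" ]; then
    set -o history   # resume history recording
fi
```

Because the values come from command substitution, shell history would record only the commands, not the generated secrets; pausing history matters mainly when a literal password is typed on the command line.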
In @docs/src/user-docs/quick-start/clp-text.md:
- Around line 287-304: The API server curl example hard-codes
http://localhost:30301 which varies by deployment; update the docs around the
"Searching via the API server" example to provide deployment-specific guidance:
add separate notes or tabs for Docker Compose (use http://localhost:3001 and
mention CLP_API_SERVER_PORT env var) and Kubernetes (explain to use the NodePort
from clpConfig.api_server.port in Helm), or at minimum add a prominent sentence
telling users to adjust the host:port based on their deployment configuration
(reference the CLP_API_SERVER_PORT and clpConfig.api_server.port symbols).
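One way to express the deployment-specific guidance above is a small helper that picks the API server address per deployment type. This is only a sketch: `api_server_url` is a hypothetical function, and the defaults 3001 (Docker Compose) and 30301 (Kubernetes NodePort) are the values quoted in the review comments, which should be confirmed against the actual configuration.

```shell
# Hypothetical helper: print the API server base URL for a deployment type.
# Usage: api_server_url <docker-compose|kubernetes> [port]
api_server_url() {
    deployment="$1"
    port="${2:-}"
    if [ -z "$port" ]; then
        # No explicit port given, so fall back to the defaults quoted above.
        case "$deployment" in
            docker-compose) port=3001 ;;
            kubernetes)     port=30301 ;;
            *) echo "unknown deployment type: $deployment" >&2; return 1 ;;
        esac
    fi
    echo "http://localhost:${port}"
}

api_server_url kubernetes
```

Passing an explicit second argument covers the case where the port was changed, for example via `CLP_API_SERVER_PORT` or `clpConfig.api_server.port`.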
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
- docs/requirements.txt
- docs/src/dev-docs/design-deployment-orchestration.md
- docs/src/user-docs/core-overview.md
- docs/src/user-docs/guides-docker-compose-deployment.md
- docs/src/user-docs/guides-external-database.md
- docs/src/user-docs/guides-k8s-deployment.md
- docs/src/user-docs/guides-overview.md
- docs/src/user-docs/index.md
- docs/src/user-docs/quick-start/clp-json.md
- docs/src/user-docs/quick-start/clp-text.md
- docs/src/user-docs/quick-start/index.md
- docs/src/user-docs/reference-overview.md
💤 Files with no reviewable changes (3)
- docs/src/user-docs/core-overview.md
- docs/src/user-docs/reference-overview.md
- docs/src/user-docs/guides-overview.md
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-12-19T05:03:42.629Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.
Applied to files:
docs/src/user-docs/guides-k8s-deployment.md
📚 Learning: 2025-06-18T20:39:05.899Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.
Applied to files:
- docs/src/user-docs/index.md
- docs/src/user-docs/quick-start/clp-json.md
- docs/src/user-docs/quick-start/clp-text.md
- docs/src/user-docs/quick-start/index.md
- docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-06-18T20:48:48.990Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.
Applied to files:
- docs/src/user-docs/index.md
- docs/src/user-docs/quick-start/index.md
📚 Learning: 2025-09-25T05:13:13.298Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Applied to files:
- docs/src/user-docs/quick-start/clp-json.md
- docs/src/user-docs/quick-start/clp-text.md
- docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
docs/src/user-docs/quick-start/clp-json.md
📚 Learning: 2025-09-15T22:20:40.750Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.
Applied to files:
docs/src/user-docs/quick-start/clp-json.md
📚 Learning: 2025-12-04T03:31:55.239Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.
Applied to files:
docs/src/user-docs/quick-start/clp-text.md
📚 Learning: 2025-08-15T21:48:40.228Z
Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.
Applied to files:
docs/src/user-docs/quick-start/index.md
📚 Learning: 2025-10-22T21:14:12.225Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.
Applied to files:
docs/src/user-docs/quick-start/index.md
📚 Learning: 2025-10-27T07:07:37.901Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
📚 Learning: 2025-08-25T16:27:50.549Z
Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.
Applied to files:
docs/src/dev-docs/design-deployment-orchestration.md
🪛 LanguageTool
docs/src/user-docs/quick-start/index.md
[grammar] ~13-~13: The verb form ‘guide’ does not appear to fit in this context.
Context: ...--- ## System Requirements This quick start guide covers single-host deployment using...
(SINGULAR_NOUN_VERB_AGREEMENT)
🪛 markdownlint-cli2 (0.18.1)
docs/src/user-docs/quick-start/clp-text.md
173-173: Link and image reference definitions should be needed
Unused link or image reference definition: "docker-non-root"
(MD053, link-image-reference-definitions)
🔇 Additional comments (18)
docs/requirements.txt (1)
6-6: Remove the upper bound constraint or clarify its purpose.
The version constraint `<2.0.0` added to sphinxcontrib-mermaid does not address any actual breaking changes in v2.0.0, as this version does not exist on PyPI. The latest releases are in the 0.x or 1.x series.
If this constraint was added as a precautionary measure for a hypothetical future release, consider documenting that intent inline. Otherwise, if the constraint is unnecessary, remove it to allow the dependency resolver more flexibility.
Verify the actual latest stable version of sphinxcontrib-mermaid and confirm whether this upper bound serves a genuine purpose for your project's compatibility.
Likely an incorrect or invalid review comment.
docs/src/user-docs/guides-docker-compose-deployment.md (2)
1-26: LGTM! Clear documentation restructuring.
The rename from "Multi-host deployment" to "Docker Compose deployment" better reflects the content, and the new "Deployment options" subsection provides helpful context about single-host vs. multi-host setups. The updated note about manual orchestration is also clearer.
324-353: LGTM! Helpful debugging additions.
The new "Monitoring and debugging" section provides practical commands for troubleshooting Docker Compose deployments. The instance ID instructions and example commands are clear and useful.
docs/src/user-docs/guides-k8s-deployment.md (3)
45-104: LGTM! Comprehensive kubeadm setup instructions.
The kubeadm cluster setup guide is well-structured and includes:
- Proper CIDR configuration with helpful notes
- CNI plugin installation (Cilium)
- Worker node joining instructions
- Token regeneration guidance
The technical details are accurate and the warnings about CIDR conflicts are helpful.
377-409: LGTM! Comprehensive configuration reference.
The configuration options table provides a clear reference for commonly used Helm values with appropriate defaults and descriptions. This is very helpful for users.
19-19: No action required: Helm version requirement is correct.
The requirement specifying `Helm >= 4.0` is accurate. Helm 4 is the current stable release line as of January 2026, following CNCF/Helm's announcement on November 12, 2025. The documentation aligns with the latest stable version.
Likely an incorrect or invalid review comment.
docs/src/user-docs/quick-start/clp-text.md (2)
54-148: LGTM! Clear Kubernetes setup instructions.
The Kubernetes (kind) setup tab provides comprehensive instructions including:
- Environment variable setup
- Directory creation
- kind cluster configuration with proper mounts
- Helm chart installation with clp-text configuration
The structure mirrors the Docker Compose approach well, making it easy for users to choose their preferred orchestration method.
383-386: Remove unused link reference.
The link reference `[docker-non-root]` at line 386 is defined but never used in the document.
🧹 Proposed fix
Remove the unused reference:
 [api-server]: ../guides-using-the-api-server.md
 [datasets]: ../resources-datasets
 [docker-compose-deployment]: ../guides-docker-compose-deployment.md
-[docker-non-root]: ...
 [k8s-deployment]: ../guides-k8s-deployment.md
 [text-search-syntax]: ../reference-text-search-syntax.md
Note: The actual line for `docker-non-root` isn't shown in the provided code, but based on the static analysis hint, it should be removed if unused.
Likely an incorrect or invalid review comment.
docs/src/user-docs/quick-start/index.md (1)
13-72: LGTM! Clear requirements organization.
The tab-set approach for system requirements effectively separates Docker Compose and Kubernetes (kind) requirements. The version checks and installation notes are helpful.
Note: The Helm version requirement (>= 4.0) should be verified as mentioned in the k8s-deployment guide review.
docs/src/user-docs/guides-external-database.md (1)
208-208: Simplify redundant relative path.
The relative path `../user-docs/guides-docker-compose-deployment.md` is redundant. Since this file is already in `docs/src/user-docs/`, the `../user-docs/` prefix steps up to `docs/src/` and back down into `docs/src/user-docs/`, so the link resolves correctly but can be shortened.
🔗 Proposed fix
-[docker-compose-orchestration]: ../user-docs/guides-docker-compose-deployment.md
+[docker-compose-orchestration]: guides-docker-compose-deployment.md
⛔ Skipped due to learnings
Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: docs/src/dev-guide/components-core/manylinux-2-28-deps-install.md:24-24
Timestamp: 2025-08-25T06:32:48.313Z
Learning: In the CLP project documentation, when linking to scripts or other files within the repository, use relative paths (e.g., components/core/tools/scripts/...) rather than commit-pinned GitHub URLs to ensure docs and referenced files always belong to the same commit and stay synchronized.
docs/src/user-docs/index.md (4)
6-20: LGTM: Clear quick-start section.
The Quick start section provides a clear entry point for new users with appropriate context and navigation.
22-45: LGTM: Well-structured deployment section.
The Deployment section clearly distinguishes between Docker Compose and Kubernetes orchestration methods, with helpful descriptions for each deployment approach.
224-285: All toctree file references are valid and complete.
Verification confirms that all 21 toctree entries across the 7 sections reference existing files. No missing files or broken references were identified.
188-222: Fix backtick consistency in the last grid-item-card entry.
The link targets are correct and descriptions accurately reflect the file content. However, the last grid-item-card description uses "clp's" without backticks, while all other tool name references are consistently wrapped in backticks. Update "clp's schema file" to "`clp`'s schema file" for consistency.
docs/src/dev-docs/design-deployment-orchestration.md (3)
clp's schema file" for consistency.docs/src/dev-docs/design-deployment-orchestration.md (3)
3-6: LGTM: Clear scope expansion for dual orchestration.
The introduction effectively establishes that CLP now supports both Docker Compose and Kubernetes (via Helm) orchestration, setting appropriate context for the document.
277-284: All documentation links are valid.
All referenced user documentation files exist and contain the expected anchors:
- `guides-docker-compose-deployment.md#monitoring-and-debugging` ✓
- `guides-k8s-deployment.md#monitoring-and-debugging` ✓
- `guides-docker-compose-deployment.md#multi-host-deployment` ✓
236-243: Clarify that local-storage with ReadWriteMany access mode has significant limitations requiring alternative StorageClasses for production.
The current documentation claims that Kubernetes uses `local-storage` StorageClass for shared PVCs with `ReadWriteMany` access mode. However, the Helm chart's actual implementation creates local PersistentVolumes (using hostPath binding) which are node-local and inherently incompatible with `ReadWriteMany` semantics. This configuration only functions correctly when all services accessing shared archives and streams are scheduled on the same node (control-plane in the current setup).
For production deployments spanning multiple nodes, users must configure network-based StorageClasses (e.g., NFS, Ceph) that support `ReadWriteMany`. The documentation should explicitly:
- Note that the default local-storage configuration is suitable only for single-node or tightly-coupled deployments
- Explain the incompatibility between local storage and ReadWriteMany access mode
- Provide guidance on configuring alternative StorageClasses for production multi-node environments
docs/src/user-docs/quick-start/clp-json.md (1)
61-66: Directory structure is complete.
All directories referenced in the Helm chart's volume mounts are already included in the mkdir command. Cross-referencing the Helm templates confirms that all required subdirectories under `data_directory`, `logs_directory`, and `tmp_directory` are properly created by the command.
## Orchestration methods

-The orchestration code is split up into:
+CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host
+deployments, and Helm for Kubernetes deployments. Both methods share the same configuration
+interface (`clp-config.yaml` and `credentials.yaml`) and support the same deployment types.

-* `BaseController` that defines:
-  * common logic for preparing the environment variables, configuration files, and directories
-    necessary for each service.
-  * abstract methods that orchestrator-specific derived classes must implement in order to
-    orchestrate a deployment.
-* `<Orchestrator>Controller` that implements (and/or overrides) any of the methods in
-  `BaseController` (`<Orchestrator>` is a placeholder for the specific orchestrator for which the
-  class is being implemented).
+### Configuration

-## Docker Compose orchestration
+Each service requires configuration values passed through config files, environment variables,
+and/or command line arguments. Since services run in containers, some values must be adapted for the
+orchestration environment—specifically, host paths must be converted to container paths, and
+hostnames/ports must use service discovery mechanisms.

-This section explains how we use Docker Compose to orchestrate the CLP package and is broken into
-the following subsections:
+The orchestration controller (e.g., `DockerComposeController`) reads `etc/clp-config.yaml` and
+`etc/credentials.yaml`, then generates:
+* A container-specific CLP config file with adapted paths and service names
+* Runtime configuration (environment variables or ConfigMaps)
+* Required directories (e.g., data output directories)

-* [Setting up the Docker Compose project's environment](#setting-up-the-environment)
-* [Starting and stoping the Docker Compose project](#starting-and-stopping-the-project)
-* [Deployment types](#deployment-types)
-* [Implementation details](#implementation-details)
-* [Troubleshooting](#troubleshooting)
+For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm
+chart generates a ConfigMap and Secrets from `values.yaml`.

-### Setting up the environment

-Several services require configuration values to be passed in through the CLP package's config file,
-environment variables, and/or command line arguments. Since the services are running in containers,
-some of these configuration values need to be modified for the orchestration environment.
-Specifically:
+:::{note}
+A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml`
+like `DockerComposeController`, then set up the Helm release accordingly. This will unify the
+configuration experience across both orchestration methods.
+:::
🧹 Nitpick | 🔵 Trivial
Clarify the configuration flow for Kubernetes.
The note about a planned KubernetesController (lines 215-218) may confuse readers. Since the Helm chart currently uses values.yaml, explain how users should configure Kubernetes deployments today versus the future unified approach.
📝 Suggested clarification
For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm
chart generates a ConfigMap and Secrets from `values.yaml`.
:::{note}
+**Current approach**: For Kubernetes deployments, configure the Helm chart via `values.yaml`
+or `--set` flags as shown in the [Kubernetes deployment guide][kubernetes-guide].
+
+**Future enhancement**:
A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml`
like `DockerComposeController`, then set up the Helm release accordingly. This will unify the
configuration experience across both orchestration methods.
:::
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Orchestration methods | |
| The orchestration code is split up into: | |
| CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host | |
| deployments, and Helm for Kubernetes deployments. Both methods share the same configuration | |
| interface (`clp-config.yaml` and `credentials.yaml`) and support the same deployment types. | |
| * `BaseController` that defines: | |
| * common logic for preparing the environment variables, configuration files, and directories | |
| necessary for each service. | |
| * abstract methods that orchestrator-specific derived classes must implement in order to | |
| orchestrate a deployment. | |
| * `<Orchestrator>Controller` that implements (and/or overrides) any of the methods in | |
| `BaseController` (`<Orchestrator>` is a placeholder for the specific orchestrator for which the | |
| class is being implemented). | |
| ### Configuration | |
| ## Docker Compose orchestration | |
| Each service requires configuration values passed through config files, environment variables, | |
| and/or command line arguments. Since services run in containers, some values must be adapted for the | |
| orchestration environment—specifically, host paths must be converted to container paths, and | |
| hostnames/ports must use service discovery mechanisms. | |
| This section explains how we use Docker Compose to orchestrate the CLP package and is broken into | |
| the following subsections: | |
| The orchestration controller (e.g., `DockerComposeController`) reads `etc/clp-config.yaml` and | |
| `etc/credentials.yaml`, then generates: | |
| * A container-specific CLP config file with adapted paths and service names | |
| * Runtime configuration (environment variables or ConfigMaps) | |
| * Required directories (e.g., data output directories) | |
| * [Setting up the Docker Compose project's environment](#setting-up-the-environment) | |
| * [Starting and stopping the Docker Compose project](#starting-and-stopping-the-project) | |
| * [Deployment types](#deployment-types) | |
| * [Implementation details](#implementation-details) | |
| * [Troubleshooting](#troubleshooting) | |
| For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm | |
| chart generates a ConfigMap and Secrets from `values.yaml`. | |
| ### Setting up the environment | |
| Several services require configuration values to be passed in through the CLP package's config file, | |
| environment variables, and/or command line arguments. Since the services are running in containers, | |
| some of these configuration values need to be modified for the orchestration environment. | |
| Specifically: | |
| :::{note} | |
| A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml` | |
| like `DockerComposeController`, then set up the Helm release accordingly. This will unify the | |
| configuration experience across both orchestration methods. | |
| ::: | |
| ## Orchestration methods | |
| CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host | |
| deployments, and Helm for Kubernetes deployments. Both methods share the same configuration | |
| interface (`clp-config.yaml` and `credentials.yaml`) and support the same deployment types. | |
| ### Configuration | |
| Each service requires configuration values passed through config files, environment variables, | |
| and/or command line arguments. Since services run in containers, some values must be adapted for the | |
| orchestration environment—specifically, host paths must be converted to container paths, and | |
| hostnames/ports must use service discovery mechanisms. | |
| The orchestration controller (e.g., `DockerComposeController`) reads `etc/clp-config.yaml` and | |
| `etc/credentials.yaml`, then generates: | |
| * A container-specific CLP config file with adapted paths and service names | |
| * Runtime configuration (environment variables or ConfigMaps) | |
| * Required directories (e.g., data output directories) | |
| For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm | |
| chart generates a ConfigMap and Secrets from `values.yaml`. | |
| :::{note} | |
| **Current approach**: For Kubernetes deployments, configure the Helm chart via `values.yaml` | |
| or `--set` flags as shown in the [Kubernetes deployment guide][kubernetes-guide]. | |
| **Future enhancement**: | |
| A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml` | |
| like `DockerComposeController`, then set up the Helm release accordingly. This will unify the | |
| configuration experience across both orchestration methods. | |
| ::: |
🤖 Prompt for AI Agents
In @docs/src/dev-docs/design-deployment-orchestration.md around lines 192 - 218,
Clarify that today Kubernetes deployments are configured directly via the Helm
chart using values.yaml and Kubernetes Secrets/ConfigMaps (users should place
container-adapted settings and credentials into values.yaml or create Secrets
rather than relying on etc/clp-config.yaml), and update the note to state that
in the future a planned KubernetesController (like DockerComposeController) will
read clp-config.yaml and credentials.yaml and automatically generate the Helm
values/ConfigMaps/Secrets to unify the workflow; reference KubernetesController,
DockerComposeController, clp-config.yaml, credentials.yaml, and values.yaml so
readers know what to use now versus what will be automated later.
| ### Deployment types | ||
|
|
||
| One notable implementation detail is in how we handle mounts that are only necessary under certain | ||
| configurations. For instance, the input logs mount is only necessary when the `logs_input.type` is | ||
| `fs`. If `logs_input.type` is `s3`, we shouldn't mount some random directory from the user's | ||
| host filesystem into the container. However, Docker doesn't provide a mechanism to perform | ||
| conditional mounts. Instead, we use Docker's variable interpolation to conditionally mount an empty | ||
| tmpfs mount into the container. This strategy is used wherever we need a conditional mount. | ||
| CLP supports multiple deployment configurations based on the compression scheduler and query engine. | ||
|
|
||
| ### Troubleshooting | ||
| | Deployment Type | Compression Scheduler | Query Engine | | ||
| |-----------------|-----------------------|------------------------------| | ||
| | Base | Celery | [Presto][presto-integration] | | ||
| | Full | Celery | Native | | ||
| | Spider Base | Spider | [Presto][presto-integration] | | ||
| | Spider Full | Spider | Native | | ||
|
|
||
| If you encounter issues with the Docker Compose deployment, first determine the instance ID for your | ||
| deployment by checking the content of `<clp-package>/var/log/instance-id`. Then run one of the | ||
| commands below as necessary. | ||
| :::{note} | ||
| Spider support is not yet available for Helm. | ||
| ::: |
🛠️ Refactor suggestion | 🟠 Major
Clarify Spider deployment status.
The deployment types table includes Spider Base and Spider Full, but the note states "Spider support is not yet available for Helm." This may confuse users about whether they can use Spider at all. Consider clarifying that Spider is available for Docker Compose only.
♻️ Suggested clarification
### Deployment types
CLP supports multiple deployment configurations based on the compression scheduler and query engine.
| Deployment Type | Compression Scheduler | Query Engine |
|-----------------|-----------------------|------------------------------|
| Base | Celery | [Presto][presto-integration] |
| Full | Celery | Native |
| Spider Base | Spider | [Presto][presto-integration] |
| Spider Full | Spider | Native |
:::{note}
-Spider support is not yet available for Helm.
+Spider-based deployment types (Spider Base and Spider Full) are currently only supported
+for Docker Compose deployments. Kubernetes/Helm support for Spider is planned for a future release.
:::
🤖 Prompt for AI Agents
In @docs/src/dev-docs/design-deployment-orchestration.md around lines 245 - 258,
The "Spider" entries in the "Deployment types" section are ambiguous; update the
table or adjacent note under the "Deployment types" header to explicitly state
that Spider deployments (Spider Base and Spider Full) are supported only for
Docker Compose and are not supported for Helm; modify the note that currently
reads "Spider support is not yet available for Helm." to something like "Spider
deployments are supported for Docker Compose only; Helm support for Spider is
not yet available," and ensure the table rows for "Spider Base" and "Spider
Full" remain but align with this clarified note.
| ## Troubleshooting | ||
|
|
||
| 2. View service logs: | ||
| When issues arise, use the appropriate commands for your orchestration method: | ||
|
|
||
| ```bash | ||
| docker compose --project-name clp-package-<instance-id> logs <service-name> | ||
| ``` | ||
| * [Docker Compose debugging][docker-compose-debugging] | ||
| * [Kubernetes Helm debugging][kubernetes-debugging] | ||
|
|
🧹 Nitpick | 🔵 Trivial
Add context for troubleshooting links.
The Troubleshooting section provides links to debugging guides but doesn't explain what kinds of issues users might encounter or when to consult each guide. Consider adding brief context.
📋 Suggested improvement
## Troubleshooting
-When issues arise, use the appropriate commands for your orchestration method:
+When services fail to start, crash, or exhibit unexpected behaviour, use the appropriate
+debugging commands and techniques for your orchestration method:
* [Docker Compose debugging][docker-compose-debugging]
* [Kubernetes Helm debugging][kubernetes-debugging]
📝 Committable suggestion
| ## Troubleshooting | |
| 2. View service logs: | |
| When issues arise, use the appropriate commands for your orchestration method: | |
| ```bash | |
| docker compose --project-name clp-package-<instance-id> logs <service-name> | |
| ``` | |
| * [Docker Compose debugging][docker-compose-debugging] | |
| * [Kubernetes Helm debugging][kubernetes-debugging] | |
| ## Troubleshooting | |
| When services fail to start, crash, or exhibit unexpected behaviour, use the appropriate | |
| debugging commands and techniques for your orchestration method: | |
| * [Docker Compose debugging][docker-compose-debugging] | |
| * [Kubernetes Helm debugging][kubernetes-debugging] | |
🤖 Prompt for AI Agents
In @docs/src/dev-docs/design-deployment-orchestration.md around lines 265 - 271,
Update the "Troubleshooting" section to add one or two succinct sentences before
the links that describe typical issues and when to use each guide—mention that
[docker-compose-debugging] is for local/service startup, container logs,
networking, and dependency issues in single-host setups, and that
[kubernetes-debugging] is for cluster-level problems like pod crashes,
scheduling, ingress, and Helm releases; reference the "Troubleshooting" heading
and the link tokens [docker-compose-debugging] and [kubernetes-debugging] so the
added context sits directly above those links.
| ### Basic installation | ||
|
|
||
| Create the required directories on all worker nodes: | ||
|
|
||
| ```bash | ||
| export CLP_HOME="/tmp/clp" | ||
|
|
||
| mkdir -p "$CLP_HOME/var/data/"{archives,streams,staged-archives,staged-streams} \ | ||
| "$CLP_HOME/var/log/"{compression_scheduler,compression_worker,user} \ | ||
| "$CLP_HOME/var/log/"{query_scheduler,query_worker,reducer} \ | ||
| "$CLP_HOME/var/tmp" | ||
| ``` | ||
|
|
||
| Then on the **control-plane node**, generate credentials and install CLP: | ||
|
|
||
| ```bash | ||
| export CLP_HOME="/tmp/clp" | ||
|
|
||
| mkdir -p "$CLP_HOME/var/"{data,log}/{database,queue,redis,results_cache} \ | ||
| "$CLP_HOME/var/data/"{archives,streams,staged-archives,staged-streams} \ | ||
| "$CLP_HOME/var/log/"{compression_scheduler,compression_worker,user} \ | ||
| "$CLP_HOME/var/log/"{query_scheduler,query_worker,reducer} \ | ||
| "$CLP_HOME/var/log/"{garbage_collector,api_server,log_ingestor,mcp_server} \ | ||
| "$CLP_HOME/var/tmp" | ||
|
|
||
| # Credentials (change these for production) | ||
| export CLP_DB_PASS="pass" | ||
| export CLP_DB_ROOT_PASS="root-pass" | ||
| export CLP_QUEUE_PASS="pass" | ||
| export CLP_REDIS_PASS="pass" | ||
|
|
||
| # Worker replicas (increase for multi-node clusters) | ||
| export CLP_COMPRESSION_WORKER_REPLICAS=1 | ||
| export CLP_QUERY_WORKER_REPLICAS=1 | ||
|
|
||
| helm install clp . \ | ||
| --set clpConfig.data_directory="$CLP_HOME/var/data" \ | ||
| --set clpConfig.logs_directory="$CLP_HOME/var/log" \ | ||
| --set clpConfig.tmp_directory="$CLP_HOME/var/tmp" \ | ||
| --set clpConfig.archive_output.storage.directory="$CLP_HOME/var/data/archives" \ | ||
| --set clpConfig.stream_output.storage.directory="$CLP_HOME/var/data/streams" \ | ||
| --set credentials.database.password="$CLP_DB_PASS" \ | ||
| --set credentials.database.root_password="$CLP_DB_ROOT_PASS" \ | ||
| --set credentials.queue.password="$CLP_QUEUE_PASS" \ | ||
| --set credentials.redis.password="$CLP_REDIS_PASS" \ | ||
| --set compressionWorker.replicas="$CLP_COMPRESSION_WORKER_REPLICAS" \ | ||
| --set queryWorker.replicas="$CLP_QUERY_WORKER_REPLICAS" | ||
| ``` |
🧹 Nitpick | 🔵 Trivial
Consider emphasizing production credential requirements.
While the basic installation section is clear, lines 188-192 show example credentials (pass, root-pass) that users might copy directly. Although there's a comment "(change these for production)", this could be made more prominent.
📝 Suggested improvement
Consider adding a warning block before the credentials section:
:::{warning}
The example below uses simple passwords for demonstration. **Always use strong, unique passwords for production deployments.** Consider using a password manager or `openssl rand -hex 16` to generate secure credentials.
:::🤖 Prompt for AI Agents
In @docs/src/user-docs/guides-k8s-deployment.md around lines 163 - 210, The docs
currently show weak example credentials (CLP_DB_PASS, CLP_DB_ROOT_PASS,
CLP_QUEUE_PASS, CLP_REDIS_PASS) with only a small “(change these for
production)” note; add a prominent warning block immediately before the
credentials export block that calls out using strong, unique passwords for
production and suggests secure generation/storage options (e.g., password
manager or openssl rand -hex), and update the inline comment near the credential
exports to reinforce the warning so readers won’t copy the example values.
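As a minimal sketch of that suggestion, assuming `openssl` is available on the control-plane node, the credential exports could generate random values instead of hardcoding the examples. The variable names are the ones from the installation block; the generation approach is only one option.

```bash
# Generate a random 32-character hex password per service instead of
# copying the example values (assumes `openssl` is on the PATH).
export CLP_DB_PASS="$(openssl rand -hex 16)"
export CLP_DB_ROOT_PASS="$(openssl rand -hex 16)"
export CLP_QUEUE_PASS="$(openssl rand -hex 16)"
export CLP_REDIS_PASS="$(openssl rand -hex 16)"
```

The `helm install` command in the guide can then be used unchanged, since it already reads these variables.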
| curl -X POST "http://localhost:30301/query/submit" \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{ | ||
| "query_string": "<query>", | ||
| "max_num_results": 1000, | ||
| "timestamp_begin": null, | ||
| "timestamp_end": null, | ||
| "case_sensitive": false | ||
| }' |
API server port hardcoded without variable reference.
The API server endpoint uses hardcoded port 30301, but earlier in the document (line 50), this port is set via CLP_API_SERVER_PORT environment variable. This creates potential inconsistency if users change the port. Consider referencing the variable or noting that the default port is shown.
🔧 Suggested fix
To search via the API server:
```bash
-curl -X POST "http://localhost:30301/query/submit" \
+# Replace 30301 with your CLP_API_SERVER_PORT if you changed it
+curl -X POST "http://localhost:${CLP_API_SERVER_PORT}/query/submit" \
 -H "Content-Type: application/json" \
 -d '{
```
Or add a note:
To search via the API server:
+```{note}
+The default API server port is 30301. If you changed `CLP_API_SERVER_PORT`, use that value instead.
+```
+
```bash
curl -X POST "http://localhost:30301/query/submit" \
```
🤖 Prompt for AI Agents
In @docs/src/user-docs/quick-start/clp-json.md around lines 353 - 361, Replace
the hardcoded port 30301 in the example curl command with a reference to the
CLP_API_SERVER_PORT variable or add a short note clarifying the default;
specifically update the curl URL "http://localhost:30301/query/submit" to use
"http://localhost:${CLP_API_SERVER_PORT}/query/submit" (or add a one-line note
stating “The default API server port is 30301; if you changed
CLP_API_SERVER_PORT use that value”) so examples remain consistent with the
CLP_API_SERVER_PORT setting.
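A third option, sketched here under the assumption that 30301 is the default port as stated above, is to use shell parameter expansion so the example works whether or not `CLP_API_SERVER_PORT` is set. `CLP_API_SERVER_URL` is a hypothetical variable name introduced only for this illustration.

```bash
# Fall back to the default port (30301) when CLP_API_SERVER_PORT is unset or empty.
CLP_API_SERVER_URL="http://localhost:${CLP_API_SERVER_PORT:-30301}"
echo "Submitting queries to ${CLP_API_SERVER_URL}/query/submit"
```

The `curl` command in the guide could then target `"${CLP_API_SERVER_URL}/query/submit"` instead of a hardcoded address.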
| Configure `etc/clp-config.yaml` to connect to the `kind`-deployed services: | ||
|
|
||
| ```yaml | ||
| database: | ||
| port: 30306 | ||
| results_cache: | ||
| port: 30017 | ||
| ``` |
🧹 Nitpick | 🔵 Trivial
Duplicate database port configuration across sections.
The database port configuration is repeated in multiple sections (lines 149-151, 204-206, and 381-384). This creates maintenance burden and potential inconsistency. Consider consolidating these or adding a clear note that this is a one-time configuration.
Consider adding a tip after the first occurrence:
Configure `etc/clp-config.yaml` to connect to the `kind`-deployed database:
```yaml
database:
  port: 30306
```
+```{tip}
+This configuration applies to all subsequent operations (compression, search, etc.).
+You only need to set it once.
+```
<details>
<summary>🤖 Prompt for AI Agents</summary>
In @docs/src/user-docs/quick-start/clp-json.md around lines 378 - 385, The
documentation repeats the same database port snippet for etc/clp-config.yaml
(the block showing "database: port: 30306" and "results_cache: port: 30017") in
multiple places; consolidate by keeping the configuration only at the first
mention and replace subsequent duplicates with a short pointer or note, or add a
tip immediately after the first "database: port: 30306" example stating that
this setting applies globally (e.g., "This configuration applies to all
subsequent operations; you only need to set it once") and update the later
sections to reference that tip instead of repeating the YAML.
</details>
kirkrodrigues
left a comment
@junhaoliao Made some edits based on our offline discussions, etc. Also added a comment. Please review.
| The following table lists commonly used Helm values. For a complete list, see | ||
| [`values.yaml`][clp-helm-chart-values] in the chart directory. | ||
|
|
||
| | Parameter | Description | Default | |
This table feels a bit redundant and slightly confusing to me rather than simply reading values.yaml directly, which seems to have more detailed comments anyway. Not to mention, this table can easily go out of sync. Can we remove this table?
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/src/user-docs/quick-start/clp-json.md`:
- Around line 335-345: MD031 linting error: add a single blank line after the
closing code-fence(s) so the following container markers render correctly;
locate the affected triple-backtick fences (```bash) and insert one blank line
between each closing ``` and the next ':::' or '::::' tag (applies to both
Docker Compose and Kubernetes examples) so there is a blank line separating the
code block from the subsequent ::: markers.
♻️ Duplicate comments (10)
docs/src/user-docs/guides-k8s-deployment.md (3)
172-194: Consider clarifying directory creation requirements. The instructions create directories on "all worker nodes" (lines 172-180), then separately on the "control-plane node" (lines 185-194). If the control-plane node also runs worker pods, it would need both sets of directories. Consider adding a note stating whether control-plane nodes require both directory sets, or consolidate the instructions with clear annotations about which paths are needed where.
517-533: Add host configuration to sbin script example. The configuration example shows only port mappings but omits the required `host` fields. Users connecting via NodePort would need to specify a cluster node's IP or hostname:
📝 More complete example
    database:
      host: "<node-ip-or-hostname>"  # IP/hostname of any cluster node
      port: 30306  # Match `clpConfig.database.port` in Helm values
    results_cache:
      host: "<node-ip-or-hostname>"  # IP/hostname of any cluster node
      port: 30017  # Match `clpConfig.results_cache.port` in Helm values
Additionally, users would need to configure credentials in `etc/credentials.yaml` to match the Helm chart's credential values.
239-287: Fix duplicate YAML key in custom values example. The example has two `archive_output:` keys under `clpConfig` (lines 260-262 and 270-271). In YAML, duplicate keys at the same level are invalid; the second occurrence will silently override the first. Merge them into a single `archive_output:` block containing all three fields:
🔧 Proposed fix
     clpConfig:
       # Use clp-text, instead of clp-json (default)
       package:
         storage_engine: "clp"
         query_engine: "clp"
    -  # Configure archive output
    -  archive_output:
    -    target_archive_size: 536870912  # 512 MB
    -    compression_level: 6
    -
       # Enable MCP server
       mcp_server:
         port: 30800
         logging_level: "INFO"
    -  # Configure data retention (in minutes)
    +  # Configure archive output and data retention
       archive_output:
    +    target_archive_size: 536870912  # 512 MB
    +    compression_level: 6
         retention_period: 10080  # 7 days (in minutes)
docs/src/dev-docs/design-deployment-orchestration.md (2)
260-262: Clarify Spider deployment availability. The note states Spider isn't available for Helm but doesn't explicitly confirm it works with Docker Compose. Consider rewording for clarity:
📝 Suggested improvement
     :::{note}
    -Spider support is not yet available for Helm.
    +Spider-based deployment types (Spider Base and Spider Full) are currently supported
    +for Docker Compose only. Kubernetes/Helm support for Spider is planned for a future release.
     :::
269-274: Consider adding troubleshooting context. The section provides debugging links but doesn't indicate what types of issues users might encounter. Adding a brief sentence about common issues (service startup failures, container crashes, connectivity problems) would help users quickly determine if they're in the right section.
📝 Optional enhancement
     ## Troubleshooting
    -When issues arise, use the appropriate commands for your orchestration method:
    +When services fail to start, crash, or exhibit connectivity issues, use the appropriate
    +debugging commands for your orchestration method:
docs/src/user-docs/quick-start/clp-json.md (3)
53-57: Security: Credentials echoed in shell history. The credentials generated using `openssl rand -hex 16` will be stored in the shell history, potentially exposing sensitive information. Consider advising users to prefix these commands with a space (to avoid history in bash with `HISTCONTROL=ignorespace`) or to unset history temporarily.
78-80: Clarify the optional log input mount. The comment states "not needed if using S3 input," but users may be unclear about when they need this mount. Consider adding a reference to the object storage guide or expanding the explanation.
127-134: Fix MD031: Add blank line after fenced code block. The closing fence at line 134 should be followed by a blank line before the `:::` tag to satisfy the MD031 linting rule.
📝 Suggested fix
```yaml database: port: 30306 results_cache: port: 30017
:::
::::</details> </blockquote></details> <details> <summary>docs/src/user-docs/index.md (2)</summary><blockquote> `72-77`: **Consider addressing the previously suggested title improvement.** A past review suggested changing "External database setup" to a more grammatically precise alternative like "Setting up external databases" or "External database configuration" for consistency with other card titles. --- `103-108`: **Apply the previously confirmed title change.** The title "MCP server" should be changed to "Using the MCP server" for consistency with other cards in this section. This change was confirmed as addressed in a previous review but was not applied. <details> <summary>✏️ Suggested change</summary> ```diff :::{grid-item-card} :link: guides-mcp-server/index -MCP server +Using the MCP server ^^^ Integrate CLP with AI assistants using the Model Context Protocol. :::
junhaoliao
left a comment
There was a problem hiding this comment.
clp-text configuration issues: API server and log_ingestor are not supported in clp-text (see clp-config.template.text.yaml which sets api_server: null and log_ingestor: null). These references should be removed.
…t guide - Apply suggestions from code review
|
|
||
| --- | ||
|
|
||
| ## Monitoring and debugging |
we should also add kubectl get events and kubectl describe pod <pod-name> here.
@coderabbitai create an issue to track
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create GitHub issue to track adding kubectl debugging commands
gh issue create \
--repo y-scope/clp \
--title "docs(helm): Add kubectl debugging commands to Monitoring section" \
--body "Add \`kubectl get events\` and \`kubectl describe pod <pod-name>\` commands to the Monitoring and debugging section of the Kubernetes deployment guide.
Related PR: https://github.com/y-scope/clp/pull/1827
Related comment: https://github.com/y-scope/clp/pull/1827#discussion_r2750441166
Requested by: `@junhaoliao`" \
--assignee junhaoliao
Length of output: 42
@junhaoliao, I've created issue #1942 to track adding the kubectl get events and kubectl describe pod <pod-name> commands to the Monitoring and debugging section.
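For reference, a minimal sketch of the two commands the tracked issue asks for. `clp_k8s_debug` and the `KUBECTL` override are hypothetical names used only for illustration, and sorting events by creation time is an optional convenience rather than something the comment requested.

```bash
# Hypothetical helper (not part of CLP): print cluster events, then the
# details of a single pod. KUBECTL may be overridden, e.g. with a wrapper
# script that selects a specific kubeconfig.
clp_k8s_debug() {
    # Recent events in the current namespace, oldest first.
    "${KUBECTL:-kubectl}" get events --sort-by=.metadata.creationTimestamp
    # Status, conditions, and events for the pod named by the first argument.
    "${KUBECTL:-kubectl}" describe pod "$1"
}
```

Usage would look like `clp_k8s_debug <pod-name>`, with the pod name taken from `kubectl get pods`.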
🧠 Learnings used
Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.
Description
Add comprehensive Kubernetes/Helm deployment documentation alongside existing Docker Compose guides, enabling users to deploy CLP using either orchestration method.
Changes
New documentation
`guides-k8s-deployment.md`): Complete guide covering `kind` local development, `kubeadm` production clusters, Helm chart installation, worker scheduling, and troubleshooting
`clp-json.md` and `clp-text.md` with Docker Compose and Kubernetes tabs for step-by-step deployment instructions
Updated documentation
orchestration architecture, configuration flows, and dependency management
`guides-single-host.md` to `guides-docker-compose-deployment.md` for clarity; added multi-host deployment details
Analytics, Core, Datasets, Reference)
Removed documentation
`core-overview.md`, `guides-overview.md`, `reference-overview.md`: Content consolidated into the restructured index
Checklist
breaking change.
Validation performed
`values.yaml` and documented configuration options
`set-up-test.sh`, `set-up-multi-dedicated-test.sh`, `set-up-multi-shared-test.sh`)
`guides-k8s-deployment.md` for consistency