docs(helm): Add Kubernetes Helm user guides and dev guides; Restructure user documentation. by junhaoliao · Pull Request #1827 · y-scope/clp

junhaoliao · 2025-12-19T21:52:01Z

Description

Add comprehensive Kubernetes/Helm deployment documentation alongside existing Docker Compose guides, enabling users to deploy CLP using either orchestration method.

Changes

New documentation

Kubernetes deployment guide (guides-k8s-deployment.md): Complete guide covering kind local
development, kubeadm production clusters, Helm chart installation, worker scheduling, and
troubleshooting
Tabbed quick-start guides: Updated clp-json.md and clp-text.md with Docker Compose and
Kubernetes tabs for step-by-step deployment instructions

Updated documentation

Deployment orchestration design doc: Enhanced to explain both Docker Compose and Kubernetes
orchestration architecture, configuration flows, and dependency management
Docker Compose guide: Renamed from guides-single-host.md to guides-docker-compose-deployment.md
for clarity; added multi-host deployment details
User docs index: Reorganized into logical sections (Deployment, Input & Storage, Query &
Analytics, Core, Datasets, Reference)

Removed documentation

core-overview.md, guides-overview.md, reference-overview.md: Content consolidated into the
restructured index

Checklist

The PR satisfies the contribution guidelines.
This is a breaking change and that has been indicated in the PR title, OR this isn't a
breaking change.
Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Verified all internal documentation links resolve correctly
Confirmed consistency between Helm chart values.yaml and documented configuration options
Verified documentation follows the test scripts (set-up-test.sh, set-up-multi-dedicated-test.sh,
set-up-multi-shared-test.sh)
Cross-referenced quick-start guides with guides-k8s-deployment.md for consistency

Summary by CodeRabbit

Documentation
- Added a comprehensive Kubernetes (Helm) deployment guide with install, verification, configuration and debugging guidance.
- Expanded Docker Compose guidance into a dedicated deployment guide with single-/multi-host options and new monitoring/debugging tips.
- Reworked quick-starts and top-level navigation into tabbed Docker Compose/Kubernetes flows and reorganized site sections (Deployment, Quick start, Input & storage, Query & analytics, Core, Reference).
- Clarified orchestration methods, secrets handling, storage/mounts and service start sequencing; updated troubleshooting and stop/start guidance.
- Removed several outdated overview/landing pages and updated cross‑references and links.

…with Helm.

coderabbitai · 2025-12-19T21:52:06Z

Warning

Rate limit exceeded

@junhaoliao has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 26 minutes and 16 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 03b1101 and 131f609.

📒 Files selected for processing (6)

docs/src/dev-docs/building-package.md
docs/src/user-docs/core-unstructured/index.md
docs/src/user-docs/guides-k8s-deployment.md
docs/src/user-docs/index.md
docs/src/user-docs/quick-start/clp-text.md
docs/src/user-docs/reference-sbin-scripts/index.md

Walkthrough

Restructures CLP documentation to support two orchestration paths: Docker Compose for single-host or manual multi-host deployments, and Kubernetes via Helm for cluster deployments. Adds a full Kubernetes guide, reworks quick-starts into per-flavour tabs, reorganizes the docs index, updates cross-references, and removes several legacy overview pages.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation Index Restructuring `docs/src/user-docs/index.md` | Reorganized site into new top-level sections (Quick start, Deployment, Input & storage, Query & analytics, Core, Resources/Datasets, Reference) with grid/card layouts and updated toctrees. |
| Quick-start Guides — Tabbed Deployment Options `docs/src/user-docs/quick-start/index.md`, `docs/src/user-docs/quick-start/clp-json.md`, `docs/src/user-docs/quick-start/clp-text.md` | Introduced Docker Compose and Kubernetes (kind/Helm) tabs; added per-flavour prerequisites, environment setup, Helm/kind instructions, per-mode UI/API access, and stop/uninstall workflows. |
| Deployment Orchestration Design → Dual Paths `docs/src/dev-docs/design-deployment-orchestration.md` | Replaced single-method guidance with dual orchestration paths (Docker Compose and Helm/Kubernetes); added orchestration methods, config interfaces, secrets, dependencies, storage/mounts, start/stop, and troubleshooting sections. |
| Docker Compose User Guide (renamed/expanded) `docs/src/user-docs/guides-docker-compose-deployment.md` | Renamed multi-host guide to Docker Compose deployment; added deployment options (single/multi-host), monitoring and debugging commands, instance-id guidance, and links to Kubernetes guide. |
| New Kubernetes Deployment Guide `docs/src/user-docs/guides-k8s-deployment.md` | Added comprehensive Helm/Kubernetes guide covering requirements, kind vs production clusters, Helm chart install/values, storage/PVC notes, verification, monitoring/debugging, advanced scheduling, and release management. |
| Deleted Legacy Overview Pages `docs/src/user-docs/core-overview.md`, `docs/src/user-docs/guides-overview.md`, `docs/src/user-docs/reference-overview.md` | Removed legacy overview pages; their navigation/content migrated into the restructured index and dedicated guides. |
| Cross-References & Minor Docs Edits `docs/src/user-docs/guides-external-database.md`, `components/core/README.md`, `docs/src/user-docs/reference-sbin-scripts/index.md` | Updated anchors and link targets to renamed pages (docker-compose-deployment, guides-docker-compose-deployment, guides-k8s-deployment); adjusted README link to `index#core`; formatted `sbin` in heading. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main changes: adding Kubernetes Helm documentation and restructuring user documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

…update orchestration links

…pose and Kubernetes orchestration details.

# Conflicts: # docs/src/user-docs/index.md

quinntaylormitchell · 2026-01-12T17:33:06Z

docs/src/user-docs/index.md

I think that splitting the guides section into Deployment, Input & storage, etc. could be good, but I think that we should then merge the content of guides-overview.md with the content of this file, and then delete guides-overview.md. (That would mean making a link card for each of the sections.) It doesn't make sense to have a Guides section whose sole purpose is to direct the reader to other sections.

quinntaylormitchell · 2026-01-12T17:35:32Z

docs/src/user-docs/guides-overview.md

See my comment on docs/src/user-docs/index.md re. what should be done with this file.

The guides-overview.md file was removed as a result.

quinntaylormitchell · 2026-01-12T17:59:21Z

docs/src/user-docs/quick-start/index.md

How would you feel about standardizing the way we present names of programs/systems that don't start with a capital letter? I feel like it's hard when reading to see a program name like kind as plaintext and remember that it's a program name and not a word. Even for un-capitalized program names that aren't real words (e.g., kubeadm) it would be nice to monospace it. Programs/systems with capitalized names like Docker and Kubernetes are fine as plaintext imo.

Good catch. Let me try enforcing these as you proposed:

Use backticks for lowercase tool names: kind, kubeadm, kubectl, helm (when we specifically refer to the CLI)

Keep capitalized names as plaintext: Docker, Kubernetes, Helm (when referring to the project/company rather than the CLI command)
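
The convention above could be checked mechanically. The following is a minimal sketch, not part of this PR: `find_unformatted_tools` and `LOWERCASE_TOOLS` are hypothetical names, and the check only looks at the characters directly next to each match, so a tool name in the middle of a longer code span would still be flagged.

```python
import re

# Lowercase tool names that should appear in backticks.
# `kind` is deliberately left out of this sketch: it is also an ordinary
# English word, so a plain regex would produce too many false positives.
LOWERCASE_TOOLS = ("kubeadm", "kubectl")

# Match a tool name that is neither preceded nor followed by a backtick.
_UNFORMATTED = re.compile(r"(?<!`)\b(" + "|".join(LOWERCASE_TOOLS) + r")\b(?!`)")


def find_unformatted_tools(text: str) -> list:
    """Return the tool names that appear without surrounding backticks."""
    return _UNFORMATTED.findall(text)


print(find_unformatted_tools("Run kubectl get pods, or use `kubeadm` to bootstrap."))
# prints ['kubectl']
```

Names such as Helm/`helm` would still need a manual pass, since the right formatting depends on whether the project or the CLI is meant.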

…ing guides, core, and reference sections into a unified overview.

…eployment, core usage, resources, and reference sections.

# Conflicts: # docs/src/user-docs/reference-overview.md

…ompose, Kubernetes, and multi-host setups.

…sed code at y-scope#1829

coderabbitai

Actionable comments posted: 14

🤖 Fix all issues with AI agents

In @docs/src/dev-docs/design-deployment-orchestration.md:
- Around line 245-258: The "Spider" entries in the "Deployment types" section
are ambiguous; update the table or adjacent note under the "Deployment types"
header to explicitly state that Spider deployments (Spider Base and Spider Full)
are supported only for Docker Compose and are not supported for Helm; modify the
note that currently reads "Spider support is not yet available for Helm." to
something like "Spider deployments are supported for Docker Compose only; Helm
support for Spider is not yet available," and ensure the table rows for "Spider
Base" and "Spider Full" remain but align with this clarified note.
- Around line 265-271: Update the "Troubleshooting" section to add one or two
succinct sentences before the links that describe typical issues and when to use
each guide—mention that [docker-compose-debugging] is for local/service startup,
container logs, networking, and dependency issues in single-host setups, and
that [kubernetes-debugging] is for cluster-level problems like pod crashes,
scheduling, ingress, and Helm releases; reference the "Troubleshooting" heading
and the link tokens [docker-compose-debugging] and [kubernetes-debugging] so the
added context sits directly above those links.
- Around line 192-218: Clarify that today Kubernetes deployments are configured
directly via the Helm chart using values.yaml and Kubernetes Secrets/ConfigMaps
(users should place container-adapted settings and credentials into values.yaml
or create Secrets rather than relying on etc/clp-config.yaml), and update the
note to state that in the future a planned KubernetesController (like
DockerComposeController) will read clp-config.yaml and credentials.yaml and
automatically generate the Helm values/ConfigMaps/Secrets to unify the workflow;
reference KubernetesController, DockerComposeController, clp-config.yaml,
credentials.yaml, and values.yaml so readers know what to use now versus what
will be automated later.

In @docs/src/user-docs/guides-k8s-deployment.md:
- Around line 163-210: The docs currently show weak example credentials
(CLP_DB_PASS, CLP_DB_ROOT_PASS, CLP_QUEUE_PASS, CLP_REDIS_PASS) with only a
small “(change these for production)” note; add a prominent warning block
immediately before the credentials export block that calls out using strong,
unique passwords for production and suggests secure generation/storage options
(e.g., password manager or openssl rand -hex), and update the inline comment
near the credential exports to reinforce the warning so readers won’t copy the
example values.
- Around line 224-276: The YAML example has duplicate clpConfig.archive_output
keys which will cause the second to overwrite the first; merge the settings into
a single clpConfig.archive_output block containing target_archive_size,
compression_level, and retention_period (and remove the extra archive_output
section) so the example uses one archive_output under clpConfig with all three
fields present.
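
The duplicate-key problem described above can be illustrated with plain mappings. This is a minimal sketch under stated assumptions: the values are placeholders rather than the chart's defaults, the split of fields between the two blocks is assumed, and `last_key_wins` only mimics a loader that keeps the last occurrence of a repeated key (some loaders reject duplicates instead).

```python
# Placeholder values for illustration only; they are not the chart's defaults.
# Which fields sat in which duplicate block is also an assumption.
first_block = {"target_archive_size": 1, "compression_level": 2}
second_block = {"retention_period": 3}


def last_key_wins(*blocks: dict) -> dict:
    """Mimic a loader that keeps only the last occurrence of a duplicate key."""
    return dict(blocks[-1])


def merge_blocks(*blocks: dict) -> dict:
    """Combine every field into one mapping, like a single merged block."""
    merged: dict = {}
    for block in blocks:
        merged.update(block)
    return merged


print(sorted(last_key_wins(first_block, second_block)))
# prints ['retention_period']
print(sorted(merge_blocks(first_block, second_block)))
# prints ['compression_level', 'retention_period', 'target_archive_size']
```

With last-wins behaviour the first block's fields are silently dropped, which is why a single `archive_output` block is the safer form for the example.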

In @docs/src/user-docs/index.md:
- Around line 70-75: Update the grid-item-card title text "External database
setup" to a grammatically precise alternative such as "Setting up external
databases" (or "External database configuration") by editing the card content
where the string appears so it matches the style of other card titles and
pluralization.

In @docs/src/user-docs/quick-start/clp-json.md:
- Around line 78-80: Clarify that the hostPath/containerPath mount under the
"Mount for logs input" section is optional only when users supply logs via
object storage (e.g., S3), and update the text to either link to the object
storage guide or add a short note describing when to omit the mount (e.g., when
using S3/GCS inputs, no hostPath is required) and when to keep it (local file
ingestion or local testing); reference the "Mount for logs input" heading and
the hostPath/containerPath YAML keys so the update is applied adjacent to those
lines.
- Around line 117-120: Update the note about allowHostAccessForSbinScripts to
explain its security implications: change wording to advise keeping
allowHostAccessForSbinScripts set to true (the default) to enable sbin scripts,
and add a brief caution that this allows containers to access host
paths/resources and should be reviewed or restricted for production deployments.
Reference the configuration key allowHostAccessForSbinScripts and ensure the
note mentions both the behavior (allows host path access) and the recommended
review for production.
- Around line 54-58: Add a security note above the credential generation block
that warns these export commands (export CLP_DB_PASS, export CLP_DB_ROOT_PASS,
export CLP_QUEUE_PASS, export CLP_REDIS_PASS) will be stored in shell history
and provide brief mitigation steps: advise prefixing commands with a space if
HISTCONTROL=ignorespace, or temporarily disabling history with set +o history
(and re-enabling with set -o history); place the note immediately before the
existing "export CLP_DB_PASS=..." lines so readers see the warning before
generating credentials.
- Around line 146-153: Move the `etc/clp-config.yaml` configuration note so
users see it before any compression commands: place the existing "Configure
etc/clp-config.yaml to connect to the kind-deployed database" block immediately
above the "Compressing JSON logs" compress command (or add a clear workflow
callout labeled e.g. "Before running compression commands, configure
etc/clp-config.yaml" at the start of the "Compressing JSON logs" section), and
ensure the note appears in the Kubernetes tab as well so users following that
flow must configure the file before running compression.
- Around line 353-361: Replace the hardcoded port 30301 in the example curl
command with a reference to the CLP_API_SERVER_PORT variable or add a short note
clarifying the default; specifically update the curl URL
"http://localhost:30301/query/submit" to use
"http://localhost:${CLP_API_SERVER_PORT}/query/submit" (or add a one-line note
stating “The default API server port is 30301; if you changed
CLP_API_SERVER_PORT use that value”) so examples remain consistent with the
CLP_API_SERVER_PORT setting.
- Around line 378-385: The documentation repeats the same database port snippet
for etc/clp-config.yaml (the block showing "database: port: 30306" and
"results_cache: port: 30017") in multiple places; consolidate by keeping the
configuration only at the first mention and replace subsequent duplicates with a
short pointer or note, or add a tip immediately after the first "database: port:
30306" example stating that this setting applies globally (e.g., "This
configuration applies to all subsequent operations; you only need to set it
once") and update the later sections to reference that tip instead of repeating
the YAML.
- Around line 443-449: Update the two inconsistent reference links so they
include the .md extension: change the link target labeled [datasets] from
"../resources-datasets" to "../resources-datasets.md" and change the link target
labeled [object-storage] from "../guides-using-object-storage/index" to
"../guides-using-object-storage/index.md" to match the other reference link
formats in this block.
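
For the credential-related items above, one way to avoid typing secrets into a shell at all is to generate them programmatically. This is a minimal sketch, not what the guides do: `generate_credentials` is a hypothetical helper, and the 16-byte length is an assumption (it yields 32 hex characters, comparable to `openssl rand -hex 16`). Only the variable names come from the review comments.

```python
import secrets

# Variable names taken from the review comments above.
CREDENTIAL_VARS = ("CLP_DB_PASS", "CLP_DB_ROOT_PASS", "CLP_QUEUE_PASS", "CLP_REDIS_PASS")


def generate_credentials(num_bytes: int = 16) -> dict:
    """Return a fresh random hex secret for each credential variable."""
    return {name: secrets.token_hex(num_bytes) for name in CREDENTIAL_VARS}


creds = generate_credentials()
for name, value in creds.items():
    # Print only the name and length so the secret never reaches the terminal.
    print(name, len(value))
```

How the generated values are then handed to the deployment (environment, secrets file, Kubernetes Secret) is left open here.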

In @docs/src/user-docs/quick-start/clp-text.md:
- Around line 287-304: The API server curl example hard-codes
http://localhost:30301 which varies by deployment; update the docs around the
"Searching via the API server" example to provide deployment-specific guidance:
add separate notes or tabs for Docker Compose (use http://localhost:3001 and
mention CLP_API_SERVER_PORT env var) and Kubernetes (explain to use the NodePort
from clpConfig.api_server.port in Helm), or at minimum add a prominent sentence
telling users to adjust the host:port based on their deployment configuration
(reference the CLP_API_SERVER_PORT and clpConfig.api_server.port symbols).
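
The deployment-dependent URL could be expressed as a small helper. This is a minimal sketch: `api_server_url` and `DEFAULT_PORTS` are hypothetical names, and the ports (3001 for Docker Compose, 30301 for the Kubernetes NodePort) are taken from the review comments in this thread and should be verified against the actual configuration.

```python
from typing import Optional

# Ports as mentioned in the review comments; treat both as assumptions.
DEFAULT_PORTS = {"docker-compose": 3001, "kubernetes": 30301}


def api_server_url(deployment: str, port: Optional[int] = None) -> str:
    """Build the query-submit URL for the given deployment type.

    Pass `port` explicitly when the default was overridden, e.g. via
    CLP_API_SERVER_PORT or clpConfig.api_server.port.
    """
    if port is None:
        port = DEFAULT_PORTS[deployment]
    return f"http://localhost:{port}/query/submit"


print(api_server_url("docker-compose"))
# prints http://localhost:3001/query/submit
print(api_server_url("kubernetes"))
# prints http://localhost:30301/query/submit
```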

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between db5866a and 105d96d.

📒 Files selected for processing (12)

docs/requirements.txt
docs/src/dev-docs/design-deployment-orchestration.md
docs/src/user-docs/core-overview.md
docs/src/user-docs/guides-docker-compose-deployment.md
docs/src/user-docs/guides-external-database.md
docs/src/user-docs/guides-k8s-deployment.md
docs/src/user-docs/guides-overview.md
docs/src/user-docs/index.md
docs/src/user-docs/quick-start/clp-json.md
docs/src/user-docs/quick-start/clp-text.md
docs/src/user-docs/quick-start/index.md
docs/src/user-docs/reference-overview.md

💤 Files with no reviewable changes (3)

docs/src/user-docs/core-overview.md
docs/src/user-docs/reference-overview.md
docs/src/user-docs/guides-overview.md

🧰 Additional context used

🧠 Learnings (11)

📚 Learning: 2025-12-19T05:03:42.629Z

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1818
File: tools/deployment/package-helm/templates/configmap.yaml:12-12
Timestamp: 2025-12-19T05:03:42.629Z
Learning: In the y-scope/clp Helm chart, the `api_server.host` configuration field in the ConfigMap (tools/deployment/package-helm/templates/configmap.yaml) is unused because no other k8s internal services need to reach the API server—it's only accessed from outside the cluster via NodePort.

Applied to files:

docs/src/user-docs/guides-k8s-deployment.md

📚 Learning: 2025-06-18T20:39:05.899Z

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:73-109
Timestamp: 2025-06-18T20:39:05.899Z
Learning: The CLP project team prefers to use video content to demonstrate detailed procedural steps (like tarball extraction) rather than including every step in the written documentation, keeping the docs focused on conceptual guidance.

Applied to files:

docs/src/user-docs/index.md
docs/src/user-docs/quick-start/clp-json.md
docs/src/user-docs/quick-start/clp-text.md
docs/src/user-docs/quick-start/index.md
docs/src/dev-docs/design-deployment-orchestration.md

📚 Learning: 2025-06-18T20:48:48.990Z

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 968
File: docs/src/user-guide/quick-start/overview.md:53-54
Timestamp: 2025-06-18T20:48:48.990Z
Learning: CLP is designed to run on Linux systems where Python is typically pre-installed, so Python installation links are generally not needed in CLP documentation.

Applied to files:

docs/src/user-docs/index.md
docs/src/user-docs/quick-start/index.md

📚 Learning: 2025-09-25T05:13:13.298Z

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Applied to files:

docs/src/user-docs/quick-start/clp-json.md
docs/src/user-docs/quick-start/clp-text.md
docs/src/dev-docs/design-deployment-orchestration.md

📚 Learning: 2025-01-16T16:58:43.190Z

Learnt from: haiqi96
Repo: y-scope/clp PR: 651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.

Applied to files:

docs/src/user-docs/quick-start/clp-json.md

📚 Learning: 2025-09-15T22:20:40.750Z

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1125
File: components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py:267-291
Timestamp: 2025-09-15T22:20:40.750Z
Learning: For CLP compression jobs, the team has decided to fail the entire job immediately upon encountering any invalid input path, rather than continuing to process valid paths. This decision was made during PR #1125 development.

Applied to files:

docs/src/user-docs/quick-start/clp-json.md

📚 Learning: 2025-12-04T03:31:55.239Z

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1690
File: tools/deployment/package/docker-compose-all.yaml:424-427
Timestamp: 2025-12-04T03:31:55.239Z
Learning: In tools/deployment/package/docker-compose-all.yaml, the query-worker service writes to /var/data/streams (CLP_STREAM_OUTPUT_DIR_HOST mount), so this directory must remain read-write and should not be mounted with :ro flag.

Applied to files:

docs/src/user-docs/quick-start/clp-text.md

📚 Learning: 2025-08-15T21:48:40.228Z

Learnt from: quinntaylormitchell
Repo: y-scope/clp PR: 1181
File: docs/src/user-guide/guides-retention.md:68-73
Timestamp: 2025-08-15T21:48:40.228Z
Learning: In documentation for the CLP project, when suggesting formatting improvements for variables in explanatory text, the user quinntaylormitchell prefers to maintain existing sentence structures (like keeping "that" in "i.e., that the difference...") while applying monospace formatting to technical terms and variables for consistency.

Applied to files:

docs/src/user-docs/quick-start/index.md

📚 Learning: 2025-10-22T21:14:12.225Z

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1466
File: .github/workflows/clp-rust-checks.yaml:14-15
Timestamp: 2025-10-22T21:14:12.225Z
Learning: Repository y-scope/clp: In GitHub Actions workflows (e.g., .github/workflows/clp-rust-checks.yaml), YAML anchors/aliases are acceptable and preferred to avoid duplication; if actionlint flags an alias node (e.g., on push.paths) as an error, treat it as a tool limitation and do not require inlining unless the team asks to silence the warning.

Applied to files:

docs/src/user-docs/quick-start/index.md

📚 Learning: 2025-10-27T07:07:37.901Z

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1501
File: tools/deployment/presto-clp/scripts/init.py:10-13
Timestamp: 2025-10-27T07:07:37.901Z
Learning: In `tools/deployment/presto-clp/scripts/init.py`, the `DATABASE_COMPONENT_NAME` and `DATABASE_DEFAULT_PORT` constants are intentionally duplicated from `clp_py_utils.clp_config` because `clp_py_utils` is not installed in the Presto init script's runtime environment. The two flows are separate and this duplication is documented. There are plans to merge these flows after a future release.

Applied to files:

docs/src/dev-docs/design-deployment-orchestration.md

📚 Learning: 2025-08-25T16:27:50.549Z

Learnt from: davemarco
Repo: y-scope/clp PR: 1198
File: components/webui/server/src/plugins/app/Presto.ts:38-43
Timestamp: 2025-08-25T16:27:50.549Z
Learning: In the CLP webui Presto configuration, host and port are set via package settings (configurable), while user, catalog, and schema are set via environment variables (environment-specific). This mixed approach is intentional - settings are typically set by package and some values don't need to be package-configurable.

Applied to files:

docs/src/dev-docs/design-deployment-orchestration.md

🪛 LanguageTool

docs/src/user-docs/quick-start/index.md

[grammar] ~13-~13: The verb form ‘guide’ does not appear to fit in this context.
Context: ...--- ## System Requirements This quick start guide covers single-host deployment using...

(SINGULAR_NOUN_VERB_AGREEMENT)

🪛 markdownlint-cli2 (0.18.1)

docs/src/user-docs/quick-start/clp-text.md

173-173: Link and image reference definitions should be needed
Unused link or image reference definition: "docker-non-root"

(MD053, link-image-reference-definitions)

🔇 Additional comments (18)

docs/requirements.txt (1)

6-6: Remove the upper bound constraint or clarify its purpose.

The version constraint <2.0.0 added to sphinxcontrib-mermaid does not address any actual breaking changes in v2.0.0, as this version does not exist on PyPI. The latest releases are in the 0.x or 1.x series.

If this constraint was added as a precautionary measure for a hypothetical future release, consider documenting that intent inline. Otherwise, if the constraint is unnecessary, remove it to allow the dependency resolver more flexibility.

Verify the actual latest stable version of sphinxcontrib-mermaid and confirm whether this upper bound serves a genuine purpose for your project's compatibility.

Likely an incorrect or invalid review comment.

docs/src/user-docs/guides-docker-compose-deployment.md (2)

1-26: LGTM! Clear documentation restructuring.

The rename from "Multi-host deployment" to "Docker Compose deployment" better reflects the content, and the new "Deployment options" subsection provides helpful context about single-host vs. multi-host setups. The updated note about manual orchestration is also clearer.

324-353: LGTM! Helpful debugging additions.

The new "Monitoring and debugging" section provides practical commands for troubleshooting Docker Compose deployments. The instance ID instructions and example commands are clear and useful.

docs/src/user-docs/guides-k8s-deployment.md (3)

45-104: LGTM! Comprehensive kubeadm setup instructions.

The kubeadm cluster setup guide is well-structured and includes:

Proper CIDR configuration with helpful notes

CNI plugin installation (Cilium)

Worker node joining instructions

Token regeneration guidance

The technical details are accurate and the warnings about CIDR conflicts are helpful.

377-409: LGTM! Comprehensive configuration reference.

The configuration options table provides a clear reference for commonly used Helm values with appropriate defaults and descriptions. This is very helpful for users.

19-19: No action required — Helm version requirement is correct.

The requirement specifying Helm >= 4.0 is accurate. Helm 4 is the current stable release line as of January 2026, following CNCF/Helm's announcement on November 12, 2025. The documentation aligns with the latest stable version.

Likely an incorrect or invalid review comment.

docs/src/user-docs/quick-start/clp-text.md (2)

54-148: LGTM! Clear Kubernetes setup instructions.

The Kubernetes (kind) setup tab provides comprehensive instructions including:

Environment variable setup

Directory creation

kind cluster configuration with proper mounts

Helm chart installation with clp-text configuration

The structure mirrors the Docker Compose approach well, making it easy for users to choose their preferred orchestration method.

383-386: Remove unused link reference.

The link reference [docker-non-root] at line 386 is defined but never used in the document.

🧹 Proposed fix

Remove the unused reference:

 [api-server]: ../guides-using-the-api-server.md
 [datasets]: ../resources-datasets
 [docker-compose-deployment]: ../guides-docker-compose-deployment.md
-[docker-non-root]: ...
 [k8s-deployment]: ../guides-k8s-deployment.md
 [text-search-syntax]: ../reference-text-search-syntax.md

Note: The actual line for docker-non-root isn't shown in the provided code, but based on the static analysis hint, it should be removed if unused.

Likely an incorrect or invalid review comment.

docs/src/user-docs/quick-start/index.md (1)

13-72: LGTM! Clear requirements organization.

The tab-set approach for system requirements effectively separates Docker Compose and Kubernetes (kind) requirements. The version checks and installation notes are helpful.

Note: The Helm version requirement (>= 4.0) should be verified as mentioned in the k8s-deployment guide review.

docs/src/user-docs/guides-external-database.md (1)

208-208: Fix incorrect relative path.

The relative path ../user-docs/guides-docker-compose-deployment.md is incorrect. Since this file is already in docs/src/user-docs/, the ../user-docs/ prefix will resolve to docs/src/user-docs/user-docs/, which doesn't exist.

🔗 Proposed fix

-[docker-compose-orchestration]: ../user-docs/guides-docker-compose-deployment.md
+[docker-compose-orchestration]: guides-docker-compose-deployment.md

⛔ Skipped due to learnings

Learnt from: Bill-hbrhbr
Repo: y-scope/clp PR: 1261
File: docs/src/dev-guide/components-core/manylinux-2-28-deps-install.md:24-24
Timestamp: 2025-08-25T06:32:48.313Z
Learning: In the CLP project documentation, when linking to scripts or other files within the repository, use relative paths (e.g., components/core/tools/scripts/...) rather than commit-pinned GitHub URLs to ensure docs and referenced files always belong to the same commit and stay synchronized.
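
Whether the original link target is actually broken depends on how relative links are resolved. The following is a minimal sketch, assuming the documentation build resolves targets against the directory containing the Markdown file (an assumption, not something this thread confirms); `resolve_link` is a hypothetical helper.

```python
import posixpath


def resolve_link(source_file: str, target: str) -> str:
    """Resolve a relative link target against the directory of source_file."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, target))


SOURCE = "docs/src/user-docs/guides-external-database.md"

print(resolve_link(SOURCE, "../user-docs/guides-docker-compose-deployment.md"))
# prints docs/src/user-docs/guides-docker-compose-deployment.md
print(resolve_link(SOURCE, "guides-docker-compose-deployment.md"))
# prints docs/src/user-docs/guides-docker-compose-deployment.md
```

Under that assumption both targets normalize to the same path, so the shorter form would be a simplification rather than a fix for a broken link.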

docs/src/user-docs/index.md (4)

6-20: LGTM: Clear quick-start section.

The Quick start section provides a clear entry point for new users with appropriate context and navigation.

22-45: LGTM: Well-structured deployment section.

The Deployment section clearly distinguishes between Docker Compose and Kubernetes orchestration methods, with helpful descriptions for each deployment approach.

224-285: All toctree file references are valid and complete.

Verification confirms that all 21 toctree entries across the 7 sections reference existing files. No missing files or broken references were identified.

188-222: Fix backtick consistency in the last grid-item-card entry.

The link targets are correct and descriptions accurately reflect the file content. However, the last grid-item-card description uses "clp's" without backticks, while all other tool name references are consistently wrapped in backticks. Update "clp's schema file" to "`clp`'s schema file" for consistency.

docs/src/dev-docs/design-deployment-orchestration.md (3)

3-6: LGTM: Clear scope expansion for dual orchestration.

The introduction effectively establishes that CLP now supports both Docker Compose and Kubernetes (via Helm) orchestration, setting appropriate context for the document.

277-284: All documentation links are valid.

All referenced user documentation files exist and contain the expected anchors:

guides-docker-compose-deployment.md#monitoring-and-debugging ✓

guides-k8s-deployment.md#monitoring-and-debugging ✓

guides-docker-compose-deployment.md#multi-host-deployment ✓

236-243: Clarify that local-storage with ReadWriteMany access mode has significant limitations requiring alternative StorageClasses for production.

The current documentation claims that Kubernetes uses local-storage StorageClass for shared PVCs with ReadWriteMany access mode. However, the Helm chart's actual implementation creates local PersistentVolumes (using hostPath binding) which are node-local and inherently incompatible with ReadWriteMany semantics. This configuration only functions correctly when all services accessing shared archives and streams are scheduled on the same node (control-plane in the current setup).

For production deployments spanning multiple nodes, users must configure network-based StorageClasses (e.g., NFS, Ceph) that support ReadWriteMany. The documentation should explicitly:

Note that the default local-storage configuration is suitable only for single-node or tightly-coupled deployments

Explain the incompatibility between local storage and ReadWriteMany access mode

Provide guidance on configuring alternative StorageClasses for production multi-node environments
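
To make the last point concrete, here is a minimal sketch of a claim that requests `ReadWriteMany` from a network-backed StorageClass. The class name `nfs-client`, the claim name `clp-shared-archives`, and the `10Gi` size are assumptions for illustration and are not taken from the CLP Helm chart. An NFS or CephFS provisioner would already have to be installed in the cluster for the claim to bind.

```bash
#!/bin/sh
# Sketch only: write a PVC manifest that asks a network-backed StorageClass for
# ReadWriteMany. STORAGE_CLASS and PVC_NAME are hypothetical names.
STORAGE_CLASS="${STORAGE_CLASS:-nfs-client}"
PVC_NAME="${PVC_NAME:-clp-shared-archives}"
MANIFEST="${MANIFEST:-./shared-archives-pvc.yaml}"

cat > "$MANIFEST" <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${PVC_NAME}
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ${STORAGE_CLASS}
  resources:
    requests:
      storage: 10Gi
EOF

# Nothing is applied here; the manifest is only written to disk for review.
echo "Wrote ${MANIFEST}; review it, then apply with: kubectl apply -f ${MANIFEST}"
```

How the chart would consume such a claim depends on its templates and is not shown here.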

docs/src/user-docs/quick-start/clp-json.md (1)

61-66: Directory structure is complete.

All directories referenced in the Helm chart's volume mounts are already included in the mkdir command. Cross-referencing the Helm templates confirms that all required subdirectories under data_directory, logs_directory, and tmp_directory are properly created by the command.

coderabbitai · 2026-01-13T19:28:44Z

docs/src/dev-docs/design-deployment-orchestration.md

+## Orchestration methods

-The orchestration code is split up into:
+CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host
+deployments, and Helm for Kubernetes deployments. Both methods share the same configuration
+interface (`clp-config.yaml` and `credentials.yaml`) and support the same deployment types.

-* `BaseController` that defines:
-  * common logic for preparing the environment variables, configuration files, and directories
-    necessary for each service.
-  * abstract methods that orchestrator-specific derived classes must implement in order to
-    orchestrate a deployment.
-* `<Orchestrator>Controller` that implements (and/or overrides) any of the methods in
-  `BaseController` (`<Orchestrator>` is a placeholder for the specific orchestrator for which the
-  class is being implemented).
+### Configuration

-## Docker Compose orchestration
+Each service requires configuration values passed through config files, environment variables,
+and/or command line arguments. Since services run in containers, some values must be adapted for the
+orchestration environment—specifically, host paths must be converted to container paths, and
+hostnames/ports must use service discovery mechanisms.

-This section explains how we use Docker Compose to orchestrate the CLP package and is broken into
-the following subsections:
+The orchestration controller (e.g., `DockerComposeController`) reads `etc/clp-config.yaml` and
+`etc/credentials.yaml`, then generates:
+* A container-specific CLP config file with adapted paths and service names
+* Runtime configuration (environment variables or ConfigMaps)
+* Required directories (e.g., data output directories)

-* [Setting up the Docker Compose project's environment](#setting-up-the-environment)
-* [Starting and stoping the Docker Compose project](#starting-and-stopping-the-project)
-* [Deployment types](#deployment-types)
-* [Implementation details](#implementation-details)
-* [Troubleshooting](#troubleshooting)
+For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm
+chart generates a ConfigMap and Secrets from `values.yaml`.

-### Setting up the environment
-
-Several services require configuration values to be passed in through the CLP package's config file,
-environment variables, and/or command line arguments. Since the services are running in containers,
-some of these configuration values need to be modified for the orchestration environment.
-Specifically:
+:::{note}
+A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml`
+like `DockerComposeController`, then set up the Helm release accordingly. This will unify the
+configuration experience across both orchestration methods.
+:::


🧹 Nitpick | 🔵 Trivial

Clarify the configuration flow for Kubernetes.

The note about a planned KubernetesController (lines 215-218) may confuse readers. Since the Helm chart currently uses values.yaml, explain how users should configure Kubernetes deployments today versus the future unified approach.

📝 Suggested clarification

 For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm
 chart generates a ConfigMap and Secrets from `values.yaml`.

 :::{note}
+**Current approach**: For Kubernetes deployments, configure the Helm chart via `values.yaml`
+or `--set` flags as shown in the [Kubernetes deployment guide][kubernetes-guide].
+
+**Future enhancement**:
 A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml`
 like `DockerComposeController`, then set up the Helm release accordingly. This will unify the
 configuration experience across both orchestration methods.
 :::

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## Orchestration methods

CLP supports two orchestration methods: Docker Compose for single-host or manual multi-host
deployments, and Helm for Kubernetes deployments. Both methods share the same configuration
interface (`clp-config.yaml` and `credentials.yaml`) and support the same deployment types.

### Configuration

Each service requires configuration values passed through config files, environment variables,
and/or command line arguments. Since services run in containers, some values must be adapted for the
orchestration environment—specifically, host paths must be converted to container paths, and
hostnames/ports must use service discovery mechanisms.

The orchestration controller (e.g., `DockerComposeController`) reads `etc/clp-config.yaml` and
`etc/credentials.yaml`, then generates:
* A container-specific CLP config file with adapted paths and service names
* Runtime configuration (environment variables or ConfigMaps)
* Required directories (e.g., data output directories)

For Docker Compose, this generates `var/log/.clp-config.yaml` and `.env`. For Kubernetes, the Helm
chart generates a ConfigMap and Secrets from `values.yaml`.

:::{note}
**Current approach**: For Kubernetes deployments, configure the Helm chart via `values.yaml`
or `--set` flags as shown in the [Kubernetes deployment guide][kubernetes-guide].

**Future enhancement**:
A `KubernetesController` is also planned that will read `clp-config.yaml` and `credentials.yaml`
like `DockerComposeController`, then set up the Helm release accordingly. This will unify the
configuration experience across both orchestration methods.
:::

🤖 Prompt for AI Agents

In @docs/src/dev-docs/design-deployment-orchestration.md around lines 192 - 218, Clarify that today Kubernetes deployments are configured directly via the Helm chart using values.yaml and Kubernetes Secrets/ConfigMaps (users should place container-adapted settings and credentials into values.yaml or create Secrets rather than relying on etc/clp-config.yaml), and update the note to state that in the future a planned KubernetesController (like DockerComposeController) will read clp-config.yaml and credentials.yaml and automatically generate the Helm values/ConfigMaps/Secrets to unify the workflow; reference KubernetesController, DockerComposeController, clp-config.yaml, credentials.yaml, and values.yaml so readers know what to use now versus what will be automated later.
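
As an illustration of the current approach described above, the sketch below writes a small values override file and prints the matching install command. The keys (`clpConfig.data_directory`, `clpConfig.logs_directory`, `clpConfig.tmp_directory`, `credentials.database.password`) are the ones used in the `helm install` command quoted later in this thread; the `write_values` helper and the `custom-values.yaml` file name are hypothetical.

```bash
#!/bin/sh
# Sketch only: generate a Helm values override file for CLP.
# write_values and custom-values.yaml are illustrative names, not part of CLP.
write_values() {
    # $1: output file, $2: CLP home directory, $3: database password
    cat > "$1" <<EOF
clpConfig:
  data_directory: "$2/var/data"
  logs_directory: "$2/var/log"
  tmp_directory: "$2/var/tmp"
credentials:
  database:
    password: "$3"
EOF
}

write_values ./custom-values.yaml "${CLP_HOME:-/tmp/clp}" "${CLP_DB_PASS:-pass}"

# Nothing is installed here; the command is only printed.
echo "Install with: helm install clp . -f ./custom-values.yaml"
```

A values file keeps the overrides reviewable in version control, whereas a long chain of `--set` flags is easier to mistype. Credentials are better supplied separately than committed with the file.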

coderabbitai · 2026-01-13T19:28:44Z

docs/src/dev-docs/design-deployment-orchestration.md

+### Deployment types

-One notable implementation detail is in how we handle mounts that are only necessary under certain
-configurations. For instance, the input logs mount is only necessary when the `logs_input.type` is
-`fs`. If `logs_input.type` is `s3`, we shouldn't mount some random directory from the user's
-host filesystem into the container. However, Docker doesn't provide a mechanism to perform
-conditional mounts. Instead, we use Docker's variable interpolation to conditionally mount an empty
-tmpfs mount into the container. This strategy is used wherever we need a conditional mount.
+CLP supports multiple deployment configurations based on the compression scheduler and query engine.

-### Troubleshooting
+| Deployment Type | Compression Scheduler | Query Engine                 |
+|-----------------|-----------------------|------------------------------|
+| Base            | Celery                | [Presto][presto-integration] |
+| Full            | Celery                | Native                       |
+| Spider Base     | Spider                | [Presto][presto-integration] |
+| Spider Full     | Spider                | Native                       |

-If you encounter issues with the Docker Compose deployment, first determine the instance ID for your
-deployment by checking the content of `<clp-package>/var/log/instance-id`. Then run one of the
-commands below as necessary.
+:::{note}
+Spider support is not yet available for Helm.
+:::


🛠️ Refactor suggestion | 🟠 Major

Clarify Spider deployment status.

The deployment types table includes Spider Base and Spider Full, but the note states "Spider support is not yet available for Helm." This may confuse users about whether they can use Spider at all. Consider clarifying that Spider is available for Docker Compose only.

♻️ Suggested clarification

 ### Deployment types

 CLP supports multiple deployment configurations based on the compression scheduler and query engine.

 | Deployment Type | Compression Scheduler | Query Engine                 |
 |-----------------|-----------------------|------------------------------|
 | Base            | Celery                | [Presto][presto-integration] |
 | Full            | Celery                | Native                       |
 | Spider Base     | Spider                | [Presto][presto-integration] |
 | Spider Full     | Spider                | Native                       |

 :::{note}
-Spider support is not yet available for Helm.
+Spider-based deployment types (Spider Base and Spider Full) are currently only supported
+for Docker Compose deployments. Kubernetes/Helm support for Spider is planned for a future release.
 :::

🤖 Prompt for AI Agents

In @docs/src/dev-docs/design-deployment-orchestration.md around lines 245 - 258, The "Spider" entries in the "Deployment types" section are ambiguous; update the table or adjacent note under the "Deployment types" header to explicitly state that Spider deployments (Spider Base and Spider Full) are supported only for Docker Compose and are not supported for Helm; modify the note that currently reads "Spider support is not yet available for Helm." to something like "Spider deployments are supported for Docker Compose only; Helm support for Spider is not yet available," and ensure the table rows for "Spider Base" and "Spider Full" remain but align with this clarified note.

coderabbitai · 2026-01-13T19:28:44Z

docs/src/dev-docs/design-deployment-orchestration.md

+## Troubleshooting

-2. View service logs:
+When issues arise, use the appropriate commands for your orchestration method:

-   ```bash
-   docker compose --project-name clp-package-<instance-id> logs <service-name>
-   ```
+* [Docker Compose debugging][docker-compose-debugging]
+* [Kubernetes Helm debugging][kubernetes-debugging]



🧹 Nitpick | 🔵 Trivial

Add context for troubleshooting links.

The Troubleshooting section provides links to debugging guides but doesn't explain what kinds of issues users might encounter or when to consult each guide. Consider adding brief context.

📋 Suggested improvement

 ## Troubleshooting

-When issues arise, use the appropriate commands for your orchestration method:
+When services fail to start, crash, or exhibit unexpected behaviour, use the appropriate
+debugging commands and techniques for your orchestration method:

 * [Docker Compose debugging][docker-compose-debugging]
 * [Kubernetes Helm debugging][kubernetes-debugging]

📝 Committable suggestion

Suggested change

## Troubleshooting

When services fail to start, crash, or exhibit unexpected behaviour, use the appropriate
debugging commands and techniques for your orchestration method:

* [Docker Compose debugging][docker-compose-debugging]
* [Kubernetes Helm debugging][kubernetes-debugging]

🤖 Prompt for AI Agents

In @docs/src/dev-docs/design-deployment-orchestration.md around lines 265 - 271, Update the "Troubleshooting" section to add one or two succinct sentences before the links that describe typical issues and when to use each guide—mention that [docker-compose-debugging] is for local/service startup, container logs, networking, and dependency issues in single-host setups, and that [kubernetes-debugging] is for cluster-level problems like pod crashes, scheduling, ingress, and Helm releases; reference the "Troubleshooting" heading and the link tokens [docker-compose-debugging] and [kubernetes-debugging] so the added context sits directly above those links.
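
For orientation, the sketch below prints the first commands one might reach for under each orchestration method. The Docker Compose command and the `var/log/instance-id` location come from the text removed in this hunk; `kubectl get events` and `kubectl describe pod` are the commands requested later in this thread. The `debug_cmds` helper is hypothetical and only echoes commands, so it is safe to run anywhere.

```bash
#!/bin/sh
# Sketch only: print first-line debugging commands for each orchestration method.
# debug_cmds is a hypothetical helper; it echoes commands and executes nothing.
debug_cmds() {
    # $1: "compose" or "k8s"; $2: service name (compose) or pod name (k8s)
    case "$1" in
        compose)
            # The instance ID is stored in <clp-package>/var/log/instance-id.
            echo "docker compose --project-name clp-package-\$(cat var/log/instance-id) logs $2"
            ;;
        k8s)
            echo "kubectl get events --sort-by=.metadata.creationTimestamp"
            echo "kubectl describe pod $2"
            echo "kubectl logs $2"
            ;;
        *)
            echo "usage: debug_cmds compose|k8s <name>" >&2
            return 1
            ;;
    esac
}

debug_cmds k8s "<pod-name>"
```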

coderabbitai · 2026-01-13T19:28:44Z

docs/src/user-docs/guides-k8s-deployment.md

+### Basic installation
+
+Create the required directories on all worker nodes:
+
+```bash
+export CLP_HOME="/tmp/clp"
+
+mkdir -p "$CLP_HOME/var/data/"{archives,streams,staged-archives,staged-streams} \
+         "$CLP_HOME/var/log/"{compression_scheduler,compression_worker,user} \
+         "$CLP_HOME/var/log/"{query_scheduler,query_worker,reducer} \
+         "$CLP_HOME/var/tmp"
+```
+
+Then on the **control-plane node**, generate credentials and install CLP:
+
+```bash
+export CLP_HOME="/tmp/clp"
+
+mkdir -p "$CLP_HOME/var/"{data,log}/{database,queue,redis,results_cache} \
+         "$CLP_HOME/var/data/"{archives,streams,staged-archives,staged-streams} \
+         "$CLP_HOME/var/log/"{compression_scheduler,compression_worker,user} \
+         "$CLP_HOME/var/log/"{query_scheduler,query_worker,reducer} \
+         "$CLP_HOME/var/log/"{garbage_collector,api_server,log_ingestor,mcp_server} \
+         "$CLP_HOME/var/tmp"
+
+# Credentials (change these for production)
+export CLP_DB_PASS="pass"
+export CLP_DB_ROOT_PASS="root-pass"
+export CLP_QUEUE_PASS="pass"
+export CLP_REDIS_PASS="pass"
+
+# Worker replicas (increase for multi-node clusters)
+export CLP_COMPRESSION_WORKER_REPLICAS=1
+export CLP_QUERY_WORKER_REPLICAS=1
+
+helm install clp . \
+  --set clpConfig.data_directory="$CLP_HOME/var/data" \
+  --set clpConfig.logs_directory="$CLP_HOME/var/log" \
+  --set clpConfig.tmp_directory="$CLP_HOME/var/tmp" \
+  --set clpConfig.archive_output.storage.directory="$CLP_HOME/var/data/archives" \
+  --set clpConfig.stream_output.storage.directory="$CLP_HOME/var/data/streams" \
+  --set credentials.database.password="$CLP_DB_PASS" \
+  --set credentials.database.root_password="$CLP_DB_ROOT_PASS" \
+  --set credentials.queue.password="$CLP_QUEUE_PASS" \
+  --set credentials.redis.password="$CLP_REDIS_PASS" \
+  --set compressionWorker.replicas="$CLP_COMPRESSION_WORKER_REPLICAS" \
+  --set queryWorker.replicas="$CLP_QUERY_WORKER_REPLICAS"
+```


🧹 Nitpick | 🔵 Trivial

Consider emphasizing production credential requirements.

While the basic installation section is clear, lines 188-192 show example credentials (pass, root-pass) that users might copy directly. Although there's a comment "(change these for production)", this could be made more prominent.

📝 Suggested improvement

Consider adding a warning block before the credentials section:

:::{warning}
The example below uses simple passwords for demonstration. **Always use strong, unique passwords for production deployments.** Consider using a password manager or `openssl rand -hex 16` to generate secure credentials.
:::

🤖 Prompt for AI Agents

In @docs/src/user-docs/guides-k8s-deployment.md around lines 163 - 210, The docs currently show weak example credentials (CLP_DB_PASS, CLP_DB_ROOT_PASS, CLP_QUEUE_PASS, CLP_REDIS_PASS) with only a small “(change these for production)” note; add a prominent warning block immediately before the credentials export block that calls out using strong, unique passwords for production and suggests secure generation/storage options (e.g., password manager or openssl rand -hex), and update the inline comment near the credential exports to reinforce the warning so readers won’t copy the example values.
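
A minimal sketch of the generation step suggested above, assuming `openssl` is installed. The `CLP_*` variable names match the exports in the quoted guide; the `gen_secret` helper is hypothetical.

```bash
#!/bin/sh
# Sketch only: generate random credentials instead of copying the example values.
# gen_secret is a hypothetical helper; the CLP_* names match the guide's exports.
gen_secret() {
    # 16 random bytes, hex-encoded (32 characters)
    openssl rand -hex 16
}

CLP_DB_PASS="$(gen_secret)"
CLP_DB_ROOT_PASS="$(gen_secret)"
CLP_QUEUE_PASS="$(gen_secret)"
CLP_REDIS_PASS="$(gen_secret)"
export CLP_DB_PASS CLP_DB_ROOT_PASS CLP_QUEUE_PASS CLP_REDIS_PASS
```

Because the values are produced by command substitution, the secrets themselves never appear in shell history. They still exist only in the current shell's environment, so they need to be recorded somewhere safe before the session ends.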

coderabbitai · 2026-01-13T19:28:45Z

docs/src/user-docs/quick-start/clp-json.md

+curl -X POST "http://localhost:30301/query/submit" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "query_string": "<query>",
+    "max_num_results": 1000,
+    "timestamp_begin": null,
+    "timestamp_end": null,
+    "case_sensitive": false
+  }'


⚠️ Potential issue | 🟡 Minor

API server port hardcoded without variable reference.

The API server endpoint uses hardcoded port 30301, but earlier in the document (line 50), this port is set via CLP_API_SERVER_PORT environment variable. This creates potential inconsistency if users change the port. Consider referencing the variable or noting that the default port is shown.

🔧 Suggested fix

To search via the API server: ```bash -curl -X POST "http://localhost:30301/query/submit" \ +# Replace 30301 with your CLP_API_SERVER_PORT if you changed it +curl -X POST "http://localhost:${CLP_API_SERVER_PORT}/query/submit" \ -H "Content-Type: application/json" \ -d '{

Or add a note:

To search via the API server: +```{note} +The default API server port is 30301. If you changed `CLP_API_SERVER_PORT`, use that value instead. +``` + ```bash curl -X POST "http://localhost:30301/query/submit" \

🤖 Prompt for AI Agents

In @docs/src/user-docs/quick-start/clp-json.md around lines 353 - 361, Replace the hardcoded port 30301 in the example curl command with a reference to the CLP_API_SERVER_PORT variable or add a short note clarifying the default; specifically update the curl URL "http://localhost:30301/query/submit" to use "http://localhost:${CLP_API_SERVER_PORT}/query/submit" (or add a one-line note stating “The default API server port is 30301; if you changed CLP_API_SERVER_PORT use that value”) so examples remain consistent with the CLP_API_SERVER_PORT setting.
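
One way to combine both suggestions is to use the variable while falling back to the documented default when it is unset. The sketch below does that with shell parameter expansion; `api_url` and `submit_query` are hypothetical helpers, and the request body mirrors the fields in the quoted `curl` example.

```bash
#!/bin/sh
# Sketch only: derive the API server URL from CLP_API_SERVER_PORT, falling back to
# the documented default of 30301. api_url and submit_query are hypothetical helpers.
api_url() {
    echo "http://localhost:${CLP_API_SERVER_PORT:-30301}/query/submit"
}

submit_query() {
    # $1: query string (assumed to contain no characters that need JSON escaping)
    curl -X POST "$(api_url)" \
        -H "Content-Type: application/json" \
        -d "{\"query_string\": \"$1\", \"max_num_results\": 1000, \"timestamp_begin\": null, \"timestamp_end\": null, \"case_sensitive\": false}"
}

# Print the URL that submit_query would use; no request is sent here.
api_url
```

With a running deployment, `submit_query "<query>"` would then send the request to whichever port is in effect.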

coderabbitai · 2026-01-13T19:28:45Z

docs/src/user-docs/quick-start/clp-json.md

+Configure `etc/clp-config.yaml` to connect to the `kind`-deployed services:
+
+```yaml
+database:
+  port: 30306
+results_cache:
+  port: 30017
+```


🧹 Nitpick | 🔵 Trivial

Duplicate database port configuration across sections.

The database port configuration is repeated in multiple sections (lines 149-151, 204-206, and 381-384). This creates maintenance burden and potential inconsistency. Consider consolidating these or adding a clear note that this is a one-time configuration.

Consider adding a tip after the first occurrence:

Configure `etc/clp-config.yaml` to connect to the `kind`-deployed database: ```yaml database: port: 30306

+{tip} +This configuration applies to all subsequent operations (compression, search, etc.). +You only need to set it once. +

🤖 Prompt for AI Agents

In @docs/src/user-docs/quick-start/clp-json.md around lines 378 - 385, The documentation repeats the same database port snippet for etc/clp-config.yaml (the block showing "database: port: 30306" and "results_cache: port: 30017") in multiple places; consolidate by keeping the configuration only at the first mention and replace subsequent duplicates with a short pointer or note, or add a tip immediately after the first "database: port: 30306" example stating that this setting applies globally (e.g., "This configuration applies to all subsequent operations; you only need to set it once") and update the later sections to reference that tip instead of repeating the YAML.

kirkrodrigues

@junhaoliao Made some edits based on our offline discussions, etc. Also added a comment. Please review.

kirkrodrigues · 2026-01-18T21:53:49Z

docs/src/user-docs/guides-k8s-deployment.md

+The following table lists commonly used Helm values. For a complete list, see
+[`values.yaml`][clp-helm-chart-values] in the chart directory.
+
+| Parameter                                    | Description                                    | Default                           |


This table feels redundant and slightly confusing compared to reading values.yaml directly, which has more detailed comments anyway. It can also easily go out of sync. Can we remove it?

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/src/user-docs/quick-start/clp-json.md`:
- Around line 335-345: MD031 linting error: add a single blank line after the
closing code-fence(s) so the following container markers render correctly;
locate the affected triple-backtick fences (```bash) and insert one blank line
between each closing ``` and the next ':::' or '::::' tag (applies to both
Docker Compose and Kubernetes examples) so there is a blank line separating the
code block from the subsequent ::: markers.
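
A rough way to find such spots before running the linter is sketched below. It covers only the half of MD031 discussed here (a closing fence followed directly by a non-blank line) and assumes plain triple-backtick fences; `check_md031` is a hypothetical helper, not part of the repository's tooling.

```bash
#!/bin/sh
# Sketch only: report closing code fences that are immediately followed by a
# non-blank line. check_md031 is a hypothetical helper, not a real linter.
check_md031() {
    # $1: path to a Markdown file
    awk '
        # The previous line closed a fence and this line is not blank.
        pending && $0 !~ /^[[:space:]]*$/ {
            printf "%s:%d: no blank line after fence\n", FILENAME, NR - 1
        }
        { pending = 0 }
        # Toggle fence state; remember when a fence has just been closed.
        /^[[:space:]]*```/ {
            if (in_fence) { pending = 1 }
            in_fence = !in_fence
        }
    ' "$1"
}
```

Running `check_md031 docs/src/user-docs/quick-start/clp-json.md` would print one line per offending fence and nothing when every closing fence is followed by a blank line.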

♻️ Duplicate comments (10)

docs/src/user-docs/guides-k8s-deployment.md (3)
172-194: Consider clarifying directory creation requirements.

The instructions create directories on "all worker nodes" (lines 172-180), then separately on the "control-plane node" (lines 185-194). If the control-plane node also runs worker pods, it would need both sets of directories. Consider adding a note stating whether control-plane nodes require both directory sets, or consolidate the instructions with clear annotations about which paths are needed where.
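
One possible consolidation is sketched below. In the quoted commands, the control-plane directory list is a superset of the worker list, so a single role argument is enough to decide what to create. The `create_clp_dirs` helper is hypothetical; the directory names are copied from the guide's `mkdir` commands, written as loops so the sketch does not depend on bash brace expansion.

```bash
#!/bin/sh
# Sketch only: create CLP directories by node role so that a control-plane node
# that also runs workers gets both sets. create_clp_dirs is a hypothetical helper.
create_clp_dirs() {
    # $1: CLP home directory, $2: "worker" or "control-plane"
    home="$1"

    # Directories needed on every node that runs worker pods.
    for d in archives streams staged-archives staged-streams; do
        mkdir -p "$home/var/data/$d"
    done
    for d in compression_scheduler compression_worker user \
             query_scheduler query_worker reducer; do
        mkdir -p "$home/var/log/$d"
    done
    mkdir -p "$home/var/tmp"

    # Extra directories needed only on the control-plane node.
    if [ "$2" = "control-plane" ]; then
        for d in database queue redis results_cache; do
            mkdir -p "$home/var/data/$d" "$home/var/log/$d"
        done
        for d in garbage_collector api_server log_ingestor mcp_server; do
            mkdir -p "$home/var/log/$d"
        done
    fi
}

create_clp_dirs "${CLP_HOME:-/tmp/clp}" worker
```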

517-533: Add host configuration to sbin script example.

The configuration example shows only port mappings but omits the required host fields. Users connecting via NodePort would need to specify a cluster node's IP or hostname:
📝 More complete example
database:
  host: "<node-ip-or-hostname>"  # IP/hostname of any cluster node
  port: 30306  # Match `clpConfig.database.port` in Helm values
results_cache:
  host: "<node-ip-or-hostname>"  # IP/hostname of any cluster node
  port: 30017  # Match `clpConfig.results_cache.port` in Helm values
Additionally, users would need to configure credentials in etc/credentials.yaml to match the Helm chart's credential values.
239-287: Fix duplicate YAML key in custom values example.

The example has two archive_output: keys under clpConfig (lines 260-262 and 270-271). In YAML, duplicate keys at the same level are invalid—the second occurrence will silently override the first. Merge them into a single archive_output: block containing all three fields:
🔧 Proposed fix
 clpConfig:
   # Use clp-text, instead of clp-json (default)
   package:
     storage_engine: "clp"
     query_engine: "clp"

-  # Configure archive output
-  archive_output:
-    target_archive_size: 536870912  # 512 MB
-    compression_level: 6
-
   # Enable MCP server
   mcp_server:
     port: 30800
     logging_level: "INFO"

-  # Configure data retention (in minutes)
+  # Configure archive output and data retention
   archive_output:
+    target_archive_size: 536870912  # 512 MB
+    compression_level: 6
     retention_period: 10080  # 7 days (in minutes)
docs/src/dev-docs/design-deployment-orchestration.md (2)
260-262: Clarify Spider deployment availability.

The note states Spider isn't available for Helm but doesn't explicitly confirm it works with Docker Compose. Consider rewording for clarity:
📝 Suggested improvement
 :::{note}
-Spider support is not yet available for Helm.
+Spider-based deployment types (Spider Base and Spider Full) are currently supported
+for Docker Compose only. Kubernetes/Helm support for Spider is planned for a future release.
 :::
269-274: Consider adding troubleshooting context.

The section provides debugging links but doesn't indicate what types of issues users might encounter. Adding a brief sentence about common issues (service startup failures, container crashes, connectivity problems) would help users quickly determine if they're in the right section.
📝 Optional enhancement
 ## Troubleshooting

-When issues arise, use the appropriate commands for your orchestration method:
+When services fail to start, crash, or exhibit connectivity issues, use the appropriate
+debugging commands for your orchestration method:
docs/src/user-docs/quick-start/clp-json.md (3)
53-57: Security: Credentials echoed in shell history.

The credentials generated using openssl rand -hex 16 will be stored in the shell history, potentially exposing sensitive information. Consider advising users to prefix these commands with a space (to avoid history in bash with HISTCONTROL=ignorespace) or to unset history temporarily.

78-80: Clarify the optional log input mount.

The comment states "not needed if using S3 input," but users may be unclear about when they need this mount. Consider adding a reference to the object storage guide or expanding the explanation.

127-134: Fix MD031: Add blank line after fenced code block.

The closing fence at line 134 should be followed by a blank line before the ::: tag to satisfy the MD031 linting rule.
📝 Suggested fix
 ```yaml
 database:
   port: 30306
 
 results_cache:
   port: 30017
 ```
+
 :::
 ::::
docs/src/user-docs/index.md (2)
72-77: Consider addressing the previously suggested title improvement.

A past review suggested changing "External database setup" to a more grammatically precise alternative like "Setting up external databases" or "External database configuration" for consistency with other card titles.

103-108: Apply the previously confirmed title change.

The title "MCP server" should be changed to "Using the MCP server" for consistency with other cards in this section. This change was confirmed as addressed in a previous review but was not applied.
✏️ Suggested change
 :::{grid-item-card}
 :link: guides-mcp-server/index
-MCP server
+Using the MCP server
 ^^^
 Integrate CLP with AI assistants using the Model Context Protocol.
 :::

junhaoliao

clp-text configuration issues: API server and log_ingestor are not supported in clp-text (see clp-config.template.text.yaml which sets api_server: null and log_ingestor: null). These references should be removed.

docs/src/user-docs/quick-start/clp-text.md

…t guide - Apply suggestions from code review

…guide.

junhaoliao · 2026-02-01T02:55:57Z

docs/src/user-docs/guides-k8s-deployment.md

+
+---
+
+## Monitoring and debugging


we should also add kubectl get events and kubectl describe pod <pod-name> here.

@coderabbitai create an issue to track

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Create GitHub issue to track adding kubectl debugging commands gh issue create \ --repo y-scope/clp \ --title "docs(helm): Add kubectl debugging commands to Monitoring section" \ --body "Add \`kubectl get events\` and \`kubectl describe pod <pod-name>\` commands to the Monitoring and debugging section of the Kubernetes deployment guide. Related PR: https://github.com/y-scope/clp/pull/1827 Related comment: https://github.com/y-scope/clp/pull/1827#discussion_r2750441166 Requested by: `@junhaoliao`" \ --assignee junhaoliao

Length of output: 42

@junhaoliao, I've created issue #1942 to track adding the kubectl get events and kubectl describe pod <pod-name> commands to the Monitoring and debugging section.

🧠 Learnings used

Learnt from: junhaoliao
Repo: y-scope/clp PR: 1178
File: components/clp-package-utils/clp_package_utils/controller.py:217-223
Timestamp: 2025-09-25T05:13:13.298Z
Learning: The compression scheduler service in CLP runs with CLP_UID_GID (current user's UID:GID) rather than CLP_SERVICE_CONTAINER_UID_GID (999:999), unlike infrastructure services such as database, queue, redis, and results cache which run with the service container UID:GID.

Enhance deployment documentation to include Kubernetes orchestration with Helm.

327ec91

junhaoliao added 2 commits December 19, 2025 16:53

fix format

e9e4326

Update external database guide to clarify bundled database usage and update orchestration links

164a3d8

junhaoliao mentioned this pull request Dec 22, 2025

feat(helm): Add API server deployment. #1818

Merged

3 tasks

junhaoliao added 4 commits December 26, 2025 02:55

Refactor and enhance deployment and user guides to include Docker Compose and Kubernetes orchestration details.

34f7cd2

Merge branch 'main' into helm-docs

d8ba9d9

# Conflicts: # docs/src/user-docs/index.md

Reorganize guides overview to reposition log-ingestor section

105549d

Remove duplicated API server guide reference

8c6cbce

junhaoliao mentioned this pull request Dec 26, 2025

WIP - feat(deployment): Add k8s Helm chart for CLP package deployment. #1603

Draft

16 tasks

quinntaylormitchell reviewed Jan 12, 2026

quinntaylormitchell and others added 12 commits January 12, 2026 14:16

Update clp-json.md

6e90f25

Update clp-text.md

a410618

Update guides-k8s-deployment.md

5e67440

Consolidate documentation structure for consistency and clarity, merging guides, core, and reference sections into a unified overview.

fa3629d

Refine user documentation for clarity and conciseness in multi-host deployment, core usage, resources, and reference sections.

7c6ce03

Merge branch 'main' into helm-docs

93ad3e1

# Conflicts:
#   docs/src/user-docs/reference-overview.md

Add package scripts reference to user documentation

6ff459a

Refine deployment section in user docs to include details on Docker Compose, Kubernetes, and multi-host setups.

4ec571b

rename distributed to distributedDeployment to match latest proposed code at y-scope#1829

e59377c

use backticks whenever appropriate

7fcf276

add missing admin tools ref

f705016

revert admin tools ref

713c6e4

junhaoliao requested a review from quinntaylormitchell January 13, 2026 18:41

lock version to fix ci

105d96d

junhaoliao marked this pull request as ready for review January 13, 2026 19:21

junhaoliao requested a review from a team as a code owner January 13, 2026 19:21

Merge branch 'main' into helm-docs

67d8d99

coderabbitai bot reviewed Jan 13, 2026

Merge branch 'main' into helm-docs

cba43bc

kirkrodrigues added 6 commits January 18, 2026 15:42

Clean-up link.

40c251b

Some edits.

31008ea

Some edits.

5be86a6

Some edits.

cd5b237

Lint fixes.

cb525c5

Add note about API server address changes.

4d38a78

kirkrodrigues reviewed Jan 18, 2026

coderabbitai bot reviewed Jan 18, 2026

docs/src/user-docs/quick-start/clp-json.md (resolved)

junhaoliao commented Jan 19, 2026

docs/src/user-docs/index.md (outdated, resolved)

junhaoliao commented Jan 19, 2026

docs/src/user-docs/guides-k8s-deployment.md (outdated, resolved)

junhaoliao commented Jan 19, 2026

docs/src/user-docs/quick-start/clp-text.md (outdated, resolved)

junhaoliao commented Jan 19, 2026

docs/src/user-docs/quick-start/clp-text.md (outdated, resolved)

docs/src/user-docs/quick-start/clp-text.md (outdated, resolved)

docs/src/user-docs/quick-start/clp-text.md (outdated, resolved)

junhaoliao commented Jan 19, 2026

docs/src/user-docs/quick-start/clp-text.md (outdated, resolved)

junhaoliao and others added 7 commits January 18, 2026 23:01

remove references to api server / log ingestor in clp-text quick start guide - Apply suggestions from code review

03b1101

Remove API server search documentation from clp-text quick-start guide.

46d53bf

Merge branch 'main' into helm-docs

3f50484

ensure all "flavour" references are changed to "flavor"

d516bbc

add missing verb

711e8a0

Remove common configuration options table from Kubernetes deployment guide.

e7f454b

fix duplicated archive_output keys; set 1-month archive retention

2a5b1ec

junhaoliao requested a review from kirkrodrigues January 19, 2026 04:15

junhaoliao added 2 commits January 18, 2026 23:21

Fix typo in user docs ("bofore" -> "before")

124ca06

Standardize spelling from "flavour" to "flavor" across documentation.

131f609

junhaoliao added this to the Backlog milestone Jan 19, 2026

kirkrodrigues approved these changes Jan 19, 2026

junhaoliao merged commit 87cb956 into y-scope:main Jan 19, 2026
20 checks passed

junhaoliao removed this from the Backlog milestone Jan 19, 2026

kirkrodrigues mentioned this pull request Jan 19, 2026

fix(docs): Replace hardcoded main-branch references with DOCS_VAR_CLP_GIT_REF. #1885

Merged

3 tasks

junhaoliao commented Feb 1, 2026

coderabbitai bot mentioned this pull request Feb 1, 2026

docs(helm): Add kubectl debugging commands to Monitoring section #1942

Open

Conversation

junhaoliao commented Dec 19, 2025 • edited by coderabbitai bot

coderabbitai bot commented Dec 19, 2025 • edited
