Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc v4 fixes and improved logging #1831

Merged
merged 16 commits into from
Mar 31, 2025
Merged

Misc v4 fixes and improved logging #1831

merged 16 commits into from
Mar 31, 2025

Conversation

nicktrn
Copy link
Collaborator

@nicktrn nicktrn commented Mar 27, 2025

More logging and configurable keepalive. Also includes fixes around initial worker setup.

Summary by CodeRabbit

Summary by CodeRabbit

This release introduces enhanced configuration options and connection handling flexibility for smoother operations:

  • New Features
    • Introduced several new environment variables for improved configuration management.
    • Added a health check endpoint to the server for better monitoring.
    • Streamlined the request handling for creating worker groups by consolidating fields in the request schema.

Enjoy these new capabilities that aim to improve system responsiveness and operational transparency.

Copy link

changeset-bot bot commented Mar 27, 2025

⚠️ No Changeset found

Latest commit: 21da4e8

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Mar 27, 2025

Walkthrough

This pull request introduces several targeted changes across the codebase. In the supervisor app, multiple new environment variables are added to the Env object, including DEBUG, which enhances logging during service initialization. The ManagedSupervisor class has been modified to replace the httpServer with an optional metricsServer, and logging has been improved. In the core HTTP components, a property in the HttpReply class is renamed, while the RouteDefinition interface in the HTTP server gains an optional keepConnectionAlive property that alters request handling. Additional changes include a debug log in the Kubernetes detection function and various updates to the worker group management.

Changes

File(s) Change Summary
apps/supervisor/src/env.ts Added new environment variables: TRIGGER_WORKLOAD_API_HOST_INTERNAL, KUBERNETES_IMAGE_PULL_SECRETS, METRICS_ENABLED, METRICS_HOST, METRICS_PORT, and DEBUG in the Env object.
apps/supervisor/src/index.ts Updated ManagedSupervisor class to replace httpServer with optional metricsServer, enhanced logging for service initialization, and simplified start/stop methods using optional chaining.
packages/core/src/v3/apps/http.ts Renamed the property alreadyReplied to hasReplied in the HttpReply class and updated its references in the methods.
packages/core/src/v3/serverOnly/httpServer.ts Introduced an optional keepConnectionAlive property in the RouteDefinition interface and modified request handling logic accordingly.
packages/core/src/v3/serverOnly/k8s.ts Added a debug log statement in the isKubernetesEnvironment function to log the k8sIndicators array.
apps/supervisor/src/workloadManager/kubernetes.ts Added a private method getImagePullSecrets in KubernetesWorkloadManager to retrieve image pull secrets dynamically.
apps/supervisor/src/workloadManager/types.ts Added an optional property imagePullSecrets to the WorkloadManagerOptions interface.
apps/supervisor/src/workloadServer/index.ts Modified createHttpServer method to include a metrics configuration object and added a new health check route.
apps/webapp/app/routes/admin.api.v1.workers.ts Updated RequestBodySchema by removing projectId and makeDefault, and adding makeDefaultForProjectId.
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts Deleted the route handler for restoring a run from a snapshot.
apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts Removed a comment line regarding potential debug span creation.
apps/webapp/app/v3/services/initializeDeployment.server.ts Modified deployment type handling logic to throw an error for "UNMANAGED" deployments and upgrade projects for "MANAGED" deployments.
apps/webapp/app/v3/services/worker/workerGroupService.server.ts Updated createWorkerGroup method logic to set defaultWorkerInstanceGroupId based on the existence of a global default.
packages/cli-v3/src/entryPoints/managed-run-controller.ts Added handling for the "QUEUED_EXECUTING" status in the handleSnapshotChange method.

Possibly related PRs

Suggested reviewers

  • ericallam
  • matt-aitken

Poem

I'm a rabbit hopping through lines of code,
New flags and secrets, a brightened road.
With health checks added and logs that sing,
In the world of servers, I dance and spring.
Each change a nibble, each update a treat,
In the meadow of code, my joy is complete!
🐇💻

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@nicktrn nicktrn marked this pull request as draft March 27, 2025 15:56
@nicktrn nicktrn marked this pull request as ready for review March 31, 2025 10:46
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
apps/supervisor/README.md (1)

46-49: Missing comma after "worker groups"

There should be a comma after "worker groups" to properly separate the introductory clause from the main clause.

-When adding more worker groups you might also want to make them the default for a specific project. This will allow you to test it without having to change the global default:
+When adding more worker groups, you might also want to make them the default for a specific project. This will allow you to test it without having to change the global default:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~48-~48: A comma might be missing here.
Context: ... worker groups When adding more worker groups you might also want to make them the de...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1)

50-54: Consider extracting flag key to a constant

The string "defaultWorkerInstanceGroupId" is used both here and on line 59. Consider extracting this to a constant to avoid string duplication and make future changes easier.

+  private readonly DEFAULT_WORKER_GROUP_FLAG_KEY = "defaultWorkerInstanceGroupId";

   async createWorkerGroup({
     // ...existing code
-      const defaultWorkerInstanceGroupId = await getFlag({
-        key: "defaultWorkerInstanceGroupId",
-      });
+      const defaultWorkerInstanceGroupId = await getFlag({
+        key: this.DEFAULT_WORKER_GROUP_FLAG_KEY,
+      });

     // ...and then later
-          key: "defaultWorkerInstanceGroupId",
+          key: this.DEFAULT_WORKER_GROUP_FLAG_KEY,
apps/supervisor/src/index.ts (1)

350-350: Consider using consistent order in stop method

The stop method orders services differently than the start method. For consistency and to ensure proper cleanup, consider keeping the same order between start and stop methods, with workerSession being last to stop after all dependent services.

-    await this.workerSession.stop();
-
-    // Optional services
-    await this.podCleaner?.stop();
-    await this.failedPodHandler?.stop();
-    await this.metricsServer?.stop();
+    // Optional services
+    await this.podCleaner?.stop();
+    await this.failedPodHandler?.stop();
+    await this.metricsServer?.stop();
+
+    await this.workerSession.stop();
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 29ce91a and 21da4e8.

📒 Files selected for processing (12)
  • apps/supervisor/README.md (2 hunks)
  • apps/supervisor/src/env.ts (4 hunks)
  • apps/supervisor/src/index.ts (6 hunks)
  • apps/supervisor/src/workloadManager/kubernetes.ts (1 hunks)
  • apps/supervisor/src/workloadManager/types.ts (1 hunks)
  • apps/supervisor/src/workloadServer/index.ts (2 hunks)
  • apps/webapp/app/routes/admin.api.v1.workers.ts (2 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts (0 hunks)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts (0 hunks)
  • apps/webapp/app/v3/services/initializeDeployment.server.ts (1 hunks)
  • apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1 hunks)
  • packages/cli-v3/src/entryPoints/managed-run-controller.ts (1 hunks)
💤 Files with no reviewable changes (2)
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.suspend.ts
  • apps/webapp/app/routes/engine.v1.worker-actions.runs.$runFriendlyId.snapshots.$snapshotFriendlyId.restore.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/supervisor/src/env.ts
🧰 Additional context used
🧬 Code Definitions (2)
apps/supervisor/src/index.ts (6)
packages/core/src/v3/serverOnly/httpServer.ts (1)
  • HttpServer (64-367)
apps/supervisor/src/env.ts (1)
  • env (76-76)
apps/supervisor/src/services/podCleaner.ts (1)
  • PodCleaner (16-118)
apps/supervisor/src/services/failedPodHandler.ts (1)
  • FailedPodHandler (18-281)
apps/supervisor/src/clients/kubernetes.ts (1)
  • createK8sApi (9-29)
apps/supervisor/src/workloadServer/index.ts (1)
  • WorkloadServer (62-582)
apps/supervisor/src/workloadServer/index.ts (1)
packages/core/src/v3/serverOnly/httpServer.ts (1)
  • HttpServer (64-367)
🪛 LanguageTool
apps/supervisor/README.md

[uncategorized] ~48-~48: A comma might be missing here.
Context: ... worker groups When adding more worker groups you might also want to make them the de...

(AI_EN_LECTOR_MISSING_PUNCTUATION_COMMA)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: units / 🧪 Unit Tests
🔇 Additional comments (20)
apps/supervisor/src/workloadManager/types.ts (1)

8-8: Good addition for Kubernetes configuration flexibility

The new imagePullSecrets optional property allows specifying credentials for pulling from private container registries in Kubernetes deployments. This is a valuable enhancement for enterprise environments where private registries are common.

apps/supervisor/README.md (3)

11-12: Simplification of instruction text

Good clarity improvement by changing from plural "edit these" to singular "edit this" since there's only one value to modify.


19-19: Improved JSON payload simplification

The simplified JSON payload with only the name field makes the API easier to use. This reflects the backend changes that streamlined the worker creation process.


63-66: Good documentation for project-specific worker groups

The added section for creating additional worker groups with project-specific defaults is well-documented and aligns with the API changes. The makeDefaultForProjectId parameter provides a cleaner way to set project-specific worker groups.

apps/supervisor/src/workloadManager/kubernetes.ts (2)

185-187: Well-implemented method for image pull secrets

The new getImagePullSecrets method cleanly extracts the logic for retrieving image pull secrets, making the code more maintainable. The implementation correctly maps string names to Kubernetes V1LocalObjectReference objects.


193-193: Good refactoring to use dynamic configuration

Replacing hardcoded image pull secrets with a call to the new method improves configurability, allowing for dynamic specification of image pull secrets through options rather than hardcoding them.

apps/webapp/app/v3/services/worker/workerGroupService.server.ts (1)

50-56: Improved default worker group selection logic

The new approach of checking for an existing global default worker group is more robust than the previous count-based check. This prevents potential issues if worker groups were previously created and deleted.

packages/cli-v3/src/entryPoints/managed-run-controller.ts (1)

488-489: Enhanced execution state handling

Adding "QUEUED_EXECUTING" to the case statement ensures that runs in this state receive the same treatment as those in "EXECUTING_WITH_WAITPOINTS", which includes cleanup, debounce waiting, and suspension logic. This creates consistent handling for closely related execution states.

apps/supervisor/src/workloadServer/index.ts (2)

25-25: Metrics support integration

Adding the import for the register function enables metrics collection in the application.


125-137: Added metrics configuration and health check endpoint

The changes enhance the server's monitoring capabilities in two important ways:

  1. Metrics collection is now configured with register while keeping metrics exposure disabled
  2. A new health check endpoint provides a simple way to verify the server is operational

These improvements are important for production monitoring and reliability.

apps/webapp/app/routes/admin.api.v1.workers.ts (2)

10-10: API schema simplification

Consolidating the separate projectId and makeDefault fields into a single makeDefaultForProjectId field simplifies the API schema while maintaining the intended functionality.


37-38: Implementation of the consolidated schema field

The request handling has been properly updated to use the new consolidated field. This refactoring makes the code more maintainable by reducing the number of parameters and simplifying the conditional logic.

Also applies to: 45-49

apps/webapp/app/v3/services/initializeDeployment.server.ts (2)

21-24: Clear rejection of unsupported deployment types

The explicit check for "UNMANAGED" deployments provides clearer error messaging when rejecting unsupported deployment types.


25-35: Automatic project engine upgrade for MANAGED deployments

This improvement automatically upgrades projects from engine "V1" to "V2" when a MANAGED deployment is requested. This is an excellent enhancement that reduces friction when users deploy to V2 without having run the development setup first.

apps/supervisor/src/index.ts (6)

33-33: Clear property renaming and optionality improvement

The property has been renamed from httpServer to metricsServer with optional typing, which better reflects its purpose and accurately represents that this server may not always be initialized. This change improves code clarity and aligns with the conditional initialization later in the code.


47-51: Good security practice in debug logging

Excellent approach to debug logging by explicitly removing sensitive information (TRIGGER_WORKER_TOKEN and MANAGED_WORKER_SECRET) before logging environment variables. This prevents accidental exposure of credentials in logs.


64-92: Enhanced service initialization with improved logging

The conditional initialization of podCleaner and failedPodHandler has been significantly improved with:

  1. Clear logging of configuration parameters when enabled
  2. Explicit warning messages when disabled
  3. Consistent structure for both services

This makes the system's state more transparent during startup and helps with debugging deployment configurations.


249-258: Proper conditional metrics server initialization

The metrics server is now conditionally initialized based on the METRICS_ENABLED environment variable, which aligns with the property rename and makes the metrics collection optional. The configuration is complete with appropriate port, host and register parameters.


263-263: Improved configuration flexibility for workload server

Adding the host parameter from environment variables increases deployment flexibility, allowing the workload server to bind to specific interfaces as needed.


330-332: Cleaner service management with optional chaining

The use of optional chaining (?.) for starting and stopping services is a more elegant approach than explicit existence checks. This simplifies the code while maintaining the same behavior, properly handling cases where services might not be initialized.

Also applies to: 353-355

@matt-aitken matt-aitken merged commit 394f1de into main Mar 31, 2025
13 checks passed
@matt-aitken matt-aitken deleted the fix/v4-misc branch March 31, 2025 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants