Skip to content

feat(log-ingestor): Add OpenAPI documentation for server APIs; Add OpenAPI documentation static generator.#1783

Merged
LinZhihao-723 merged 13 commits intoy-scope:mainfrom
LinZhihao-723:log-ingestor-api-doc
Dec 16, 2025
Merged

feat(log-ingestor): Add OpenAPI documentation for server APIs; Add OpenAPI documentation static generator.#1783
LinZhihao-723 merged 13 commits intoy-scope:mainfrom
LinZhihao-723:log-ingestor-api-doc

Conversation

@LinZhihao-723
Copy link
Member

@LinZhihao-723 LinZhihao-723 commented Dec 16, 2025

Description

As the title suggested. This PR adds docs for server APIs and an OpenAPI doc generator.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • New Features

    • OpenAPI docs with an /openapi.json endpoint for interactive API docs and schema generation.
    • New CLI tool to generate OpenAPI JSON plus new executable and library targets for the ingestor.
    • Added CORS support for cross-origin requests.
    • Public config schemas now include validation metadata for improved input validation.
    • Added OpenAPI-related dependency to support schema generation.
  • Bug Fixes / Improvements

    • Standardized structured error responses for clearer error payloads.

✏️ Tip: You can customize this high-level summary in your review settings.

@LinZhihao-723 LinZhihao-723 requested a review from a team as a code owner December 16, 2025 02:17
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 16, 2025

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds OpenAPI support and schema metadata: new utoipa deps, ToSchema derives and field schema annotations for ingestion configs, OpenApiRouter-based routes exposing /openapi.json with CORS, structured error payloads, an OpenAPI codegen CLI binary, lib/bin Cargo entries, and minor router error logging.

Changes

Cohort / File(s) Summary
Dependency: clp-rust-utils
components/clp-rust-utils/Cargo.toml
Added dependency utoipa = { version = "5.4.0" }.
Config schema annotations
components/clp-rust-utils/src/job_config/ingestion.rs
Added ToSchema derive to BaseConfig, SqsListenerConfig, S3ScannerConfig; annotated fields (bucket_name, key_prefix, dataset, timestamp_key, tags, queue_url, start_after) with #[schema(...)] metadata (e.g., value_type, min_length = 1).
Dependencies & targets: log-ingestor
components/log-ingestor/Cargo.toml
Added utoipa (with axum_extras), utoipa-axum, and tower-http (cors feature); removed log dependency; added [lib] (name = "log_ingestor") and two [[bin]] entries for log-ingestor and log-ingestor-openapi-codegen.
Router & OpenAPI integration
components/log-ingestor/src/routes.rs
Reworked router to use OpenApiRouter and expose /openapi.json; added ApiDoc export and #[utoipa::path] annotations for endpoints; introduced ErrorResponse and made CreationResponse ToSchema; changed create_router() to return Result<Router<...>, serde_json::Error>; applied CORS layer; updated endpoints to return Result types and standardized responses.
Binaries & error logging
components/log-ingestor/src/bin/log_ingestor.rs, components/log-ingestor/src/bin/log_ingestor_openapi_codegen.rs
Added inspect_err logging when router creation fails in log_ingestor.rs. Added new binary log_ingestor_openapi_codegen.rs that writes generated OpenAPI JSON (via log_ingestor::routes::ApiDoc::openapi()) to a CLI-specified file.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Client
    participant Server as Log-Ingestor Server
    participant Router as OpenApiRouter
    participant Manager as IngestionJobManager
    participant ApiDoc as ApiDoc/OpenAPI

    Client->>Server: POST /s3_scanner or /sqs_listener
    Server->>Router: route request
    Router->>Manager: invoke create job handler
    Manager-->>Router: CreationResponse or Error
    Router-->>Client: 200 Json(CreationResponse) or ErrorResponse
    note right of Router `#D3E4FF`: OpenAPI metadata available at /openapi.json
Loading
sequenceDiagram
    autonumber
    actor DevCLI
    participant Codegen as log_ingestor_openapi_codegen
    participant ApiDoc as ApiDoc/OpenAPI
    participant FS as Filesystem

    DevCLI->>Codegen: run with output path
    Codegen->>ApiDoc: call ApiDoc::openapi()
    ApiDoc-->>Codegen: OpenAPI JSON
    Codegen->>FS: create/truncate and write file
    FS-->>Codegen: write success
    Codegen-->>DevCLI: exit success
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review components/log-ingestor/src/routes.rs for OpenAPI annotations, create_router() signature change, OpenApiRouter wiring, CORS layer, and IntoResponse/error payload changes.
  • Verify schema annotations and ToSchema derives in components/clp-rust-utils/src/job_config/ingestion.rs.
  • Confirm components/log-ingestor/Cargo.toml lib/bin entries and new dependencies; check the new codegen binary compiles and uses ApiDoc::openapi() correctly.

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: adding OpenAPI documentation support and a static generator for the log-ingestor service.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/clp-rust-utils/src/job_config/ingestion.rs (1)

7-38: Fix the min_length constraint on the tags field annotation at line 36.

The tags field uses #[schema(value_type = Vec<String>, min_length = 1)], but min_length is a string-level constraint in OpenAPI/JSON Schema, not an array constraint. To enforce a minimum number of elements in the array, use min_items = 1 instead. If the intent is to document that each string must be non-empty, note that the Rust type Option<Vec<NonEmptyString>> already enforces this at compile-time; you may consider whether explicit schema annotation is needed here or if a custom item schema is preferable.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbc3b71 and 553f1fd.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • components/clp-rust-utils/Cargo.toml (1 hunks)
  • components/clp-rust-utils/src/job_config/ingestion.rs (3 hunks)
  • components/log-ingestor/Cargo.toml (2 hunks)
  • components/log-ingestor/src/bin/log_ingestor.rs (1 hunks)
  • components/log-ingestor/src/bin/log_ingestor_openapi_codegen.rs (1 hunks)
  • components/log-ingestor/src/routes.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
📚 Learning: 2025-10-22T21:02:31.113Z
Learnt from: junhaoliao
Repo: y-scope/clp PR: 0
File: :0-0
Timestamp: 2025-10-22T21:02:31.113Z
Learning: Repository y-scope/clp: Maintain deterministic CI/builds for Rust; add a check to verify Cargo.lock is in sync with Cargo.toml without updating dependencies (non-mutating verification in clp-rust-checks workflow).

Applied to files:

  • components/clp-rust-utils/Cargo.toml
🧬 Code graph analysis (2)
components/log-ingestor/src/bin/log_ingestor.rs (1)
components/log-ingestor/src/routes.rs (1)
  • create_router (59-76)
components/log-ingestor/src/routes.rs (4)
components/log-ingestor/src/ingestion_job_manager.rs (2)
  • create_s3_scanner_job (106-118)
  • create_sqs_listener_job (131-149)
components/clp-rust-utils/src/clp_config/package/config.rs (2)
  • serde_json (315-315)
  • serde_json (339-339)
components/log-ingestor/src/ingestion_job/sqs_listener.rs (1)
  • serde_json (94-94)
components/log-ingestor/src/compression/compression_job_submitter.rs (1)
  • new (58-102)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: package-image
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks
🔇 Additional comments (13)
components/log-ingestor/src/bin/log_ingestor.rs (1)

98-102: LGTM!

The error handling for create_router() follows the same pattern used for IngestionJobManagerState::from_config() above (lines 92-96), maintaining consistency. Using inspect_err for logging while propagating the error is idiomatic.

components/log-ingestor/Cargo.toml (2)

6-16: LGTM!

The library and binary target structure is well-organized. Using underscores for the lib name (log_ingestor) and hyphens for binary names (log-ingestor, log-ingestor-openapi-codegen) follows Rust conventions.


37-41: Dependency versions are compatible.

utoipa-axum 0.2.0 is compatible with utoipa 5.4.0 and axum 0.8.6. The dependency versions align correctly.

components/log-ingestor/src/routes.rs (5)

74-74: Permissive CORS configuration.

allow_origin(Any) permits requests from any origin. This is appropriate for a public API or when accessed via Swagger UI from different origins. If this service should only be accessed from specific origins in production, consider restricting CORS origins via configuration.


18-46: Well-documented workaround for utoipa quirks.

The module structure with explicit imports to avoid "super" appearing as a tag in the OpenAPI documentation is a good workaround. The #[allow(clippy::needless_for_each)] is appropriately scoped to the module.


111-121: LGTM!

The response types CreationResponse and ErrorResponse are correctly annotated with ToSchema for OpenAPI schema generation.


132-172: LGTM!

The create_s3_scanner_job endpoint is well-documented with comprehensive OpenAPI annotations covering all response scenarios (OK, CONFLICT, INTERNAL_SERVER_ERROR).


174-214: LGTM!

The create_sqs_listener_job endpoint documentation mirrors the S3 scanner endpoint appropriately, with clear descriptions of the SQS-specific behaviour.

components/log-ingestor/src/bin/log_ingestor_openapi_codegen.rs (1)

13-17: LGTM!

The codegen binary is simple and focused. It correctly retrieves the OpenAPI document from ApiDoc::openapi() and serializes it to the specified file path.

components/clp-rust-utils/Cargo.toml (1)

19-19: LGTM!

The utoipa version (5.4.0) is consistent with the version used in components/log-ingestor/Cargo.toml, which is good practice for avoiding version conflicts within the workspace. Based on learnings, ensure Cargo.lock is updated and in sync with these changes.

components/clp-rust-utils/src/job_config/ingestion.rs (3)

4-4: LGTM!

The utoipa::ToSchema import is correctly added to support OpenAPI schema generation for the configuration structs.


41-50: LGTM!

The ToSchema derive and schema annotations are correctly applied. The queue_url field's schema annotation properly represents the NonEmptyString constraint, and utoipa should correctly handle the flattened base field by including BaseConfig's fields in the generated schema.


53-67: LGTM!

The ToSchema derive and schema annotations are correctly applied. The start_after field's schema annotation properly represents the NonEmptyString constraint. The scanning_interval_sec field doesn't require an explicit schema annotation as utoipa can correctly infer the schema for primitive types like u32.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 553f1fd and 067ef10.

📒 Files selected for processing (1)
  • components/log-ingestor/src/routes.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1535
File: components/clp-rust-utils/src/clp_config/package/config.rs:47-61
Timestamp: 2025-11-03T16:17:40.223Z
Learning: In the y-scope/clp repository, the `ApiServer` struct in `components/clp-rust-utils/src/clp_config/package/config.rs` is a Rust-native configuration type and does not mirror any Python code, unlike other structs in the same file (Config, Database, ResultsCache, Package) which are mirrors of Python definitions.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
🧬 Code graph analysis (1)
components/log-ingestor/src/routes.rs (1)
components/log-ingestor/src/ingestion_job_manager.rs (2)
  • create_s3_scanner_job (106-118)
  • create_sqs_listener_job (131-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: rust-checks
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
🔇 Additional comments (4)
components/log-ingestor/src/routes.rs (4)

18-46: LGTM! Well-structured OpenAPI documentation module.

The api_doc module is properly configured with appropriate metadata and path imports. The clippy suppression is justified and well-commented.


78-109: LGTM! Proper error handling with structured responses.

The error type and IntoResponse implementation correctly map internal errors to appropriate HTTP status codes and structured error responses.


111-121: LGTM! Response types properly configured for OpenAPI.

Both CreationResponse and ErrorResponse correctly derive the necessary traits for serialization and schema generation.


123-130: LGTM! Health endpoint properly documented.

The OpenAPI annotation correctly describes the health check endpoint.

fn main() -> Result<()> {
let mut file = std::fs::File::create(Args::parse().path)?;
let api = log_ingestor::routes::ApiDoc::openapi();
write!(file, "{}", api.to_json()?)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, where do we plan to write this file to? instead of a temporary directory, we may want to write it to a source directory, so that we can generate typing info for any clients who use the HTTP endpoints with tools like: https://openapi-ts.dev/

we can check if the file is up-to-date in a CI workflow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This part of code is mirroring the API server. The generated API doc will be stored under docs/_static and will be packaged into the doc site.
  • I'm planning to enable pre-generation after this PR is merged, together with the API server.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/log-ingestor/src/routes.rs (4)

61-62: Duplicate route registration for health endpoint.

The health endpoint is registered twice: once manually at / (line 61) and once via the routes! macro at /health (line 62). This appears intentional to support both paths, but consider adding a comment to clarify this design decision.


74-74: Permissive CORS policy may need justification for this internal service.

The allow_origin(Any) setting permits requests from any origin. While this service is internal infrastructure bound to 127.0.0.1:3002 by default (with network isolation), consider whether this broad CORS permission is necessary. If external callers (e.g., browser-based clients) need access, document this decision or restrict origins to known domains in production.


132-155: Document the request body in the OpenAPI annotation.

The #[utoipa::path] annotation should include a request_body attribute to document the expected S3ScannerConfig input format for API consumers.

Apply this diff to add the request body documentation:

 #[utoipa::path(
     post,
     path = "/s3_scanner",
+    request_body = S3ScannerConfig,
     description = "Creates an ingestion job that periodically scans the specified S3 bucket and \

174-197: Document the request body in the OpenAPI annotation.

Similar to create_s3_scanner_job, this endpoint should document its SqsListenerConfig request body in the OpenAPI annotation.

Apply this diff:

 #[utoipa::path(
     post,
     path = "/sqs_listener",
+    request_body = SqsListenerConfig,
     description = "Creates an ingestion job that listens to an SQS queue for notifications about \
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 067ef10 and 2744507.

📒 Files selected for processing (1)
  • components/log-ingestor/src/routes.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: build (macos-15)
  • GitHub Check: rust-checks
🔇 Additional comments (3)
components/log-ingestor/src/routes.rs (3)

111-121: Well-structured response types for OpenAPI.

The response types are properly annotated with ToSchema for OpenAPI schema generation, and the ErrorResponse struct provides consistent error handling across all endpoints.


123-130: Health endpoint properly documented.

The OpenAPI annotation is complete and the implementation is straightforward.


216-246: OpenAPI annotation is now complete.

The path parameter documentation has been properly added, including the params attribute with description. The path placeholder {job_id} correctly matches the handler signature.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
components/log-ingestor/src/routes.rs (4)

65-66: Duplicate route registration for health endpoint.

The health endpoint is registered at both / (line 65) and /health (via routes! macro on line 66). This was previously flagged - consider adding a clarifying comment if intentional.


78-78: Permissive CORS policy previously flagged.

The allow_origin(Any) setting was previously noted. If this is intentional for the OpenAPI endpoint to be accessible from Swagger UI at any origin, consider documenting this decision.


137-161: Request body documentation still missing.

The handler accepts Json<S3ScannerConfig> but the OpenAPI annotation lacks request_body = S3ScannerConfig. This was previously flagged.

 #[utoipa::path(
     post,
     path = "/s3_scanner",
     tags = ["IngestionJob"],
+    request_body = S3ScannerConfig,
     description = "Creates an ingestion job that periodically scans the specified S3 bucket and \

180-204: Request body documentation still missing.

The handler accepts Json<SqsListenerConfig> but the OpenAPI annotation lacks request_body = SqsListenerConfig. This was previously flagged.

 #[utoipa::path(
     post,
     path = "/sqs_listener",
     tags = ["IngestionJob"],
+    request_body = SqsListenerConfig,
     description = "Creates an ingestion job that listens to an SQS queue for notifications about \
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2744507 and 509abc9.

📒 Files selected for processing (1)
  • components/log-ingestor/src/routes.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
🧬 Code graph analysis (1)
components/log-ingestor/src/routes.rs (1)
components/log-ingestor/src/ingestion_job_manager.rs (2)
  • create_s3_scanner_job (106-118)
  • create_sqs_listener_job (131-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: package-image
  • GitHub Check: rust-checks
  • GitHub Check: build (ubuntu-24.04)
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (7)
components/log-ingestor/src/routes.rs (7)

8-14: LGTM!

The imports are well-organized and appropriate for the OpenAPI integration with utoipa and CORS support via tower-http.


18-50: Well-structured OpenAPI module with proper tag organization.

The module correctly addresses the previous suggestion by adding tags for endpoint categorization. The import pattern to avoid super appearing as a tag is well-documented.


91-113: LGTM!

The error handling correctly maps internal errors to appropriate HTTP status codes and uses the new ErrorResponse struct for consistent JSON error payloads that align with the OpenAPI schema.


115-125: LGTM!

The response structs are properly annotated with ToSchema for OpenAPI schema generation, and the doc comments provide clear field descriptions that will appear in the generated documentation.


127-135: LGTM!

The health endpoint OpenAPI annotation is correct with appropriate path, tags, and response documentation.


223-254: LGTM!

The OpenAPI annotation correctly documents the path parameter {job_id} with the params attribute, addressing previous review feedback. The responses comprehensively cover success and error cases.


255-273: LGTM!

The handler implementation correctly validates the UUID format with appropriate error handling and logging at each stage.

Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments

Comment on lines +18 to +28
// `utoipa::OpenApi` triggers `clippy::needless_for_each`
#[allow(clippy::needless_for_each)]
mod api_doc {
// Using `super::...` can cause `super` to appear as a tag in the generated OpenAPI
// documentation. Importing the paths directly prevents this issue.
use super::{
__path_create_s3_scanner_job,
__path_create_sqs_listener_job,
__path_health,
__path_stop_and_delete_job,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this code again, mod api_doc is needed only because #[allow(clippy::needless_for_each)] cannot be applied to a dervie/attr macro. And to add a new mod, we imports use super::{ __path_create_s3_scanner_job, __path_create_sqs_listener_job, __path_health, __path_stop_and_delete_job, }; manually. Do you think we should simply disable clippy::needless_for_each for this file to avoid hassle?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cargo.lock has changed a lot. Did you remove Cargo.lock and run task rust?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to fix a merge conflict. Is it a problem?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is likely safest to checkout the Cargo.lock file from the main branch and then rerun task rust when there is a conflict. The Cargo.lock file pins the working versions of all dependencies. After updating the lock file, we probably should verify whether the dependencies have broken. (Though there won't be an issue most of the time)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense.

Comment on lines +184 to +188
description = "Creates an ingestion job that listens to an SQS queue for notifications about \
new objects to ingest from the specified S3 bucket and key prefix.\n\n\
The specified SQS queue must be dedicated to this ingestion job. After successfully \
processing a notification, the ingestor deletes the corresponding message from the queue \
to prevent duplicate ingestion.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first sentence is too long. How about

Creates an ingestion job that monitors an SQS queue. The queue receives notifications whenever new objects are added to the specified S3 bucket and key prefix.

The specified SQS queue must be dedicated to this ingestion job. Upon successful ingestion, the job deletes the corresponding message from the queue to ensure objects are not ingested multiple times.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add another paragraph to explain we may also delete any irrelevant message.

Comment on lines +143 to +145
The scanner assumes that objects under the given prefix are immutable and are added in \
lexicographical order. Based on this assumption, the scanner ingests objects sequentially, \
ensuring that no eligible objects are skipped.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about this

Suggested change
The scanner assumes that objects under the given prefix are immutable and are added in \
lexicographical order. Based on this assumption, the scanner ingests objects sequentially, \
ensuring that no eligible objects are skipped.",
This scanner ingests objects sequentially, relying on the assumption that all objects under the prefix are immutable and are added in lexicographical order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote this section with more details for clarity.

let log_ingestor_router = create_router().with_state(log_ingestor_manager_state);
let log_ingestor_router = create_router()
.inspect_err(|err| {
tracing::error!(err = ? err, "Failed to create router.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to delete this after we implement proper logging by writing to stderr. Otherwise, the error will be printed twice by anyhow and tracing. It's good for now

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
components/log-ingestor/src/routes.rs (3)

50-67: Router construction is functional but consider past feedback.

The router is properly constructed with OpenAPI integration. Note that previous review comments have flagged:

  • The permissive CORS allow_origin(Any) policy (line 65)
  • The potential optimization of using a static reference for api_json (lines 59-64)
  • The intentional duplicate health route registration at both "/" and "/health" (lines 52-53)

124-153: Document the request body in the OpenAPI annotation.

The #[utoipa::path] annotation is missing the request_body attribute to document the expected S3ScannerConfig input. This was flagged in a previous review and remains unaddressed.

Apply this diff to add request body documentation:

 #[utoipa::path(
     post,
     path = "/s3_scanner",
     tags = ["IngestionJob"],
+    request_body = S3ScannerConfig,
     description = "Creates an ingestion job that periodically scans the specified S3 bucket and \

Based on past review comments, this is essential for complete API documentation.


172-199: Document the request body in the OpenAPI annotation.

Similar to the S3 scanner endpoint, the #[utoipa::path] annotation is missing the request_body attribute for SqsListenerConfig. This was previously flagged and remains unaddressed.

Apply this diff:

 #[utoipa::path(
     post,
     path = "/sqs_listener",
     tags = ["IngestionJob"],
+    request_body = SqsListenerConfig,
     description = "Creates an ingestion job that monitors an SQS queue. The queue receives \

Based on past review comments, this documentation is necessary for API consumers.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 509abc9 and abce9f4.

📒 Files selected for processing (1)
  • components/log-ingestor/src/routes.rs (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-08T15:52:50.753Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 549
File: components/core/tests/test-ir_encoding_methods.cpp:1180-1186
Timestamp: 2024-10-01T07:59:11.208Z
Learning: In the context of loop constructs, LinZhihao-723 prefers using `while (true)` loops and does not consider alternative loop constructs necessarily more readable.
🧬 Code graph analysis (1)
components/log-ingestor/src/routes.rs (1)
components/log-ingestor/src/ingestion_job_manager.rs (2)
  • create_s3_scanner_job (106-118)
  • create_sqs_listener_job (131-149)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: package-image
  • GitHub Check: lint-check (macos-15)
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: rust-checks
  • GitHub Check: build (macos-15)
  • GitHub Check: build (ubuntu-24.04)
🔇 Additional comments (6)
components/log-ingestor/src/routes.rs (6)

2-16: LGTM!

The imports are well-organized and all necessary dependencies for OpenAPI integration are properly included.


19-37: LGTM!

The OpenAPI configuration is well-structured with appropriate metadata, tags for endpoint organization, and all paths properly registered.


69-100: LGTM!

The error handling implementation is well-structured with appropriate HTTP status codes for each error variant and consistent use of the ErrorResponse payload.


102-112: LGTM!

The response structures are properly defined with appropriate derives (ToSchema for OpenAPI, Serialize for JSON) and clear field documentation.


114-122: LGTM!

The health endpoint is properly documented with OpenAPI annotations and implements a simple, effective health check.


218-268: LGTM!

The delete job endpoint is well-documented with proper OpenAPI annotations including path parameter documentation, comprehensive response codes, and appropriate UUID validation in the handler.

Copy link
Contributor

@hoophalab hoophalab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Validations: openapi.json exports correctly and looks correct.

@LinZhihao-723 LinZhihao-723 merged commit 369c671 into y-scope:main Dec 16, 2025
20 checks passed
davidlion pushed a commit to davidlion/clp that referenced this pull request Jan 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants