feat(clp-package): Add support for ingesting from custom S3 endpoints using log-ingestor; Add support for streaming search results from custom S3 endpoints using the API server.#1776
Conversation
WalkthroughS3 region changed from a required String to an optional NonEmptyString and a new optional endpoint_url (Option) was added; these types and the new AWS_DEFAULT_REGION constant are propagated through clients, ingestion job management, API usage, tests, and manifests. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/log-ingestor/src/aws_client_manager.rs (1)
51-56: Consider adding endpoint support toSqsClientWrapper::createfor consistency.
S3ClientWrapper::createnow accepts anendpoint_urlparameter, butSqsClientWrapper::createstill hardcodesNone. This asymmetry means SQS listener jobs created viacreate_sqs_listener_jobcannot use custom endpoints (e.g., LocalStack), even though the underlyingsqs::create_new_clientsupports it. The test on Line 165 works around this by calling the low-level function directly.- pub async fn create(region: &str, access_key_id: &str, secret_access_key: &str) -> Self { + pub async fn create(region: &str, access_key_id: &str, secret_access_key: &str, endpoint_url: Option<&str>) -> Self { let sqs_client = - clp_rust_utils::sqs::create_new_client(region, access_key_id, secret_access_key, None) + clp_rust_utils::sqs::create_new_client(region, access_key_id, secret_access_key, endpoint_url) .await; Self::from(sqs_client) }This would also require updating
create_sqs_listener_jobiningestion_job_manager.rsto passconfig.base.endpoint_url.as_deref().
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/clp-rust-utils/src/job_config/ingestion.rs(1 hunks)components/clp-rust-utils/src/s3/client.rs(1 hunks)components/log-ingestor/src/aws_client_manager.rs(1 hunks)components/log-ingestor/src/ingestion_job_manager.rs(1 hunks)components/log-ingestor/tests/test_ingestion_job.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/clp-rust-utils/src/s3/client.rscomponents/clp-rust-utils/src/job_config/ingestion.rscomponents/log-ingestor/src/ingestion_job_manager.rscomponents/log-ingestor/src/aws_client_manager.rscomponents/log-ingestor/tests/test_ingestion_job.rs
🧬 Code graph analysis (1)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(18-43)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(14-34)
🔇 Additional comments (6)
components/clp-rust-utils/src/s3/client.rs (1)
37-40: LGTM!The conditional handling of the optional endpoint is clean and correctly integrates with the existing builder pattern.
components/log-ingestor/tests/test_ingestion_job.rs (2)
183-184: LGTM!The
endpoint_urlfield is correctly populated for testing custom S3 endpoints.
259-260: LGTM!The
endpoint_urlis correctly set for the S3 scanner test configuration.components/log-ingestor/src/aws_client_manager.rs (1)
77-82: LGTM on S3 endpoint support.The
endpoint_urlparameter is correctly added and forwarded to the underlying client creation.components/log-ingestor/src/ingestion_job_manager.rs (2)
106-119: LGTM on S3 scanner endpoint propagation.The
endpoint_urlis correctly extracted from the configuration and passed to the S3 client wrapper.
132-150: SQS listener doesn't use custom endpoint.If the
SqsClientWrapper::createis updated to supportendpoint_urlas suggested inaws_client_manager.rs, this function should also passconfig.base.endpoint_url.as_deref()for consistency with S3 scanner jobs.
4b372db to
a5a03f9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/log-ingestor/src/ingestion_job_manager.rs (1)
107-119: Passendpoint_urltoSqsClientWrapper::createfor consistency with S3 and support for SQS-compatible endpoints.The S3 scanner job passes
endpoint_urlto the client wrapper (line 111), butcreate_sqs_listener_job(lines 134–137) does not passconfig.base.endpoint_urltoSqsClientWrapper::create. SinceSqsListenerConfiginheritsendpoint_urlviaBaseConfigand the underlyingclp_rust_utils::sqs::create_new_clientaccepts an optional endpoint parameter,SqsClientWrapper::createshould also accept and pass through the endpoint URL. This ensures consistency with S3-compatible stores that provide both S3 and SQS APIs (e.g., LocalStack).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/clp-rust-utils/src/job_config/ingestion.rs(1 hunks)components/clp-rust-utils/src/s3/client.rs(1 hunks)components/log-ingestor/src/aws_client_manager.rs(1 hunks)components/log-ingestor/src/ingestion_job_manager.rs(1 hunks)components/log-ingestor/tests/test_ingestion_job.rs(2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/clp-rust-utils/src/job_config/ingestion.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/tests/test_ingestion_job.rscomponents/log-ingestor/src/aws_client_manager.rs
🧬 Code graph analysis (1)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(18-43)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(14-34)
🔇 Additional comments (5)
components/log-ingestor/tests/test_ingestion_job.rs (2)
176-186: LGTM!The endpoint configuration is correctly populated in the
BaseConfigfor the SQS listener test. The.clone()is necessary sinceaws_config.endpointis used again later in the test.
251-265: LGTM!The endpoint configuration is correctly populated in the
BaseConfigfor the S3 scanner test. The move (without.clone()) is appropriate here since this is the final use ofaws_config.endpoint.components/clp-rust-utils/src/s3/client.rs (1)
37-40: LGTM!The conditional endpoint handling is correctly implemented. The use of
force_path_style(true)(Line 36) is appropriate for S3-compatible endpoints like MinIO and LocalStack.components/log-ingestor/src/aws_client_manager.rs (1)
77-82: LGTM!The
S3ClientWrapper::createmethod correctly accepts and forwards the optionalendpoint_urlparameter to the underlying client creation function. This public API change is appropriately propagated throughout the codebase.components/clp-rust-utils/src/job_config/ingestion.rs (1)
17-18: Theendpoint_urlfield correctly supports custom S3-compatible stores. However, theregionfield remains required (String, not Option) even when using custom endpoints. In the AWS SDK for Rust, when a custom endpoint is used, the Region is only used for signing and is not used to route the request. This is the expected behaviour: region is needed for request signing, while the optionalendpoint_urlparameter determines where the request is actually routed. The current implementation correctly reflects this AWS SDK pattern.
| fn normalize_endpoint(endpoint: &str) -> String { | ||
| match endpoint { | ||
| ep if ep.trim().starts_with("http://") || ep.trim().starts_with("https://") => ep.into(), | ||
| ep => format!("https://{ep}"), | ||
| } | ||
| } |
There was a problem hiding this comment.
Fix case-sensitivity and trimming inconsistencies in scheme detection.
Two issues in the endpoint normalization:
- Case-sensitivity: The scheme check won't match uppercase variants like
"HTTP://"or"HTTPS://", causing them to receive an additional"https://"prefix. - Trim inconsistency: The function uses
trim()in the check but returns the original untrimmedep. An endpoint like" http://example.com "would pass the check but be returned with leading/trailing whitespace intact.
Apply this diff to fix both issues:
fn normalize_endpoint(endpoint: &str) -> String {
- match endpoint {
- ep if ep.trim().starts_with("http://") || ep.trim().starts_with("https://") => ep.into(),
- ep => format!("https://{ep}"),
- }
+ let trimmed = endpoint.trim();
+ let lower = trimmed.to_ascii_lowercase();
+ if lower.starts_with("http://") || lower.starts_with("https://") {
+ trimmed.to_string()
+ } else {
+ format!("https://{trimmed}")
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| fn normalize_endpoint(endpoint: &str) -> String { | |
| match endpoint { | |
| ep if ep.trim().starts_with("http://") || ep.trim().starts_with("https://") => ep.into(), | |
| ep => format!("https://{ep}"), | |
| } | |
| } | |
| fn normalize_endpoint(endpoint: &str) -> String { | |
| let trimmed = endpoint.trim(); | |
| let lower = trimmed.to_ascii_lowercase(); | |
| if lower.starts_with("http://") || lower.starts_with("https://") { | |
| trimmed.to_string() | |
| } else { | |
| format!("https://{trimmed}") | |
| } | |
| } |
🤖 Prompt for AI Agents
In components/clp-rust-utils/src/s3/client.rs around lines 45 to 50, the
endpoint normalization is case-sensitive and returns the untrimmed input; fix by
first trimming the input into a local variable, perform the scheme check against
the lowercased trimmed value (e.g., starts_with "http://" or "https://"), and
return the trimmed endpoint (either as-is when it already contains a scheme or
prefixed with "https://" when it does not) so uppercase schemes and surrounding
whitespace are handled correctly.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Hi @sudheergajula, thanks for your contribution!
As #1767 has been merged, I realize we need to modify/update a few more places in the code base to make log-ingestor fully support custom S3 endpoints. Would you mind if I directly push the rest of modifications into this PR so we can make it in the end-of-year release?
| fn normalize_endpoint(endpoint: &str) -> String { | ||
| match endpoint { | ||
| ep if ep.trim().starts_with("http://") || ep.trim().starts_with("https://") => ep.into(), | ||
| ep => format!("https://{ep}"), | ||
| } | ||
| } |
There was a problem hiding this comment.
This method might not be necessary. A URL without http|https isn't considered an endpoint URL by AWS-CLI. I think we should leave this to users to provide a valid scheme.
There was a problem hiding this comment.
Agree, this was added to handle the http:// prefix that generate_s3_virtual_hosted_style_url, i think now it is addressed by generate_s3_url
…ages. Summary: This PR includes fixes for following: 1. ingestion.rs to have place holder for endpoint. 2. client.rs to create s3 client based on endpoint and normalise endpoint-url. Test Plan - Tested log ingestion with custom s3 endpoints
80073c4 to
674db3b
Compare
sudheergajula
left a comment
There was a problem hiding this comment.
Addressed review comments
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
components/log-ingestor/src/ingestion_job_manager.rs (1)
132-139: Critical: SQS client creation is missing endpoint_url parameter.The
create_sqs_listener_jobmethod creates an SQS client without passingconfig.base.endpoint_url, even though:
- The config contains
endpoint_urlinconfig.base- The underlying
clp_rust_utils::sqs::create_new_clientsupports an endpoint parameter- The S3 client creation (line 111) correctly passes the endpoint_url
This inconsistency will prevent SQS listeners from working with custom S3-compatible endpoints like MinIO or LocalStack.
Apply this diff to pass the endpoint_url to the SQS client:
let sqs_client_manager = SqsClientWrapper::create( config.base.region.as_str(), self.inner.aws_credentials.access_key_id.as_str(), self.inner.aws_credentials.secret_access_key.as_str(), + config.base.endpoint_url.as_ref(), ) .await;Note: This also requires updating
SqsClientWrapper::createinaws_client_manager.rsto accept the endpoint_url parameter.components/log-ingestor/src/aws_client_manager.rs (1)
52-57: Critical: SqsClientWrapper::create is missing endpoint_url parameter.The
SqsClientWrapper::createmethod doesn't accept anendpoint_urlparameter, even though:
- The underlying
clp_rust_utils::sqs::create_new_clientsupports it (line 54 shows it's called withNone)- The parallel
S3ClientWrapper::createcorrectly accepts and forwardsendpoint_url(line 78)This prevents SQS clients from connecting to custom endpoints like MinIO or LocalStack.
Apply this diff to add endpoint_url support:
- pub async fn create(region: &str, access_key_id: &str, secret_access_key: &str) -> Self { + pub async fn create( + region: &str, + access_key_id: &str, + secret_access_key: &str, + endpoint_url: Option<&NonEmptyString>, + ) -> Self { let sqs_client = - clp_rust_utils::sqs::create_new_client(region, access_key_id, secret_access_key, None) + clp_rust_utils::sqs::create_new_client(region, access_key_id, secret_access_key, endpoint_url) .await; Self::from(sqs_client) }
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (7)
components/clp-rust-utils/src/job_config/ingestion.rs(1 hunks)components/clp-rust-utils/src/s3/client.rs(3 hunks)components/clp-rust-utils/src/sqs/client.rs(3 hunks)components/log-ingestor/src/aws_client_manager.rs(2 hunks)components/log-ingestor/src/ingestion_job_manager.rs(1 hunks)components/log-ingestor/tests/aws_config.rs(2 hunks)components/log-ingestor/tests/test_ingestion_job.rs(6 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/log-ingestor/src/aws_client_manager.rscomponents/clp-rust-utils/src/job_config/ingestion.rscomponents/clp-rust-utils/src/sqs/client.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/tests/test_ingestion_job.rs
📚 Learning: 2025-09-17T22:51:15.765Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1304
File: components/webui/client/src/sql-parser/index.ts:98-115
Timestamp: 2025-09-17T22:51:15.765Z
Learning: In the CLP webui codebase, when designing APIs that accept optional string parameters, developers should omit the value (pass undefined) rather than pass empty strings. The API contract should be clear that empty strings are not valid inputs - only undefined should be used to indicate omission of optional clauses.
Applied to files:
components/clp-rust-utils/src/job_config/ingestion.rs
📚 Learning: 2025-01-17T23:25:38.165Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 673
File: components/log-viewer-webui/server/src/S3Manager.js:0-0
Timestamp: 2025-01-17T23:25:38.165Z
Learning: In S3Manager.js, URL validation is handled by the URL constructor which throws TypeError for invalid URLs, making additional scheme (s3://) validation redundant.
Applied to files:
components/clp-rust-utils/src/s3/client.rs
📚 Learning: 2025-04-25T20:46:20.140Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Applied to files:
components/clp-rust-utils/src/s3/client.rs
🧬 Code graph analysis (2)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(19-43)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(15-37)
components/log-ingestor/tests/test_ingestion_job.rs (1)
components/clp-rust-utils/src/types.rs (1)
from_string(26-28)
🔇 Additional comments (5)
components/clp-rust-utils/src/sqs/client.rs (1)
6-6: LGTM! Consistent endpoint type handling.The changes properly update the endpoint parameter to use
NonEmptyStringand apply conditional endpoint configuration, matching the pattern used in the S3 client.Also applies to: 19-19, 33-35
components/clp-rust-utils/src/s3/client.rs (1)
6-6: LGTM! Consistent endpoint type handling.The endpoint parameter type change to
NonEmptyStringand conditional configuration logic mirror the SQS client implementation, providing consistent handling across AWS clients.Also applies to: 23-23, 38-40
components/log-ingestor/tests/aws_config.rs (1)
13-13: LGTM! Proper validation for test configuration.The endpoint field is correctly changed to
Option<NonEmptyString>with appropriate error handling for empty values.Also applies to: 67-68
components/clp-rust-utils/src/job_config/ingestion.rs (1)
20-24: LGTM! Well-documented and properly configured field.The
endpoint_urlfield is correctly added with:
- Clear documentation explaining its purpose for custom S3-compatible stores
- Appropriate
serde(default)for optional deserialization- Consistent schema annotation matching other optional string fields
components/log-ingestor/tests/test_ingestion_job.rs (1)
165-166: LGTM! Consistent endpoint handling in tests.The test code correctly uses
.as_ref()for endpoint parameters and properly populatesendpoint_urlin the configuration structures, aligning with the new endpoint handling throughout the codebase.Also applies to: 183-183, 201-202, 247-248, 259-259
I don't have any objections |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.lockdocs/src/_static/generated/log-ingestor-openapi.jsonis excluded by!**/generated/**
📒 Files selected for processing (15)
components/api-server/Cargo.toml(1 hunks)components/api-server/src/client.rs(2 hunks)components/clp-rust-utils/src/aws.rs(1 hunks)components/clp-rust-utils/src/clp_config/s3_config.rs(1 hunks)components/clp-rust-utils/src/job_config/ingestion.rs(1 hunks)components/clp-rust-utils/src/lib.rs(1 hunks)components/clp-rust-utils/src/s3/client.rs(3 hunks)components/clp-rust-utils/src/sqs/client.rs(3 hunks)components/clp-rust-utils/tests/clp_config_test.rs(3 hunks)components/log-ingestor/src/aws_client_manager.rs(3 hunks)components/log-ingestor/src/compression/compression_job_submitter.rs(1 hunks)components/log-ingestor/src/ingestion_job_manager.rs(6 hunks)components/log-ingestor/src/routes.rs(2 hunks)components/log-ingestor/tests/aws_config.rs(2 hunks)components/log-ingestor/tests/test_ingestion_job.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-04-25T20:46:20.140Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Applied to files:
components/api-server/src/client.rscomponents/clp-rust-utils/src/aws.rscomponents/clp-rust-utils/src/clp_config/s3_config.rscomponents/clp-rust-utils/tests/clp_config_test.rscomponents/clp-rust-utils/src/s3/client.rs
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/api-server/src/client.rscomponents/clp-rust-utils/src/clp_config/s3_config.rscomponents/clp-rust-utils/src/sqs/client.rscomponents/log-ingestor/src/aws_client_manager.rscomponents/log-ingestor/tests/aws_config.rscomponents/clp-rust-utils/src/job_config/ingestion.rscomponents/log-ingestor/tests/test_ingestion_job.rscomponents/clp-rust-utils/tests/clp_config_test.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/src/ingestion_job_manager.rs
📚 Learning: 2025-09-17T22:51:15.765Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1304
File: components/webui/client/src/sql-parser/index.ts:98-115
Timestamp: 2025-09-17T22:51:15.765Z
Learning: In the CLP webui codebase, when designing APIs that accept optional string parameters, developers should omit the value (pass undefined) rather than pass empty strings. The API contract should be clear that empty strings are not valid inputs - only undefined should be used to indicate omission of optional clauses.
Applied to files:
components/clp-rust-utils/src/job_config/ingestion.rs
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/log-ingestor/tests/test_ingestion_job.rs
📚 Learning: 2025-01-17T23:25:38.165Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 673
File: components/log-viewer-webui/server/src/S3Manager.js:0-0
Timestamp: 2025-01-17T23:25:38.165Z
Learning: In S3Manager.js, URL validation is handled by the URL constructor which throws TypeError for invalid URLs, making additional scheme (s3://) validation redundant.
Applied to files:
components/clp-rust-utils/src/s3/client.rs
🧬 Code graph analysis (4)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(19-40)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(15-34)
components/log-ingestor/tests/test_ingestion_job.rs (1)
components/clp-rust-utils/src/types.rs (1)
from_string(26-28)
components/clp-rust-utils/tests/clp_config_test.rs (1)
components/clp-rust-utils/src/types.rs (1)
from_static_str(14-16)
components/log-ingestor/src/ingestion_job_manager.rs (1)
components/log-ingestor/src/aws_client_manager.rs (2)
create(53-66)create(87-101)
🔇 Additional comments (19)
components/clp-rust-utils/src/aws.rs (1)
1-1: LGTM! Default region constant aligns with AWS conventions.The choice of
us-east-1as the default region is appropriate and consistent with established practices for handling S3 configurations without explicit region specifications.Based on learnings, this default ensures compatibility with legacy endpoints and S3 client creation paths that require a region string.
components/log-ingestor/src/compression/compression_job_submitter.rs (1)
69-69: LGTM! Endpoint URL properly propagated.The endpoint_url is correctly forwarded from the ingestion job configuration to the S3 input configuration, enabling custom S3 endpoint support for compression jobs.
components/log-ingestor/src/routes.rs (2)
92-94: LGTM! Error handling correctly maps new variant.The
CustomEndpointUrlNotSupportederror is appropriately mapped to HTTP 400 BAD_REQUEST, ensuring proper client feedback when custom endpoints are unsupported for specific job types.
200-205: LGTM! OpenAPI documentation updated.The BAD_REQUEST response documentation clearly communicates the limitation regarding custom endpoint URLs for SQS listener jobs.
components/clp-rust-utils/src/clp_config/s3_config.rs (1)
8-10: LGTM! Optional fields support custom S3 endpoints.The changes correctly make
region_codeoptional and addendpoint_url, enabling support for custom S3-compatible services (MinIO, LocalStack, etc.) that don't require AWS region codes.Based on learnings, this aligns with PR #1767's approach where region_code is optional because custom endpoints use path-style URLs.
components/api-server/src/client.rs (1)
335-338: LGTM! Default region handling is correct.The use of
map_or(AWS_DEFAULT_REGION, ...)properly ensures a valid region string is always provided to the S3 client, falling back tous-east-1whenregion_codeis absent.Based on learnings, this approach correctly addresses the requirement for a default region when none is specified.
components/log-ingestor/tests/aws_config.rs (2)
13-16: LGTM! Test configuration uses validated types.The update to
NonEmptyStringforendpointandregionfields ensures type-level validation and consistency with the production configuration structures.
67-71: LGTM! Validation ensures non-empty values.The use of
NonEmptyString::new(...).map_err(...)provides clear error messages when environment variables are set to empty strings, improving test configuration robustness.components/clp-rust-utils/src/s3/client.rs (1)
6-6: LGTM!The implementation correctly integrates the
NonEmptyStringwrapper for theendpointparameter. The use ofToString::to_stringfor mapping is appropriate, and the directRegionconstruction is clean.Also applies to: 19-40
components/clp-rust-utils/tests/clp_config_test.rs (1)
12-23: LGTM!The test correctly reflects the updated
S3Configstructure withregion_codeasOption<NonEmptyString>and the newendpoint_urlfield. Serialization expectations are properly updated for both MessagePack and JSON formats.Also applies to: 46-63
components/clp-rust-utils/src/job_config/ingestion.rs (1)
17-26: LGTM!The addition of optional
regionandendpoint_urlfields is well-implemented. The documentation clearly explains their purpose, and theserde(default)attributes ensure backward compatibility. This aligns with the learning that custom S3-compatible endpoints don't require AWS region codes.Based on learnings, region is optional for custom S3 endpoints (MinIO, LocalStack).
components/clp-rust-utils/src/sqs/client.rs (1)
6-6: LGTM!The implementation mirrors the S3 client changes and correctly integrates the
NonEmptyStringwrapper for theendpointparameter. The pattern is consistent across both client types.Also applies to: 15-34
components/log-ingestor/src/ingestion_job_manager.rs (3)
37-38: LGTM!The
CustomEndpointUrlNotSupportederror variant is appropriately added to handle cases where custom endpoint URLs are not yet supported (e.g., SQS listener jobs).
109-122: LGTM!The implementation correctly handles custom endpoint URLs:
- S3 scanner jobs properly forward the
endpoint_urlto the client wrapper- SQS listener jobs explicitly reject custom endpoint URLs with a clear error message, which is appropriate since SQS listener support for custom endpoints is not yet implemented
Also applies to: 135-159
225-275: LGTM!The conflict detection logic is correctly updated to include
endpoint_urlcomparison. This ensures that ingestion jobs targeting different S3-compatible endpoints (even with the same bucket/prefix) are properly distinguished and don't conflict. TheIngestionJobTableEntrystructure is appropriately updated to store both optionalregionandendpoint_url.Also applies to: 306-315
components/log-ingestor/src/aws_client_manager.rs (2)
53-66: LGTM!The
SqsClientWrapper::createmethod correctly accepts an optionalregionparameter and properly defaults toAWS_DEFAULT_REGIONwhenNone. The use ofmap_orwithNonEmptyString::as_stris clean and idiomatic.
87-101: LGTM!The
S3ClientWrapper::createmethod correctly handles both optionalregionandendpoint_urlparameters. The region defaulting logic is consistent with the SQS wrapper, and theendpoint_urlis properly forwarded to the underlying client factory.components/log-ingestor/tests/test_ingestion_job.rs (2)
161-186: LGTM!The test correctly reflects the updated API:
- Client creation calls properly pass
region.as_str()andSome(&aws_config.endpoint)BaseConfigis updated with optionalregionandendpoint_urlfields- Line 172 no longer uses
.unwrap()(addressing the past review concern), instead usingaws_config.endpointdirectlyAlso applies to: 197-203
243-264: LGTM!The
test_s3_scannertest is consistently updated following the same pattern astest_sqs_listener. The S3 client creation and configuration properly use the optionalregionandendpoint_urlfields.
LinZhihao-723
left a comment
There was a problem hiding this comment.
Hi @sudheergajula, sorry for the late reply.
I've updated this PR to make sure it can fully support custom endpoints. The major changes are as follow:
- As changed in #1767, the S3 region can be None now. I've updated the job config and the CLP config accordingly. However, AWS SDK requires the region code to always be set. I added a constant for the default region (us-east-1) instead. I choose to let the caller decide which default region to use since the desired default might not always be us-east-1 .
- #1767 also updated the compression job config to take an optional endpoint URL. I've updated the compression job submitter to use the given endpoint URL for compression when specified.
- We don't want ingestion jobs to monitor the same key prefix (we call prefix conflict) to avoid duplicate ingestion. As we support endpoint URL now, we need to make sure the detection scope is within the same endpoint, not just the bucket name + region + dataset. The current check is a bit verbose. I've created an issue (#1805) to keep track of for future refactoring.
- For SQS listener, I disable custom endpoint for now for simplicity (as the SQS endpoint might be different from the S3 endpoint). We will add support for custom endpoint in future PRs.
- I also updated the generated OpenAPI doc as we updated the ingestion job config and the API return code.
What I've tested:
- Start a minio service, and set credentials to be the same as my AWS credentials.
Create two ingestion jobs and generate logs to the destinated buckets on different endpoints in parallel using the helper script I created here:- The first job monitors minio (the given bucket + prefix) using S3 scanner
-The second job monitors AWS S3 using SQS listener (I configured a SQS queue to listen to the bucket)
- The first job monitors minio (the given bucket + prefix) using S3 scanner
- Ensure all objects can be captured by the log ingestor.
- Ensure all objects will be eventually submitted to CLP for compression by the log ingestor.
- Ensure all compression jobs complete without errors, and these logs are searchable through CLP webui.
@hoophalab Please help me review the changes, thank you!
hoophalab
left a comment
There was a problem hiding this comment.
LGTM. I posted a diff on my comments to save your time.
- Yes, we can use endpoint_url in API server. I did the change in diff
- Can we make region
Option<NonEmptyString>and set it asus-east-1in rust-utils rather than the actual user? I also tried passingNonedirectly to s3 client. Minio doesn't support resolving the region automatically.
diff --git a/components/api-server/src/client.rs b/components/api-server/src/client.rs
index c2fd6eec..79ad212c 100644
--- a/components/api-server/src/client.rs
+++ b/components/api-server/src/client.rs
@@ -2,7 +2,6 @@ use std::pin::Pin;
use async_stream::stream;
use clp_rust_utils::{
- aws::AWS_DEFAULT_REGION,
clp_config::{
AwsAuthentication,
package::{
@@ -332,11 +331,8 @@ impl Client {
let s3_client = clp_rust_utils::s3::create_new_client(
credentials.access_key_id.as_str(),
credentials.secret_access_key.as_str(),
- s3_config
- .region_code
- .as_ref()
- .map_or(AWS_DEFAULT_REGION, non_empty_string::NonEmptyString::as_str),
- None,
+ s3_config.region_code.as_ref(),
+ s3_config.endpoint_url.as_ref(),
)
.await;
diff --git a/components/clp-rust-utils/src/s3/client.rs b/components/clp-rust-utils/src/s3/client.rs
index 73dfd14e..1d2ce062 100644
--- a/components/clp-rust-utils/src/s3/client.rs
+++ b/components/clp-rust-utils/src/s3/client.rs
@@ -1,3 +1,4 @@
+use crate::aws::AWS_DEFAULT_REGION;
use aws_config::BehaviorVersion;
use aws_sdk_s3::{
Client,
@@ -19,7 +20,7 @@ use non_empty_string::NonEmptyString;
pub async fn create_new_client(
access_key_id: &str,
secret_access_key: &str,
- region_id: &str,
+ region_id: Option<&NonEmptyString>,
endpoint: Option<&NonEmptyString>,
) -> Client {
let credential = Credentials::new(
@@ -31,9 +32,13 @@ pub async fn create_new_client(
);
let base_config = aws_config::defaults(BehaviorVersion::latest()).load().await;
let mut config_builder = Builder::from(&base_config)
- .region(Region::new(region_id.to_string()))
.credentials_provider(credential)
.force_path_style(true);
+ config_builder.set_region(Some(Region::new(if let Some(id) = region_id {
+ id.to_string()
+ } else {
+ AWS_DEFAULT_REGION.to_owned()
+ })));
config_builder.set_endpoint_url(endpoint.map(std::string::ToString::to_string));
let config = config_builder.build();
Client::from_conf(config)
diff --git a/components/clp-rust-utils/src/sqs/client.rs b/components/clp-rust-utils/src/sqs/client.rs
index a6596390..5cb5d974 100644
--- a/components/clp-rust-utils/src/sqs/client.rs
+++ b/components/clp-rust-utils/src/sqs/client.rs
@@ -1,3 +1,4 @@
+use crate::aws::AWS_DEFAULT_REGION;
use aws_config::BehaviorVersion;
use aws_sdk_sqs::{
Client,
@@ -15,7 +16,7 @@ use non_empty_string::NonEmptyString;
pub async fn create_new_client(
access_key_id: &str,
secret_access_key: &str,
- region_id: &str,
+ region_id: Option<&NonEmptyString>,
endpoint: Option<&NonEmptyString>,
) -> Client {
let credential = Credentials::new(
@@ -26,9 +27,12 @@ pub async fn create_new_client(
"clp-credential-provider",
);
let base_config = aws_config::defaults(BehaviorVersion::latest()).load().await;
- let mut config_builder = Builder::from(&base_config)
- .credentials_provider(credential)
- .region(Region::new(region_id.to_string()));
+ let mut config_builder = Builder::from(&base_config).credentials_provider(credential);
+ config_builder.set_region(Some(Region::new(if let Some(id) = region_id {
+ id.to_string()
+ } else {
+ AWS_DEFAULT_REGION.to_owned()
+ })));
config_builder.set_endpoint_url(endpoint.map(std::string::ToString::to_string));
Client::from_conf(config_builder.build())
}
diff --git a/components/log-ingestor/src/aws_client_manager.rs b/components/log-ingestor/src/aws_client_manager.rs
index 8e59606f..ab2cdd47 100644
--- a/components/log-ingestor/src/aws_client_manager.rs
+++ b/components/log-ingestor/src/aws_client_manager.rs
@@ -2,7 +2,6 @@ use anyhow::Result;
use async_trait::async_trait;
use aws_sdk_s3::Client as S3Client;
use aws_sdk_sqs::Client as SqsClient;
-use clp_rust_utils::aws::AWS_DEFAULT_REGION;
use non_empty_string::NonEmptyString;
/// A marker trait for AWS client types.
@@ -55,13 +54,9 @@ impl SqsClientWrapper {
access_key_id: &str,
secret_access_key: &str,
) -> Self {
- let sqs_client = clp_rust_utils::sqs::create_new_client(
- access_key_id,
- secret_access_key,
- region.map_or(AWS_DEFAULT_REGION, NonEmptyString::as_str),
- None,
- )
- .await;
+ let sqs_client =
+ clp_rust_utils::sqs::create_new_client(access_key_id, secret_access_key, region, None)
+ .await;
Self::from(sqs_client)
}
}
@@ -93,7 +88,7 @@ impl S3ClientWrapper {
let s3_client = clp_rust_utils::s3::create_new_client(
access_key_id,
secret_access_key,
- region.map_or(AWS_DEFAULT_REGION, NonEmptyString::as_str),
+ region,
endpoint_url,
)
.await;
diff --git a/components/log-ingestor/tests/test_ingestion_job.rs b/components/log-ingestor/tests/test_ingestion_job.rs
index 44f8c6c5..c7d93e10 100644
--- a/components/log-ingestor/tests/test_ingestion_job.rs
+++ b/components/log-ingestor/tests/test_ingestion_job.rs
@@ -161,7 +161,7 @@ async fn test_sqs_listener() -> Result<()> {
let sqs_client = clp_rust_utils::sqs::create_new_client(
aws_config.access_key_id.as_str(),
aws_config.secret_access_key.as_str(),
- aws_config.region.as_str(),
+ Some(&aws_config.region),
Some(&aws_config.endpoint),
)
.await;
@@ -197,7 +197,7 @@ async fn test_sqs_listener() -> Result<()> {
let s3_client = clp_rust_utils::s3::create_new_client(
aws_config.access_key_id.as_str(),
aws_config.secret_access_key.as_str(),
- aws_config.region.as_str(),
+ Some(&aws_config.region),
Some(&aws_config.endpoint),
)
.await;
@@ -243,7 +243,7 @@ async fn test_s3_scanner() -> Result<()> {
let s3_client = clp_rust_utils::s3::create_new_client(
aws_config.access_key_id.as_str(),
aws_config.secret_access_key.as_str(),
- aws_config.region.as_str(),
+ Some(&aws_config.region),
Some(&aws_config.endpoint),
)
.await;|
Validation:
|
log-ingestor; Add support for streaming search results from custom S3 endpoints using the API server.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
components/api-server/src/client.rs(1 hunks)components/clp-rust-utils/src/s3/client.rs(3 hunks)components/clp-rust-utils/src/sqs/client.rs(3 hunks)components/log-ingestor/src/aws_client_manager.rs(3 hunks)components/log-ingestor/tests/test_ingestion_job.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/api-server/src/client.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/src/aws_client_manager.rscomponents/clp-rust-utils/src/sqs/client.rscomponents/log-ingestor/tests/test_ingestion_job.rs
📚 Learning: 2025-04-25T20:46:20.140Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Applied to files:
components/api-server/src/client.rscomponents/clp-rust-utils/src/s3/client.rs
📚 Learning: 2025-01-17T23:25:38.165Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 673
File: components/log-viewer-webui/server/src/S3Manager.js:0-0
Timestamp: 2025-01-17T23:25:38.165Z
Learning: In S3Manager.js, URL validation is handled by the URL constructor which throws TypeError for invalid URLs, making additional scheme (s3://) validation redundant.
Applied to files:
components/clp-rust-utils/src/s3/client.rs
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/log-ingestor/tests/test_ingestion_job.rs
🧬 Code graph analysis (2)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(21-45)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(17-38)
components/log-ingestor/tests/test_ingestion_job.rs (1)
components/clp-rust-utils/src/types.rs (1)
from_string(26-28)
🔇 Additional comments (8)
components/api-server/src/client.rs (1)
334-335: LGTM! Endpoint URL now properly propagated.The changes correctly pass both the region and endpoint URL as references to the S3 client creation function. This addresses the previous issue where the endpoint parameter was hardcoded to
None, which prevented custom S3-compatible endpoints from working for query result storage.components/log-ingestor/tests/test_ingestion_job.rs (3)
164-187: LGTM! Consistent parameter handling throughout the test.The test correctly uses borrowed references (
Some(&aws_config.region),Some(&aws_config.endpoint)) for client creation and clones values for theBaseConfigstruct. The direct usage ofaws_config.endpointin theformat!macro at line 172 is appropriate sinceformat!handles ownership internally.
197-203: LGTM! Consistent client creation pattern.The S3 client creation follows the same correct pattern as the SQS client, passing borrowed references for both region and endpoint parameters.
243-264: LGTM! Test configuration properly updated.The
test_s3_scannerfunction correctly mirrors the parameter handling pattern fromtest_sqs_listener, ensuring consistency across the test suite.components/log-ingestor/src/aws_client_manager.rs (1)
82-96: LGTM! S3 client wrapper properly supports custom endpoints.The
S3ClientWrapper::createmethod correctly accepts and propagates both the optional region and endpoint URL parameters to the underlying S3 client creation function, enabling full support for custom S3-compatible endpoints like MinIO and LocalStack.components/clp-rust-utils/src/s3/client.rs (2)
6-25: LGTM! Function signature properly updated for optional parameters.The imports and function signature correctly support optional region and endpoint parameters using
NonEmptyStringreferences. This aligns with the requirement that custom S3-compatible endpoints (MinIO, LocalStack) don't require AWS region codes.Based on learnings, the region field is optional because custom endpoints use path-style URLs and don't require AWS region codes.
38-42: LGTM! Region fallback and endpoint handling implemented correctly.The region configuration properly falls back to
AWS_DEFAULT_REGIONwhen no region is provided, which is appropriate for custom S3-compatible endpoints. The endpoint URL is correctly mapped to aStringwhen present and passed to the config builder.components/clp-rust-utils/src/sqs/client.rs (1)
6-36: LGTM! SQS client implementation consistent with S3 client.The SQS client creation function mirrors the S3 client implementation, correctly handling optional region and endpoint parameters with the same fallback logic. The configuration builder pattern properly sets credentials, region (with
AWS_DEFAULT_REGIONfallback), and endpoint URL.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
docs/src/_static/generated/log-ingestor-openapi.jsonis excluded by!**/generated/**
📒 Files selected for processing (6)
components/api-server/src/client.rs(7 hunks)components/clp-rust-utils/src/s3/client.rs(3 hunks)components/log-ingestor/src/aws_client_manager.rs(3 hunks)components/log-ingestor/src/ingestion_job_manager.rs(6 hunks)components/log-ingestor/src/routes.rs(3 hunks)components/log-ingestor/tests/test_ingestion_job.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/core/src/clp/aws/constants.hpp:10-16
Timestamp: 2025-12-12T16:22:15.181Z
Learning: AWS S3 region codes follow the pattern `[region]-[zone]-[digit]` where the digit is always a single decimal digit (e.g., us-east-1, eu-west-3, ap-northeast-2). Custom S3-compatible endpoints (MinIO, LocalStack) may use different region naming conventions.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/log-ingestor/tests/test_ingestion_job.rscomponents/log-ingestor/src/aws_client_manager.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/src/routes.rscomponents/log-ingestor/src/ingestion_job_manager.rscomponents/api-server/src/client.rs
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
Repo: y-scope/clp PR: 557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/log-ingestor/tests/test_ingestion_job.rs
📚 Learning: 2025-04-25T20:46:20.140Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Applied to files:
components/log-ingestor/src/aws_client_manager.rscomponents/clp-rust-utils/src/s3/client.rscomponents/log-ingestor/src/routes.rscomponents/log-ingestor/src/ingestion_job_manager.rscomponents/api-server/src/client.rs
📚 Learning: 2025-01-17T23:25:38.165Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 673
File: components/log-viewer-webui/server/src/S3Manager.js:0-0
Timestamp: 2025-01-17T23:25:38.165Z
Learning: In S3Manager.js, URL validation is handled by the URL constructor which throws TypeError for invalid URLs, making additional scheme (s3://) validation redundant.
Applied to files:
components/clp-rust-utils/src/s3/client.rs
🧬 Code graph analysis (3)
components/log-ingestor/tests/test_ingestion_job.rs (1)
components/clp-rust-utils/src/types.rs (1)
from_string(26-28)
components/log-ingestor/src/aws_client_manager.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(19-40)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(17-38)
components/api-server/src/client.rs (2)
components/clp-rust-utils/src/s3/client.rs (1)
create_new_client(19-40)components/clp-rust-utils/src/sqs/client.rs (1)
create_new_client(17-38)
🔇 Additional comments (14)
components/api-server/src/client.rs (7)
5-5: LGTM!The import of
AWS_DEFAULT_REGIONis correctly placed and necessary for the default region fallback logic when connecting to custom S3 endpoints.
165-165: LGTM!The documentation correctly reflects that errors from
fetch_results_from_s3are now forwarded due to the Result return type.
202-202: LGTM!Error propagation is now correctly implemented with
.await?, enabling proper error handling for custom S3 endpoints.
329-329: LGTM!The return type change to
Result<impl Stream, ClientError>correctly enables early error return for validation failures.
338-343: LGTM!The validation logic correctly enforces that a region code must be provided when using the default AWS S3 endpoint, while allowing custom S3-compatible endpoints (MinIO, LocalStack) to omit the region. Based on learnings, this aligns with the requirement that only AWS S3 endpoints require region codes.
345-355: LGTM!The S3 client creation correctly handles both AWS and custom S3 endpoints:
- Region defaults to
AWS_DEFAULT_REGIONwhen not provided (for custom S3 endpoints)endpoint_urlis properly propagated to enable custom S3-compatible services- Implementation is consistent with the pattern used in
sqs/client.rsBased on learnings, this correctly addresses the previous review concern about missing endpoint propagation.
366-393: LGTM!The stream is correctly wrapped in
Ok()to match theResultreturn type, enabling proper error handling for the validation logic.components/log-ingestor/src/routes.rs (1)
92-95: LGTM! Error handling properly aligned with new endpoint/region validation.The error mappings for
CustomEndpointUrlNotSupportedandMissingRegionCodecorrectly returnBAD_REQUEST, and the OpenAPI documentation accurately describes when these errors occur. This aligns well with the validation logic iningestion_job_manager.rs.Also applies to: 156-160, 207-211
components/log-ingestor/src/ingestion_job_manager.rs (3)
114-122: LGTM! Proper validation for S3 scanner job creation.The logic correctly enforces that either
endpoint_urlorregionmust be provided for S3 scanner jobs. When using custom S3-compatible endpoints (like MinIO), the region may not be required, while AWS S3 endpoints require a region. This validation aligns with the learnings from PR #1767.Based on learnings, custom S3-compatible endpoints use path-style URLs and don't require AWS region codes.
144-149: LGTM! Clear rejection of unsupported SQS custom endpoints.The explicit error for custom endpoint URLs on SQS listener jobs is appropriate since this feature is not yet implemented. The error message is descriptive and includes the problematic endpoint URL, making it easy for users to understand the limitation.
233-251: LGTM! Conflict detection properly includes endpoint_url.The conflict detection now correctly considers
endpoint_urlin addition toregion,bucket_name,dataset, andkey_prefix. This ensures that jobs accessing the same bucket through different endpoints (e.g., AWS S3 vs. MinIO) are not incorrectly flagged as conflicting.components/log-ingestor/tests/test_ingestion_job.rs (1)
161-186: LGTM! Tests properly updated for optional region and endpoint_url.The tests correctly:
- Wrap
regionandendpointinSome()when creating clients.- Construct
BaseConfigwithSome(aws_config.region.clone())andSome(aws_config.endpoint.clone()).- Pass borrowed references (
&aws_config.region,&aws_config.endpoint) to client creation functions.All test updates align with the new type signatures and properly exercise the custom endpoint functionality.
Also applies to: 197-203, 243-264
components/log-ingestor/src/aws_client_manager.rs (2)
53-62: LGTM! SQS client correctly omits endpoint parameter.The
SqsClientWrapper::createfunction does not expose anendpointparameter and passesNoneto the underlyingcreate_new_client. This is consistent with the explicit rejection of custom endpoint URLs for SQS listener jobs iningestion_job_manager.rs(lines 144-149), where the error states "SQS listener ingestion jobs do not support custom endpoint URLs yet."
83-97: LGTM! S3 client properly handles optional region and custom endpoints.The
S3ClientWrapper::createfunction correctly:
- Accepts
region: Option<&NonEmptyString>and defaults toAWS_DEFAULT_REGIONwhenNone.- Passes
endpoint_urlthrough to enable custom S3-compatible endpoints (MinIO, LocalStack, etc.).The default region handling ensures that even when users provide a custom endpoint without a region, the S3 client still has a valid region value (which custom endpoints typically ignore).
Based on learnings, custom S3-compatible endpoints don't require specific AWS region codes.
LinZhihao-723
left a comment
There was a problem hiding this comment.
The further exploration shows that:
- The region code must be set in Rust S3 client
- When using the default AWS S3 endpoint, the correct region code must be given explicitly.
The latest commits fix this problem by checking whether both the endpoint and the region are None. If so, it will invalidate the given config. This validation is on the caller's end, as the caller may also decide which default region code to use (for example, when using minio, the region code must NOT be provided and the caller needs to feed a fake region code for it).
hoophalab
left a comment
There was a problem hiding this comment.
LGTM. One nitpick
Validations:
- Successfully ingested logs from minio adn aws s3.
- If request isn't correct, an error message returned.
- API server can fetch query results from minio.
Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
LinZhihao-723
left a comment
There was a problem hiding this comment.
@sudheergajula Thank you for your contribution! We will merge this PR and it will be a part of the coming release.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/api-server/src/client.rs(7 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/core/src/clp/aws/constants.hpp:10-16
Timestamp: 2025-12-12T16:22:15.181Z
Learning: AWS S3 region codes follow the pattern `[region]-[zone]-[digit]` where the digit is always a single decimal digit (e.g., us-east-1, eu-west-3, ap-northeast-2). Custom S3-compatible endpoints (MinIO, LocalStack) may use different region naming conventions.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 743
File: components/log-viewer-webui/server/src/plugins/S3Manager.ts:76-76
Timestamp: 2025-04-07T09:30:20.684Z
Learning: In the S3Manager plugin, an `isEnabled` option should be added and checked instead of checking if the region is null. This would make the plugin's behavior more explicit.
Learnt from: junhaoliao
Repo: y-scope/clp PR: 743
File: components/log-viewer-webui/server/src/plugins/S3Manager.ts:76-76
Timestamp: 2025-04-07T09:30:20.684Z
Learning: In the S3Manager plugin, an `isEnabled` option should be added and checked instead of checking if the region is null. This would make the plugin's behavior more explicit.
Learnt from: haiqi96
Repo: y-scope/clp PR: 673
File: components/log-viewer-webui/server/src/routes/query.js:9-10
Timestamp: 2025-01-17T22:33:23.547Z
Learning: When initializing S3Manager in log-viewer-webui, both StreamFilesS3Region and StreamFilesS3PathPrefix settings must be non-null for S3 functionality to work correctly.
📚 Learning: 2025-12-10T23:32:24.363Z
Learnt from: hoophalab
Repo: y-scope/clp PR: 1767
File: components/clp-py-utils/clp_py_utils/clp_config.py:580-585
Timestamp: 2025-12-10T23:32:24.363Z
Learning: In PR #1767, custom S3 endpoint support was added to CLP. The S3Config.region_code field is now optional (NonEmptyStr | None) because custom S3-compatible endpoints (MinIO, LocalStack, etc.) use path-style URLs and don't require AWS region codes. Only AWS S3 endpoints require region_code. Presto integration still requires region_code because it only works with AWS S3.
Applied to files:
components/api-server/src/client.rs
📚 Learning: 2025-04-25T20:46:20.140Z
Learnt from: haiqi96
Repo: y-scope/clp PR: 852
File: components/clp-package-utils/clp_package_utils/scripts/native/compress.py:151-160
Timestamp: 2025-04-25T20:46:20.140Z
Learning: For S3 URLs without region specifications (legacy global endpoints), either assign a default region (us-east-1) or throw a clear error message requiring region specification in the URL. This addresses validation issues in components like S3InputConfig that require a non-nullable region string.
Applied to files:
components/api-server/src/client.rs
🔇 Additional comments (6)
components/api-server/src/client.rs (6)
5-5: LGTM!The import of
AWS_DEFAULT_REGIONis appropriate and necessary for the default region fallback logic used whenregion_codeis not provided.
165-165: LGTM!The documentation update and error propagation are correct. The function now properly awaits
fetch_results_from_s3and uses the?operator to propagate function-level errors, which aligns with the updated return type offetch_results_from_s3.Also applies to: 202-202
329-329: LGTM!The return type change to
Result<impl Stream<...>, ClientError>is appropriate. It enables the function to return validation errors before stream creation, which is necessary for the AWS region validation logic.
338-343: LGTM!The validation logic correctly enforces that a region code must be provided when using the default AWS S3 endpoint (when
endpoint_urlisNone). Custom S3-compatible endpoints (MinIO, LocalStack, etc.) can proceed without a region code, as they don't require AWS region codes.Based on learnings, this aligns with the S3 endpoint requirements established in PR #1767.
345-355: LGTM! Past issue successfully addressed.The S3 client is now correctly constructed with:
- Region: Uses the configured
region_codeif provided, otherwise falls back toAWS_DEFAULT_REGION. This ensures custom S3-compatible endpoints work even when region is not required.- Endpoint: Properly propagates
s3_config.endpoint_urlviaas_ref(), enabling custom S3 endpoints (MinIO, LocalStack, etc.) to function correctly.This change addresses the critical issue flagged in past review comments where the endpoint parameter was passed as
None, preventing custom S3 endpoints from working for query result storage.
366-366: LGTM!The stream is correctly wrapped in
Ok(...)to match the function'sResult<impl Stream<...>, ClientError>return type. This allows the function to return validation errors before stream creation while still yielding a stream on success.Also applies to: 393-393
… using `log-ingestor`; Add support for streaming search results from custom S3 endpoints using the API server. (y-scope#1776) Co-authored-by: Sudheer Gajula <sudheer.gajula@indexexchange.com> Co-authored-by: LinZhihao-723 <zh.lin@mail.utoronto.ca> Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com> Co-authored-by: hoophalab <200652805+hoophalab@users.noreply.github.com>
Summary:
This PR includes fixes for following:
Test Plan - Tested log ingestion with custom s3 endpoints.
Payload :
curl -X POST http://host:3002/s3_scanner \ -H "Content-Type: application/json" \ -d '{ "region": "us-east-1", "bucket_name": "logs-pub", "key_prefix": "logs/", "dataset": "default", "timestamp_key": "ts", "unstructured": false, "scanning_interval_sec": 1800, "start_after": "2025-01-01T00:00:00Z", "endpoint_url": "http://minio.com:9000" }'Description
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
New Features
Bug Fixes / Behavior
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.