
Conversation

@zensgit zensgit commented Sep 26, 2025

PR Security & Metrics Summary (Template)

Overview

This PR strengthens API security and observability. Copy & adapt sections below for the final PR description.

Key Changes

  • Login rate limiting (IP + email key) with structured 429 JSON and Retry-After header.
  • Metrics endpoint CIDR allow + deny lists (ALLOW_PUBLIC_METRICS=0, METRICS_ALLOW_CIDRS, METRICS_DENY_CIDRS).
  • Password rehash failure breakdown: jive_password_rehash_fail_breakdown_total{cause="hash"|"update"}.
  • Export performance histograms (buffered & streaming) and uptime metric.
  • New security / monitoring docs: Grafana dashboard, alert rules, security checklist.
  • Email-based rate limit key hashing (first 8 hex of SHA256) for privacy.
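
The CIDR allow/deny behavior above (deny takes precedence) can be sketched with a std-only IPv4 matcher. This is a minimal illustration, not the PR's middleware: `in_cidr` and `allowed` are hypothetical names, and the actual implementation also handles IPv6.

```rust
use std::net::Ipv4Addr;

/// Check whether `ip` falls inside a CIDR block like "10.0.0.0/8".
fn in_cidr(ip: Ipv4Addr, cidr: &str) -> bool {
    let Some((net, prefix)) = cidr.split_once('/') else { return false };
    let (Ok(net), Ok(prefix)) = (net.parse::<Ipv4Addr>(), prefix.parse::<u32>()) else {
        return false;
    };
    if prefix > 32 {
        return false;
    }
    let mask = if prefix == 0 { 0 } else { u32::MAX << (32 - prefix) };
    (u32::from(ip) & mask) == (u32::from(net) & mask)
}

/// The deny list is evaluated first, mirroring the "deny precedence" rule.
fn allowed(ip: Ipv4Addr, allow: &[&str], deny: &[&str]) -> bool {
    if deny.iter().any(|c| in_cidr(ip, c)) {
        return false;
    }
    allow.iter().any(|c| in_cidr(ip, c))
}

fn main() {
    let ip: Ipv4Addr = "10.0.0.5".parse().unwrap();
    // Denied even though the allow list matches, because deny wins.
    println!("{}", allowed(ip, &["10.0.0.0/8"], &["10.0.0.0/24"])); // prints "false"
}
```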

New / Modified Environment Variables

| Variable | Purpose | Default |
|---|---|---|
| `AUTH_RATE_LIMIT` | Login attempts per window (`N/SECONDS`) | `30/60` |
| `AUTH_RATE_LIMIT_HASH_EMAIL` | Hash email in key (privacy) | `1` |
| `ALLOW_PUBLIC_METRICS` | If `0`, restrict metrics by CIDR | `1` |
| `METRICS_ALLOW_CIDRS` | Comma-separated CIDR allow list | `127.0.0.1/32` |
| `METRICS_DENY_CIDRS` | Comma-separated CIDR deny list (takes precedence) | (empty) |
| `METRICS_CACHE_TTL` | Metrics base cache seconds | `30` |
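
The `N/SECONDS` format used by `AUTH_RATE_LIMIT` can be parsed with a small helper; `parse_rate_limit` is a hypothetical name used only to illustrate the format, not the PR's actual code.

```rust
/// Parse a rate-limit spec like "30/60" into (max attempts, window seconds).
fn parse_rate_limit(spec: &str) -> Option<(u32, u64)> {
    let (max, window) = spec.split_once('/')?;
    let max: u32 = max.trim().parse().ok()?;
    let window: u64 = window.trim().parse().ok()?;
    if max == 0 || window == 0 {
        return None; // a zero limit or zero window is meaningless
    }
    Some((max, window))
}

fn main() {
    assert_eq!(parse_rate_limit("30/60"), Some((30, 60)));
    assert_eq!(parse_rate_limit("3/60"), Some((3, 60)));
    assert_eq!(parse_rate_limit("garbage"), None);
    println!("ok"); // prints "ok"
}
```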

Prometheus Metrics Added

| Metric | Type | Notes |
|---|---|---|
| `auth_login_rate_limited_total` | counter | Rate-limited login attempts |
| `jive_password_rehash_fail_breakdown_total{cause}` | counter | Split hash/update failures |
| `export_duration_buffered_seconds_*` | histogram | Export latency (buffered) |
| `export_duration_stream_seconds_*` | histogram | Export latency (stream) |
| `process_uptime_seconds` | gauge | Runtime age |

Deprecated (pending removal): jive_password_rehash_fail_total (aggregate).

Quick Local Verification

Run stack (example):

```shell
ALLOW_PUBLIC_METRICS=1 AUTH_RATE_LIMIT=3/60 cargo run --bin jive-api &
sleep 2
./scripts/verify_observability.sh
```

Expect PASS output and non-zero counters for auth_login_fail_total after simulated attempts.

Reviewer Checklist

  • 429 login response includes Retry-After and JSON structure
  • /metrics reachable only when expected (toggle ALLOW_PUBLIC_METRICS)
  • Rehash breakdown metrics appear
  • Export histogram buckets present
  • Uptime metric increasing across scrapes
  • Security checklist file present (docs/SECURITY_CHECKLIST.md)

Follow-up (Optional / Tracked)

  • Audit logging for repeated rate-limit triggers
  • Global unified error response model
  • Redis/distributed rate limiting for multi-instance scaling
  • Remove deprecated rehash aggregate metric (target v1.3.0)

…il breakdown & observability assets

- Login rate limiting (IP + email hash key) with structured 429 JSON + Retry-After
- /metrics CIDR allow & deny (ALLOW_PUBLIC_METRICS, METRICS_ALLOW_CIDRS, METRICS_DENY_CIDRS)
- Password rehash failure breakdown metric (hash vs update) + deprecate aggregate
- Export latency histograms (buffered & stream) + process_uptime_seconds
- New counters: auth_login_rate_limited_total, rehash_fail_breakdown
- Security checklist, Grafana dashboard, Prometheus alert rules, PR summary template
- Observability verification script (scripts/verify_observability.sh)
- Opportunistic limiter cleanup; IPv6 CIDR precise matching; deny precedence

DEPRECATION: jive_password_rehash_fail_total slated removal v1.3.0 (use breakdown).
@Copilot Copilot AI review requested due to automatic review settings September 26, 2025 05:31
@gemini-code-assist

Summary of Changes

Hello @zensgit, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the API's security and observability. It introduces robust login rate limiting to prevent brute-force attacks, implements granular access control for the metrics endpoint, and expands the suite of Prometheus metrics to provide deeper insights into authentication outcomes, password rehash processes, and data export performance. The changes are complemented by new documentation and utility scripts to streamline monitoring setup and security best practices.

Highlights

  • Login Rate Limiting: Implemented login rate limiting based on IP and hashed email, returning a structured 429 JSON response with a 'Retry-After' header for enhanced security.
  • Metrics Endpoint Access Control: Introduced CIDR-based allow and deny lists for the '/metrics' endpoint, configurable via environment variables, to restrict access to monitoring infrastructure.
  • Enhanced Password Rehash Metrics: Added a new metric, 'jive_password_rehash_fail_breakdown_total', to provide a detailed breakdown of password rehash failures by cause (e.g., hash generation or database update issues).
  • Export Performance Metrics: Integrated Prometheus histograms ('export_duration_buffered_seconds_' and 'export_duration_stream_seconds_') to track the latency of both buffered and streaming data export operations, along with counters for requests and rows exported.
  • New Observability Documentation: Added comprehensive documentation including a Grafana dashboard template, example Prometheus alert rules, a security checklist, and a metrics deprecation plan to guide monitoring and operational practices.
  • Build Information Metric: Introduced a 'jive_build_info' gauge metric that exposes build-time details such as Git commit, build timestamp, Rust compiler version, and package version, aiding in debugging and version tracking.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@zensgit zensgit merged commit 2e6a0dd into main Sep 26, 2025
6 of 8 checks passed
@zensgit zensgit deleted the feat/security-metrics-observability branch September 26, 2025 05:32
Copilot AI left a comment

Pull Request Overview

This PR enhances the Jive Money API with comprehensive security and observability features. It introduces login rate limiting with email-based keying, restricts metrics endpoint access via CIDR lists, expands metrics coverage with breakdown counters and performance histograms, and adds extensive documentation for monitoring and security best practices.

Key Changes

  • Login rate limiting with IP+email key combination and structured 429 responses
  • Metrics endpoint access control using CIDR allow/deny lists
  • Enhanced metrics including rehash failure breakdown, export performance histograms, and authentication counters

Reviewed Changes

Copilot reviewed 27 out of 28 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
|---|---|
| `scripts/verify_observability.sh` | Test script for validating core metrics presence |
| `scripts/check_metrics_consistency.sh` | Verification script for health vs metrics consistency |
| `jive-api/tests/integration/*.rs` | Integration tests for rate limiting, metrics, and export functionality |
| `jive-api/src/middleware/rate_limit.rs` | Complete rewrite implementing email-based rate limiting |
| `jive-api/src/middleware/metrics_guard.rs` | New CIDR-based access control for metrics endpoint |
| `jive-api/src/metrics.rs` | Expanded metrics with caching, histograms, and build info |
| `jive-api/src/main.rs` | Integration of rate limiting and metrics guard middleware |
| `jive-api/src/lib.rs` | Extended AppMetrics with new counters and histogram fields |
| `jive-api/src/handlers/transactions.rs` | Added export performance metrics tracking |
| `jive-api/src/handlers/auth.rs` | Added login failure and rate limiting metrics |
| `jive-api/build.rs` | Build script for capturing git commit and build metadata |
| `docs/*.md` | Comprehensive security, monitoring, and deprecation documentation |
| `README.md` | Updated with metrics documentation and environment variables |
| `Makefile` | Added metrics verification and test user seeding targets |


```rust
        records: Arc::new(RwLock::new(HashMap::new())),
    }

    pub fn new(max: u32, window_secs: u64) -> Self {
        let hash_email = std::env::var("AUTH_RATE_LIMIT_HASH_EMAIL")
            .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
            .unwrap_or(true);
```
Copilot AI Sep 26, 2025

[nitpick] The environment variable parsing logic is duplicated and could be extracted into a helper function for consistency across the codebase. Consider creating a utility function for parsing boolean environment variables.

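A minimal version of the helper this comment suggests might look like the following; the name `env_bool` and the `default` parameter are assumptions for illustration, not code from this PR.

```rust
/// Parse a boolean environment variable: a set value of "1" or "true"
/// (case-insensitive) is true, any other set value is false, and an
/// unset variable falls back to `default`.
fn env_bool(name: &str, default: bool) -> bool {
    std::env::var(name)
        .map(|v| v == "1" || v.eq_ignore_ascii_case("true"))
        .unwrap_or(default)
}

fn main() {
    // Unset variables fall back to the supplied default.
    println!("{}", env_bool("AUTH_RATE_LIMIT_HASH_EMAIL", true));
}
```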
Comment on lines +24 to 28:

```rust
// Opportunistic cleanup if map large
if map.len() > 10_000 {
    let window = self.window;
    map.retain(|_, (_c, start)| now.duration_since(*start) <= window);
}
```
Copilot AI Sep 26, 2025

The magic number 10,000 should be extracted as a named constant to improve maintainability and make it configurable if needed.

```rust
let norm = raw.trim().to_lowercase();
if norm.is_empty() { return None; }
if !hash { return Some(norm); }
let mut h = Sha256::new(); h.update(&norm); let hex = format!("{:x}", h.finalize());
```
Copilot AI Sep 26, 2025

[nitpick] This complex line should be split into multiple statements for better readability and debugging.

Suggested change:

```diff
-let mut h = Sha256::new(); h.update(&norm); let hex = format!("{:x}", h.finalize());
+let mut h = Sha256::new();
+h.update(&norm);
+let hex = format!("{:x}", h.finalize());
```

```diff
             family_id,
             // Handle potentially nullable base_currency field
-            base_currency: settings.base_currency.unwrap_or_else(|| "CNY".to_string()),
+            base_currency: if settings.base_currency.is_empty() { "CNY".to_string() } else { settings.base_currency },
```
Copilot AI Sep 26, 2025

The logic change from unwrap_or_else to is_empty() check alters the behavior. This will panic if base_currency is None, whereas the original code provided a default. The logic should handle the Option type properly.

Suggested change:

```diff
-base_currency: if settings.base_currency.is_empty() { "CNY".to_string() } else { settings.base_currency },
+base_currency: settings
+    .base_currency
+    .filter(|s| !s.is_empty())
+    .unwrap_or_else(|| "CNY".to_string()),
```

Comment on lines +12 to +17:

```rust
#[tokio::test]
async fn login_rate_limit_blocks_after_threshold() {
    let pool = create_test_pool().await;
    // Seed a user so we can attempt logins (with wrong password to avoid side effects)
    let email = format!("rl_{}@example.com", Uuid::new_v4());
    sqlx::query("INSERT INTO users (email,password_hash,name,is_active,created_at,updated_at) VALUES ($1,'$argon2id$v=19$m=4096,t=3,p=1$dGVzdHNhbHQAAAAAAAAAAA$Jr7Z5fakehashHashHashHashHashHash','RL User',true,NOW(),NOW())")
```
Copilot AI Sep 26, 2025


[nitpick] The hardcoded hash string is very long and makes the code hard to read. Consider extracting it to a constant or using a test fixture helper function.

Suggested change:

```diff
-#[tokio::test]
-async fn login_rate_limit_blocks_after_threshold() {
-    let pool = create_test_pool().await;
-    // Seed a user so we can attempt logins (with wrong password to avoid side effects)
-    let email = format!("rl_{}@example.com", Uuid::new_v4());
-    sqlx::query("INSERT INTO users (email,password_hash,name,is_active,created_at,updated_at) VALUES ($1,'$argon2id$v=19$m=4096,t=3,p=1$dGVzdHNhbHQAAAAAAAAAAA$Jr7Z5fakehashHashHashHashHashHash','RL User',true,NOW(),NOW())")
+const TEST_PASSWORD_HASH: &str = "$argon2id$v=19$m=4096,t=3,p=1$dGVzdHNhbHQAAAAAAAAAAA$Jr7Z5fakehashHashHashHashHashHash";
+
+#[tokio::test]
+async fn login_rate_limit_blocks_after_threshold() {
+    let pool = create_test_pool().await;
+    // Seed a user so we can attempt logins (with wrong password to avoid side effects)
+    let email = format!("rl_{}@example.com", Uuid::new_v4());
+    sqlx::query(&format!(
+        "INSERT INTO users (email,password_hash,name,is_active,created_at,updated_at) VALUES ($1,'{}','RL User',true,NOW(),NOW())",
+        TEST_PASSWORD_HASH
+    ))
```
@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly enhances the API's security and observability by introducing login rate limiting, a metrics endpoint with CIDR-based access control, and a rich set of new Prometheus metrics. The changes are well-structured, including comprehensive documentation, new tests, and utility scripts. Overall, this is a high-quality contribution. I've identified a few issues, primarily related to metric correctness and documentation consistency, and have provided suggestions for improvement.

```rust
        println!("DEBUG[login]: failed to parse Argon2 hash: {:?}", e);
        ApiError::InternalServerError
    })?;
let parsed_hash = PasswordHash::new(hash).map_err(|e| { #[cfg(debug_assertions)] println!("DEBUG[login]: failed to parse Argon2 hash: {:?}", e); state.metrics.increment_login_fail(); ApiError::InternalServerError })?;
```


high

Incrementing auth_login_fail_total when a password hash fails to parse is incorrect. This metric should track user-attributable login failures (like wrong password), not internal server errors. A hash parsing failure indicates a data corruption issue or a bug, and mixing it with failed login attempts will pollute the metric and make it harder to distinguish between security events and system errors. Please remove the metric increment from this error handling path.

Suggested change:

```diff
-let parsed_hash = PasswordHash::new(hash).map_err(|e| { #[cfg(debug_assertions)] println!("DEBUG[login]: failed to parse Argon2 hash: {:?}", e); state.metrics.increment_login_fail(); ApiError::InternalServerError })?;
+let parsed_hash = PasswordHash::new(hash).map_err(|e| { #[cfg(debug_assertions)] println!("DEBUG[login]: failed to parse Argon2 hash: {:?}", e); ApiError::InternalServerError })?;
```

```rust
let argon2 = Argon2::default();
argon2
    .verify_password(req.password.as_bytes(), &parsed_hash)
    .map_err(|_| ApiError::Unauthorized)?;
```


high

The auth_login_fail_total metric is not being incremented here when Argon2 password verification fails. This is inconsistent with other password verification failure paths in this function. To ensure all failed login attempts are correctly tracked, please add the metric increment.

Suggested change:

```diff
-    .map_err(|_| ApiError::Unauthorized)?;
+    .map_err(|_| { state.metrics.increment_login_fail(); ApiError::Unauthorized })?;
```

```
password_hash_argon2id_total                   # argon2id count
password_hash_unknown_total                    # unrecognized prefix
password_hash_total_count                      # total count
password_hash_bcrypt_variant{variant="2b"} X   # per variant
```


medium

The 'X' in this line appears to be a placeholder for the metric's value. It should be removed or replaced with a more descriptive placeholder like <value> to avoid confusion for users reading the documentation.

```
export_duration_buffered_seconds_*             # buffered export duration histogram (bucket/sum/count)
export_duration_stream_seconds_*               # streaming export duration histogram (bucket/sum/count)
process_uptime_seconds                         # process uptime in seconds
jive_build_info{commit,time,rustc,version} 1   # build info gauge
```


medium

This line is a duplicate of line 214. To improve clarity and reduce redundancy in the documentation, please remove this repeated entry for the jive_build_info metric.

```yaml
    annotations:
      summary: Many logins being rate-limited (possible attack)
  - alert: ProcessRestarted
    expr: increase(process_uptime_seconds[5m]) < 60
```


medium

Using increase() on a gauge like process_uptime_seconds is not idiomatic in PromQL and can be confusing. A simpler and more direct way to detect a recent restart is to check if the uptime is less than your scrape interval window.

For example, to detect a restart within the last 5 minutes (300 seconds), you could use:

```
process_uptime_seconds < 300
```

This is more readable and directly expresses the condition you want to alert on.

Suggested change:

```yaml
        expr: process_uptime_seconds < 300
```

Comment on lines +42 to +44:

| Metric | Description | Status |
|---|---|---|
| `auth_login_fail_total` | Count failed login attempts (unauthorized) | Planned |
| `export_duration_seconds` (histogram) | Latency of export operations | Planned |
| `process_uptime_seconds` | Seconds since process start | Implemented |


medium

The 'Future Candidates' table seems to be out of sync with the changes in this PR. Several metrics listed as 'Planned' or 'Implemented' are now fully available.

  • auth_login_fail_total is implemented, not 'Planned'.
  • export_duration_seconds is implemented as export_duration_buffered_seconds and export_duration_stream_seconds.

To ensure the documentation accurately reflects the current state of the project, please update this table to mark these metrics as 'Implemented' and adjust their descriptions accordingly.

```rust
Err(e) => {
    tracing::warn!(user_id=%user.id, error=?e, "failed to generate Argon2id hash")
}
Err(e) => { tracing::warn!(user_id=%user.id, error=?e, "failed to generate Argon2id hash"); state.metrics.increment_rehash_fail(); state.metrics.inc_rehash_fail_hash(); }
```

medium

While one-liners can be compact, this line is quite long and contains multiple statements, which harms readability and makes debugging more difficult. For better maintainability, please format this closure over multiple lines.

```rust
                Err(e) => {
                    tracing::warn!(user_id=%user.id, error=?e, "failed to generate Argon2id hash");
                    state.metrics.increment_rehash_fail();
                    state.metrics.inc_rehash_fail_hash();
                }
```

```rust
"jive_build_info{{commit=\"{}\",time=\"{}\",rustc=\"{}\",version=\"{}\"}} 1\n",
build_commit,
build_time,
rustc_version.replace('"', "'"),
```


medium

Replacing " with ' is not a standard way to escape label values in Prometheus. The specification requires backslashes, double quotes, and newlines to be escaped with a backslash (e.g., \, \", \n). While your current approach might work if the rustc version string doesn't contain single quotes, it's safer to adhere to the standard to prevent generating invalid metrics output.

Suggested change:

```diff
-rustc_version.replace('"', "'"),
+rustc_version.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n"),
```
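
As a standalone sketch of the spec-compliant escaping described above (helper name assumed, not this PR's code):

```rust
/// Escape a Prometheus label value per the text exposition format:
/// backslash, double quote, and newline must be backslash-escaped.
/// Order matters: escape backslashes first.
fn escape_label_value(v: &str) -> String {
    v.replace('\\', "\\\\").replace('"', "\\\"").replace('\n', "\\n")
}

fn main() {
    assert_eq!(escape_label_value("a\"b"), "a\\\"b");
    assert_eq!(escape_label_value("line1\nline2"), "line1\\nline2");
    println!("ok"); // prints "ok"
}
```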

Comment on lines +11 to +18:

```rust
#[tokio::test]
async fn metrics_v6_allowed_when_public() {
    std::env::remove_var("ALLOW_PUBLIC_METRICS");
    let dummy_pool = PgPool::connect_lazy("postgresql://ignored").unwrap_err();
    // Skip full state since test only checks routing; create minimal state is complex, so we just assert handler builds.
    // This test is a placeholder; full integration would need real AppState. Here we simply ensure no panic.
    assert!(true);
}
```


medium

This test is currently a placeholder that only asserts true and doesn't perform any actual verification. To make it useful, it should at least test that the application can be built with the metrics guard middleware and that the /metrics route is accessible when the guard is disabled. A more complete test would involve mocking ConnectInfo to test the CIDR filtering logic.

@zensgit zensgit mentioned this pull request Oct 15, 2025
