Skip to content

refactor(rag): modularize main.py into routers and utilities#15

Merged
larryro merged 5 commits into
mainfrom
code-cleanup
Dec 14, 2025
Merged

refactor(rag): modularize main.py into routers and utilities#15
larryro merged 5 commits into
mainfrom
code-cleanup

Conversation

@larryro
Copy link
Copy Markdown
Collaborator

@larryro larryro commented Dec 14, 2025

Summary by CodeRabbit

Release Notes

  • New Features

    • Document management API with upload, add, batch, and delete endpoints.
    • Job status tracking endpoint for monitoring document ingestion.
    • Search functionality with query processing and relevance scoring.
    • LLM-powered content generation with source citation.
    • CORS configuration support for cross-origin requests.
  • Changes

    • Simplified LLM provider support to focus on OpenAI-compatible APIs.
    • Removed sensitive vector database configuration from public API responses.
    • Reorganized application endpoints into modular router structure.

✏️ Tip: You can customize this high-level summary in your review settings.

- Extract endpoint handlers into separate router modules (health, documents,
  search, jobs) and register them via app.include_router()
- Move cognee_service from services/cognee_service.py to services/cognee/
- Extract memory cleanup utilities to utils module
- Add vector_db_url and vector_db_collection_name settings to config
- Export routers from routers/__init__.py
- Remove obsolete test-setup.sh script
- Clean up imports and prefix unused parameters with underscore
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 14, 2025

📝 Walkthrough

Walkthrough

This pull request refactors the RAG service architecture through modularization and simplification. The changes include: (1) restructuring the monolithic Cognee service into a package with separated concerns (initialization, configuration, service logic, cleanup utilities); (2) migrating endpoint implementations from main.py to a router-based architecture (health, documents, search, jobs routers); (3) simplifying configuration by removing database pooling fields, provider-specific auth keys, and legacy multi-provider support, while adding methods for database URL validation and LLM configuration; (4) introducing application lifecycle management via an async context manager for Cognee initialization at startup; (5) adding memory cleanup instrumentation for debugging; and (6) removing the test setup bash script and simplifying Docker environment variables.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • services/rag/app/services/cognee/config.py: Comprehensive environment setup for Cognee including tiktoken patching, database validation, vector/graph configuration, and feature flag initialization. High logic density with multiple validation points and error handling paths.
  • services/rag/app/services/cognee/service.py: New CogneeService class wrapping Cognee RAG operations with lifecycle management, initialization guards, and error handling for multiple operation types (add, search, generate, delete).
  • services/rag/app/config.py: Significant removal of legacy fields and introduction of new methods (get_database_url, get_llm_config, get_allowed_origins_list) with validation logic. Changes affect the public API surface substantially.
  • services/rag/app/main.py: Structural changes including new lifespan context manager, router inclusion, exception handler refactoring, and centralized memory cleanup integration.
  • services/rag/app/routers/documents.py: Complex endpoint implementation with file handling, background task orchestration, job status tracking, and error handling across multiple operation types.
  • Interdependencies: Changes across config.py, main.py, multiple routers, and the reorganized Cognee package require understanding the initialization order and data flow between modules.

Possibly related PRs

  • tale-project/poc2#482: Modifies memory-cleanup instrumentation in services/rag/app/main.py and changes Cognee service behavior (single "tale_documents" dataset), overlapping with this PR's similar changes.
  • tale-project/poc2#497: Refactors the Cognee service module structure and environment setup, directly intersecting with this PR's reorganization of services/rag/app/services/cognee_service.py into a package structure.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 27

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
services/rag/Dockerfile (1)

47-72: Fix HEALTHCHECK to respect RAG_PORT (currently hardcoded to 8001).
If RAG_PORT is overridden, the container can be marked unhealthy despite serving correctly.

-HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
-    CMD curl -f http://localhost:8001/health || exit 1
+HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
+    CMD sh -c 'curl -fsS "http://localhost:${RAG_PORT:-8001}/health" || exit 1'

Also consider pinning/verifying the uv install script (Line 17) to reduce supply-chain risk (e.g., versioned URL + checksum).

services/rag/app/config.py (2)

144-148: get_allowed_origins_list() should drop empty entries.
Trailing commas will produce "" origins.

-        return [origin.strip() for origin in self.allowed_origins.split(",")]
+        return [o for o in (origin.strip() for origin in self.allowed_origins.split(",")) if o]

85-142: get_llm_config() incorrectly omits base_url when no API key is set, breaking Ollama and other local OpenAI-compatible setups.

The base_url is only added to the config dict when api_key is truthy (lines 138–141). Local providers like Ollama that use OPENAI_BASE_URL without an API key will not receive the base URL configuration, causing requests to route to the wrong endpoint.

Additionally, lines 103–104 and 110–111 call int(max_tokens_env) and float(temperature_env) without error handling. Invalid environment variable values will raise ValueError at startup with an unclear message; wrap these conversions in try/except and raise a descriptive ValueError like "OPENAI_MAX_TOKENS must be a valid integer".

-        if api_key:
-            config["api_key"] = api_key
-            config["base_url"] = base_url
+        if api_key:
+            config["api_key"] = api_key
+
+        # Base URL may be required even without an API key (e.g., Ollama/local gateways)
+        if base_url:
+            config["base_url"] = base_url

And guard the conversions:

if max_tokens_env is not None:
    try:
        max_tokens = int(max_tokens_env)
    except ValueError:
        raise ValueError("OPENAI_MAX_TOKENS must be a valid integer")
        
if temperature_env is not None:
    try:
        temperature = float(temperature_env)
    except ValueError:
        raise ValueError("OPENAI_TEMPERATURE must be a valid float")
services/rag/app/main.py (3)

26-42: Use logger.exception(...) (or logger.opt(exception=e)) for startup failures.
In lifespan(), logger.error(f"...{e}") loses the stack trace; this makes init failures hard to debug.


72-83: Ensure memory cleanup runs even when request handling raises.
cleanup_memory(...) won’t run if call_next raises. Wrap in try/finally and make cleanup best-effort.

 @app.middleware("http")
 async def memory_cleanup_middleware(request: Request, call_next):
     ...
-    response = await call_next(request)
-    cleanup_memory(context=f"after request {request.url.path}")
-    return response
+    try:
+        return await call_next(request)
+    finally:
+        try:
+            cleanup_memory(context=f"after request {request.url.path}")
+        except Exception:
+            logger.exception("cleanup_memory failed")

98-109: loguru doesn’t honor exc_info=True; fix stacktrace logging + debug check.
Also, settings.log_level == "debug" is case-sensitive; if config is "DEBUG" you’ll never include details.

 @app.exception_handler(Exception)
 async def general_exception_handler(_request, exc):
     """Handle general exceptions."""
-    logger.error(f"Unhandled exception: {exc}", exc_info=True)
+    logger.opt(exception=exc).error("Unhandled exception")
     return JSONResponse(
         status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
         content=ErrorResponse(
             error=exc.__class__.__name__,
             message="Internal server error",
-            details={"error": str(exc)} if settings.log_level == "debug" else None,
+            details={"error": str(exc)} if settings.log_level.lower() == "debug" else None,
         ).model_dump(),
     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro (Legacy)

📥 Commits

Reviewing files that changed from the base of the PR and between a8107f5 and fb87892.

📒 Files selected for processing (17)
  • services/rag/Dockerfile (1 hunks)
  • services/rag/app/config.py (3 hunks)
  • services/rag/app/main.py (4 hunks)
  • services/rag/app/models.py (0 hunks)
  • services/rag/app/routers/__init__.py (1 hunks)
  • services/rag/app/routers/documents.py (1 hunks)
  • services/rag/app/routers/health.py (1 hunks)
  • services/rag/app/routers/jobs.py (1 hunks)
  • services/rag/app/routers/search.py (1 hunks)
  • services/rag/app/services/cognee/__init__.py (1 hunks)
  • services/rag/app/services/cognee/cleanup.py (1 hunks)
  • services/rag/app/services/cognee/config.py (1 hunks)
  • services/rag/app/services/cognee/service.py (1 hunks)
  • services/rag/app/services/cognee/utils.py (1 hunks)
  • services/rag/app/services/cognee_service.py (0 hunks)
  • services/rag/app/utils.py (1 hunks)
  • services/rag/test-setup.sh (0 hunks)
💤 Files with no reviewable changes (3)
  • services/rag/test-setup.sh
  • services/rag/app/models.py
  • services/rag/app/services/cognee_service.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*

📄 CodeRabbit inference engine (.cursor/rules/workspace_rules.mdc)

Use English only for ALL user-facing content including UI components, labels, buttons, dialogs, forms, toast messages, error messages, success messages, comments, documentation, README files, variable names, function names, and type names

Files:

  • services/rag/app/routers/__init__.py
  • services/rag/app/routers/health.py
  • services/rag/app/services/cognee/cleanup.py
  • services/rag/Dockerfile
  • services/rag/app/routers/search.py
  • services/rag/app/routers/documents.py
  • services/rag/app/utils.py
  • services/rag/app/services/cognee/utils.py
  • services/rag/app/services/cognee/__init__.py
  • services/rag/app/services/cognee/service.py
  • services/rag/app/services/cognee/config.py
  • services/rag/app/main.py
  • services/rag/app/routers/jobs.py
  • services/rag/app/config.py
🧬 Code graph analysis (6)
services/rag/app/routers/__init__.py (3)
services/rag/app/routers/health.py (1)
  • health (30-36)
services/rag/app/routers/search.py (1)
  • search (21-58)
services/rag/app/services/cognee/service.py (1)
  • search (108-167)
services/rag/app/routers/search.py (2)
services/rag/app/models.py (4)
  • QueryRequest (150-168)
  • QueryResponse (185-191)
  • GenerateRequest (198-216)
  • GenerateResponse (219-225)
services/rag/app/services/cognee/service.py (2)
  • search (108-167)
  • generate (169-281)
services/rag/app/routers/documents.py (3)
services/rag/app/models.py (6)
  • DocumentAddRequest (43-57)
  • DocumentAddResponse (60-84)
  • DocumentDeleteRequest (87-90)
  • DocumentDeleteResponse (93-97)
  • BatchAddRequest (232-237)
  • BatchAddResponse (240-246)
services/rag/app/utils.py (1)
  • cleanup_memory (33-43)
services/rag/app/services/cognee/service.py (1)
  • add_document (57-106)
services/rag/app/services/cognee/service.py (2)
services/rag/app/services/cognee/utils.py (2)
  • normalize_add_result (12-53)
  • normalize_search_results (96-105)
services/rag/app/config.py (1)
  • get_llm_config (85-142)
services/rag/app/services/cognee/config.py (1)
services/rag/app/config.py (2)
  • get_llm_config (85-142)
  • get_database_url (79-83)
services/rag/app/routers/jobs.py (1)
services/rag/app/models.py (1)
  • JobStatus (114-143)
🪛 Ruff (0.14.8)
services/rag/app/routers/__init__.py

8-8: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

services/rag/app/routers/health.py

3-3: typing.Dict is deprecated, use dict instead

(UP035)

services/rag/app/services/cognee/cleanup.py

55-55: Do not catch blind exception: Exception

(BLE001)


128-128: Do not catch blind exception: Exception

(BLE001)

services/rag/app/routers/search.py

53-53: Do not catch blind exception: Exception

(BLE001)


55-58: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


57-57: Use explicit conversion flag

Replace with conversion flag

(RUF010)


91-91: Do not catch blind exception: Exception

(BLE001)


93-96: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


95-95: Use explicit conversion flag

Replace with conversion flag

(RUF010)

services/rag/app/routers/documents.py

5-5: typing.Dict is deprecated, use dict instead

(UP035)


34-34: Async functions should not open files with blocking methods like open

(ASYNC230)


96-96: Do not catch blind exception: Exception

(BLE001)


127-127: Do not perform function call File in argument defaults; instead, perform the call within the function, or read the default from a module-level singleton variable

(B008)


151-154: Abstract raise to an inner function

(TRY301)


161-161: Async functions should not open files with blocking methods like open

(ASYNC230)


170-173: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


211-211: Do not catch blind exception: Exception

(BLE001)


226-226: Do not catch blind exception: Exception

(BLE001)


257-257: Do not catch blind exception: Exception

(BLE001)


259-262: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


261-261: Use explicit conversion flag

Replace with conversion flag

(RUF010)


276-276: Do not catch blind exception: Exception

(BLE001)


278-281: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


280-280: Use explicit conversion flag

Replace with conversion flag

(RUF010)


307-307: Do not catch blind exception: Exception

(BLE001)


314-314: Use explicit conversion flag

Replace with conversion flag

(RUF010)

services/rag/app/utils.py

29-29: Do not catch blind exception: Exception

(BLE001)

services/rag/app/services/cognee/utils.py

7-7: typing.Dict is deprecated, use dict instead

(UP035)


7-7: typing.List is deprecated, use list instead

(UP035)


13-13: Dynamically typed expressions (typing.Any) are disallowed in result

(ANN401)


48-48: Do not catch blind exception: Exception

(BLE001)


50-50: Do not catch blind exception: Exception

(BLE001)


56-56: Dynamically typed expressions (typing.Any) are disallowed in result

(ANN401)


85-85: Do not catch blind exception: Exception

(BLE001)

services/rag/app/services/cognee/__init__.py

21-21: __all__ is not sorted

Apply an isort-style sorting to __all__

(RUF022)

services/rag/app/services/cognee/service.py

8-8: typing.Dict is deprecated, use dict instead

(UP035)


8-8: typing.List is deprecated, use list instead

(UP035)


60-60: Unused method argument: metadata

(ARG002)


97-102: Consider moving this statement to an else block

(TRY300)


113-113: Unused method argument: filters

(ARG002)


154-154: Consider moving this statement to an else block

(TRY300)


258-263: Consider moving this statement to an else block

(TRY300)


300-303: Consider moving this statement to an else block

(TRY300)

services/rag/app/services/cognee/config.py

30-30: Dynamically typed expressions (typing.Any) are disallowed in *args

(ANN401)


30-30: Dynamically typed expressions (typing.Any) are disallowed in **kwargs

(ANN401)


30-30: Dynamically typed expressions (typing.Any) are disallowed in _safe_encoding_for_model

(ANN401)


33-33: Do not catch blind exception: Exception

(BLE001)


61-61: Avoid specifying long messages outside the exception class

(TRY003)


139-143: Avoid specifying long messages outside the exception class

(TRY003)


147-147: Avoid specifying long messages outside the exception class

(TRY003)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


151-151: Avoid specifying long messages outside the exception class

(TRY003)


153-153: Avoid specifying long messages outside the exception class

(TRY003)


228-228: Do not catch blind exception: Exception

(BLE001)


247-247: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)


249-249: Consider moving this statement to an else block

(TRY300)

services/rag/app/config.py

59-59: Probable insecure usage of temporary file or directory: "/tmp/tale-rag-data"

(S108)

🔇 Additional comments (11)
services/rag/app/services/cognee/utils.py (1)

96-105: normalize_search_results() is fine.

services/rag/app/routers/health.py (1)

39-55: Double-check /config is safe to expose in your deployment model.
It includes runtime knobs (and model names). If this service is internet-facing, consider gating behind auth or restricting to internal networks.
Based on coding guidelines, keep all exposed error/config strings in English (looks good here).

services/rag/app/routers/documents.py (1)

65-123: Queued ingestion flow looks good; ensure job_id/document_id semantics are intentional.
Using a stable doc_id and immediately creating a queued job record is a good UX for callers polling status.

services/rag/app/services/cognee/config.py (2)

206-230: Fail-fast vs continue: decide if base_config preconfiguration is required.
configure_cognee_base_config() swallows all exceptions; if cognee requires these directories, you may want to re-raise to avoid a partially configured runtime.


232-253: Handle non-ImportError failures during import cognee intentionally.
If cognee raises at import time for reasons other than ImportError (bad env, incompatible deps), you currently crash the process. Confirm whether that’s desired; otherwise catch broader import-time exceptions and return False with a logged stack trace.

services/rag/app/services/cognee/service.py (6)

18-22: LGTM!

Good design decision to use a single shared dataset. The comment clearly explains the rationale for avoiding per-document datasets to keep memory usage bounded.


25-30: LGTM!

Clean initialization with a simple state flag.


32-55: LGTM!

Good initialization workflow with proper cleanup of legacy data and error handling.


76-106: LGTM!

Good use of incremental loading to avoid reprocessing the entire dataset. The timing instrumentation and result normalization are well implemented.


309-321: LGTM!

Clean implementation with proper error handling for the reset operation.


156-167: Error handling is appropriate given Cognee's API constraints.

The string-based check for "DatabaseNotCreatedError" is necessary because this exception is not exposed as a public, importable type in Cognee's API (v0.4.0). The defensive error message handling is pragmatic and correct.

Comment thread services/rag/app/config.py Outdated
Comment thread services/rag/app/routers/__init__.py Outdated
Comment thread services/rag/app/routers/documents.py
Comment thread services/rag/app/routers/documents.py
Comment thread services/rag/app/routers/documents.py
Comment thread services/rag/app/services/cognee/service.py
Comment thread services/rag/app/services/cognee/service.py Outdated
Comment thread services/rag/app/services/cognee/utils.py
Comment thread services/rag/app/services/cognee/utils.py
Comment thread services/rag/app/utils.py
- config.py: Change EMBEDDING_PROVIDER and EMBEDDING_MODEL to use
  os.environ.setdefault() for consistency with other env vars
- config.py: Fix all logger calls to use brace-style {} formatting
  instead of percent-style %s
- cleanup.py: Fix remaining logger call to use brace-style formatting
@larryro
Copy link
Copy Markdown
Collaborator Author

larryro commented Dec 14, 2025

Additional Fixes Pushed (Commit 0b716de)

Addressed remaining CodeRabbit review feedback:

config.py

  • Environment variables: Changed EMBEDDING_PROVIDER and EMBEDDING_MODEL to use os.environ.setdefault() for consistency with other env vars (allows operator overrides)
  • Logger formatting: Fixed all logger calls to use brace-style {} formatting instead of percent-style %s (loguru best practice)

cleanup.py

  • Logger formatting: Fixed remaining logger call at line 121 to use brace-style {} formatting

All 27 CodeRabbit review comments have been addressed. The changes are now complete and ready for final review.

When the cognee package is not installed, provide a _UnavailableCogneeService
placeholder class instead of None. This allows:
- cognee_service.initialized to return False (no AttributeError)
- Health endpoint to report 'degraded' status safely
- Startup to call initialize() without crashing
- All service methods to return appropriate failure responses

Addresses CodeRabbit review feedback on cognee/__init__.py.
@larryro larryro merged commit d1dbc84 into main Dec 14, 2025
1 check passed
@larryro larryro deleted the code-cleanup branch December 14, 2025 08:41
@coderabbitai coderabbitai Bot mentioned this pull request Feb 8, 2026
larryro added a commit that referenced this pull request May 17, 2026
Closes #15, #16, #17, #18, #41 — secret sanitiser + audio-fetch headers.

- `sanitize_secrets.ts` JSON-value class now honours backslash-escaped
  characters: `(?:[^"\\]|\\.)*` instead of `[^"]*`. Previously a value
  containing `\"` (an escaped quote inside the string) terminated the
  redaction at the first inner quote and left the rest of the JSON
  object exposed. Test added that pins the new behaviour.
- `sanitize_secrets.ts` patterns extended with Stripe (`sk_/pk_/rk_` ×
  `live/test`), OpenAI org/project identifiers (`org-…`, `proj_…`),
  Cookie / Set-Cookie header lines, and bare Better Auth session
  tokens. An idempotency check (sanitize ∘ sanitize = sanitize) and a
  100k-char ReDoS guard round out the new coverage.
- `http.ts /api/tts-audio` moves the per-IP rate limit ABOVE the Better
  Auth session lookup. An unauthenticated attacker hammering the route
  was previously forcing a DB session-query per request — mirrors the
  `/storage` route's order.
- `http.ts /api/tts-audio` Cache-Control changes from
  `private, max-age=600` to `private, max-age=0, must-revalidate`. A
  removed member can no longer replay cached audio for up to 10 minutes
  past their removal; the revalidating round-trip re-hits the
  membership gate on every replay.
- `http.ts /api/tts-audio` sets `Accept-Ranges: none` so iOS Safari
  `<audio preload="auto">` range probes don't trigger audible mid-chunk
  restarts.
larryro added a commit that referenced this pull request May 17, 2026
Closes #15, #16, #17, #18, #41 — secret sanitiser + audio-fetch headers.

- `sanitize_secrets.ts` JSON-value class now honours backslash-escaped
  characters: `(?:[^"\\]|\\.)*` instead of `[^"]*`. Previously a value
  containing `\"` (an escaped quote inside the string) terminated the
  redaction at the first inner quote and left the rest of the JSON
  object exposed. Test added that pins the new behaviour.
- `sanitize_secrets.ts` patterns extended with Stripe (`sk_/pk_/rk_` ×
  `live/test`), OpenAI org/project identifiers (`org-…`, `proj_…`),
  Cookie / Set-Cookie header lines, and bare Better Auth session
  tokens. An idempotency check (sanitize ∘ sanitize = sanitize) and a
  100k-char ReDoS guard round out the new coverage.
- `http.ts /api/tts-audio` moves the per-IP rate limit ABOVE the Better
  Auth session lookup. An unauthenticated attacker hammering the route
  was previously forcing a DB session-query per request — mirrors the
  `/storage` route's order.
- `http.ts /api/tts-audio` Cache-Control changes from
  `private, max-age=600` to `private, max-age=0, must-revalidate`. A
  removed member can no longer replay cached audio for up to 10 minutes
  past their removal; the revalidating round-trip re-hits the
  membership gate on every replay.
- `http.ts /api/tts-audio` sets `Accept-Ranges: none` so iOS Safari
  `<audio preload="auto">` range probes don't trigger audible mid-chunk
  restarts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant