feat(platform): budget limits per user/group by larryro · Pull Request #1311 · tale-project/tale

larryro · 2026-04-10T10:53:39Z

Summary

Budget enforcement: Wire the existing budget system into the agent chat pipeline so admins can configure token budgets, cost limits, and request caps per user, team, role, or org-wide defaults
Per-message cost tracking: Store costEstimateCents in message metadata, with per-model pricing sourced from provider config files (fallback to hardcoded map)
Tool usage tracking: RAG search and web tools now return structured usage data (embedding/vision tokens) that feeds into the cost ledger
Budget editor UI: Table + dialog interface with searchable user/team pickers, auto-save on changes, and helpful placeholder hints (e.g. "1M tokens ≈ 750K words")
Usage dashboard: Summary stat cards (total tokens, cost, requests, active users) above per-user breakdown table with full precision for small cost values
UX polish: Budget exceeded shows as an assistant message instead of error toast, thread generation status resets to idle correctly
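
As a rough illustration of the enforcement described above, the sketch below checks each applicable rule independently and returns a structured result. The `allowed` and `reason` fields mirror the `budgetResult` shape quoted later in this thread; everything else (`BudgetRule`, `PeriodUsage`, `Caller`, `ruleApplies`, the `usageFor` callback) is a hypothetical stand-in, not the PR's actual schema.

```typescript
// Hypothetical shapes for illustration; the PR's real schema is not shown in this thread.
type BudgetScope = 'user' | 'team' | 'role' | 'default';

interface BudgetRule {
  scope: BudgetScope;
  targetId?: string; // user ID, team ID, or role name; unused for 'default'
  maxTokens?: number;
  maxCostCents?: number;
  maxRequests?: number;
}

interface PeriodUsage {
  tokens: number;
  costCents: number;
  requests: number;
}

interface Caller {
  userId: string;
  teamIds: string[];
  role: string;
}

interface BudgetResult {
  allowed: boolean;
  reason?: string;
}

function ruleApplies(rule: BudgetRule, caller: Caller): boolean {
  if (rule.scope === 'default') return true;
  if (rule.targetId === undefined) return false;
  if (rule.scope === 'user') return rule.targetId === caller.userId;
  if (rule.scope === 'team') return caller.teamIds.indexOf(rule.targetId) !== -1;
  return rule.targetId === caller.role;
}

// Each applicable rule is evaluated on its own; the first exhausted limit denies the request.
function checkBudget(
  rules: BudgetRule[],
  caller: Caller,
  usageFor: (rule: BudgetRule) => PeriodUsage,
): BudgetResult {
  for (const rule of rules) {
    if (!ruleApplies(rule, caller)) continue;
    const used = usageFor(rule);
    if (rule.maxTokens !== undefined && used.tokens >= rule.maxTokens) {
      return { allowed: false, reason: 'Token budget reached (' + rule.scope + ' rule)' };
    }
    if (rule.maxCostCents !== undefined && used.costCents >= rule.maxCostCents) {
      return { allowed: false, reason: 'Cost limit reached (' + rule.scope + ' rule)' };
    }
    if (rule.maxRequests !== undefined && used.requests >= rule.maxRequests) {
      return { allowed: false, reason: 'Request cap reached (' + rule.scope + ' rule)' };
    }
  }
  return { allowed: true };
}
```

Passing usage in through a callback keeps the check itself free of storage details, so a team rule can read a shared team total while a user rule reads that user's own total.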

Test plan

Summary by CodeRabbit

New Features
- Added budget enforcement and cost tracking across AI operations (chats, agents, web scraping, RAG searches).
- Introduced organization-level usage dashboard showing token counts, costs, and request metrics by period.
- Added budget configuration management interface with role-based access control.
- Extended message metadata to display cost estimates in chat UI.
Enhancements
- Model pricing now configurable per provider with cost estimation for budget tracking.
- Token usage metadata now tracked and propagated across embedding, LLM, and web-extraction services.
- Budget validation performed before agent generation to prevent overspend.

…1166) Wire the existing (but previously unused) budget enforcement system into the platform. This enables admins to configure token budgets, cost limits, and request caps per user, team, role, or as org-wide defaults.

Key changes:
- Security: add admin role checks to governance mutations and usage queries
- Fix getModelCost substring collision (gpt-4o-mini matched gpt-4o)
- Remove dead estimatedCostEur/estimatedCostUsd schema fields
- Make buildPeriodKey deterministic (accept timestamp, no Date() in mutations)
- Zero out usage for cached LLM responses to prevent double-counting
- Embedding service: add embed_texts_with_usage() capturing response.usage
- RAG service: return embedding + LLM token usage in search/generate responses
- Crawler service: pass UsageAccumulator to all Vision API calls
- Add cost field to provider JSON schema, resolve at model resolution time
- Wire incrementUsageLedger into onAgentComplete (per-team ledger rows)
- Rewrite checkBudget: independent rule checking, team shared caps, structured errors
- Add budget checks to startAgentChat, approval/human_input/location resume paths
- Add usage tracking to OpenAI-compat client tool mode
- Governance settings page with tabs: System Prompt, Budgets, Usage
- Budget editor UI with scope/period/limits configuration
- Usage dashboard with per-user token/cost/request breakdown
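
The getModelCost substring collision mentioned above can be avoided by trying longer keys before shorter ones. The sketch below is one way to do that, not necessarily how the PR fixes it. The prices in `HARDCODED_COSTS` are placeholders, the cents-per-million unit is an assumption taken from the review walkthrough, and only the 200/800 default comes from a later commit message in this thread.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

// Placeholder prices for illustration only; not real provider pricing.
const HARDCODED_COSTS: Record<string, ModelCost> = {
  'gpt-4o': { inputCentsPerMillion: 250, outputCentsPerMillion: 1000 },
  'gpt-4o-mini': { inputCentsPerMillion: 15, outputCentsPerMillion: 60 },
};

// Mid-range default (200/800), as described in a later commit message.
const DEFAULT_COST: ModelCost = { inputCentsPerMillion: 200, outputCentsPerMillion: 800 };

function getModelCost(modelId: string): ModelCost {
  const id = modelId.toLowerCase();
  // Longest key first, so 'gpt-4o-mini' is tried before its prefix 'gpt-4o'.
  const keys = Object.keys(HARDCODED_COSTS).sort((a, b) => b.length - a.length);
  for (const key of keys) {
    const cost = HARDCODED_COSTS[key];
    if (cost !== undefined && id.indexOf(key) !== -1) return cost;
  }
  return DEFAULT_COST;
}
```

A plain first-match scan over the keys would return the `gpt-4o` price for `gpt-4o-mini` whenever `gpt-4o` happens to be visited first, which is the collision the commit describes.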

…s in budget editor Use SearchableSelect with useMembers/useOrgTeams hooks so admins can search by name/email instead of manually entering IDs.

…fields
- Max Tokens: placeholder "e.g. 1000000", hint "1M tokens ≈ 750K words"
- Max Cost: changed from cents to USD input with auto-conversion, hint "GPT-4o ≈ $10/1M tokens"
- Max Requests: placeholder "e.g. 500", hint about monthly quota

Rules displayed in a clean table with scope, target, period, limits. Add/edit via FormDialog with searchable user/team pickers and helpful placeholder hints. More consistent with other settings pages.

…ull-width table
- Toggle, add/edit rule, and delete all auto-save immediately
- Remove page-level Save button (no longer needed)
- Remove max-w-4xl constraint so table fills available width

…age table Show Total Tokens, Total Cost, Total Requests, Active Users as compact stat cards between the heading and the per-user table.

…rovider cost to usage tracking
- Add Cost column to models table in AI Providers settings (input/output USD per 1M tokens)
- Add cost input fields to model add/edit dialog with helpful placeholders
- Thread providerCost from resolved model data through generateAgentResponse to onAgentComplete
- estimateCostCents now uses provider cost when available, falls back to hardcoded map
- Revert DEFAULT_COST to mid-range (200/800) since provider config is the intended source of truth

…o cost tracking
- Add cost data to all 18 models in openrouter.json (from OpenRouter pricing)
- Add Cost column and input fields to AI Providers models table/dialog
- Thread providerCost from model resolution through to onAgentComplete
- estimateCostCents uses provider cost when available, hardcoded map as fallback
- Revert DEFAULT_COST to mid-range since provider config is the source of truth

Pricing sources: openrouter.ai model pages (April 2026)
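
The fallback order described in these commits could look roughly like the sketch below: use the provider-configured price when one was resolved, otherwise a default. The interface names and the cents-per-million unit are assumptions, as is treating 200/800 as input/output cents per 1M tokens. The four-decimal rounding follows a later commit in this thread.

```typescript
interface ProviderCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Assumed interpretation of the 200/800 default: cents per 1M input/output tokens.
const FALLBACK_COST: ProviderCost = { inputCentsPerMillion: 200, outputCentsPerMillion: 800 };

function estimateCostCents(usage: TokenUsage, providerCost?: ProviderCost): number {
  // Provider config is the intended source of truth; the fallback only covers gaps.
  const cost = providerCost ?? FALLBACK_COST;
  const rawCents =
    (usage.inputTokens * cost.inputCentsPerMillion +
      usage.outputTokens * cost.outputCentsPerMillion) / 1e6;
  // Keep four decimal places so sub-cent costs are not rounded away.
  return Math.round(rawCents * 1e4) / 1e4;
}
```

For example, 1,000 input tokens and 500 output tokens at 15/60 cents per million come to 0.045 cents, a value that two-decimal rounding would store as 0.05 (or 0.04).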

…ssage cost tracking
- Add costEstimateCents field to messageMetadata schema
- Compute cost once in onAgentComplete, use for both metadata save and ledger increment
- Enables per-message cost display in message info dialog

…hboard Use toPrecision(3) for costs < $1 to preserve significant digits. e.g. $0.0893 instead of $0.09, $0.000682 instead of $0.00. Costs >= $1 still show 2 decimal places.
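
A minimal sketch of that formatting rule, assuming the stored value is in cents and the display is in USD (`formatCostUsd` is a hypothetical helper name):

```typescript
// Formats a cost stored in cents as a USD string.
// Below $1: three significant digits. $1 and above: two decimal places.
function formatCostUsd(costCents: number): string {
  const usd = costCents / 100;
  if (usd === 0) return '$0.00';
  if (usd < 1) return '$' + usd.toPrecision(3);
  return '$' + usd.toFixed(2);
}
```

With this helper, 8.93 cents renders as `$0.0893` and 0.0682 cents as `$0.000682`, matching the examples in the commit message. Note that `toPrecision` switches to exponential notation for values below 1e-6, so extremely small costs would need separate handling.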

The query validator was missing the new field, causing ReturnsValidationError when reading metadata for messages that have cost data.

Display cost with full precision (toPrecision(3) for small values) in the message information popup, between Performance and Tool Calls.

…f separate field

The field was returned by the query but not mapped in the hook's explicit field selection, so it was always undefined in the UI.

… places in cents Round to 4 decimal places instead of 2 to avoid losing significant digits on small costs. e.g. 0.40275 cents now stored as 0.4028 instead of 0.40, matching OpenRouter's displayed cost more closely.

RAG search and web tools now return { success, response, usage, model } objects instead of plain strings. This allows extractToolCallsFromSteps to capture embedding/vision token usage in toolsUsage, which then gets recorded in the usage ledger via onAgentComplete.
- rag_search_tool: includes embedding token count and model from RAG service
- web_tool: includes vision/LLM token usage from crawler service
- WebFetchUrlResult/WebFetchExtractApiResponse: add usage field
- fetch_and_extract: pass through usage from crawler response
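
To illustrate why the structured shape matters, the sketch below collects usage from tool results while tolerating legacy plain-string results. Only the `{ success, response, usage, model }` field names come from the commit message; the `ToolCallStep` shape, the token field names, and `collectToolsUsage` are hypothetical and do not represent the real extractToolCallsFromSteps.

```typescript
interface ToolTokenUsage {
  inputTokens?: number;
  outputTokens?: number;
}

interface StructuredToolResult {
  success: boolean;
  response: string;
  usage?: ToolTokenUsage;
  model?: string;
}

// Hypothetical step shape: a tool call with either a legacy string or a structured result.
interface ToolCallStep {
  toolName: string;
  result: string | StructuredToolResult;
}

interface ToolUsageItem {
  toolName: string;
  model?: string;
  inputTokens: number;
  outputTokens: number;
}

function collectToolsUsage(steps: ToolCallStep[]): ToolUsageItem[] {
  const items: ToolUsageItem[] = [];
  for (const step of steps) {
    const result = step.result;
    // Plain strings and results without usage carry nothing to bill.
    if (typeof result === 'string' || result.usage === undefined) continue;
    items.push({
      toolName: step.toolName,
      model: result.model,
      inputTokens: result.usage.inputTokens ?? 0,
      outputTokens: result.usage.outputTokens ?? 0,
    });
  }
  return items;
}
```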

…rror toast When budget is exceeded, save the limit message as an assistant reply in the thread instead of throwing an error that causes "Failed to send message". The user message is preserved and they see a clear explanation of the budget limit in the conversation.

…xceeded The thread was set to 'generating' before the budget check, causing a permanent "Thinking..." state when the budget message was saved. Now properly resets to 'idle' so the UI shows the budget message and allows further interaction.
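
The deny path described in the two commits above can be sketched in memory as follows. The real code runs inside Convex mutations; the `Thread` and `ChatMessage` types and the `startChat` function here are hypothetical simplifications that only show the ordering: keep the user message, save the budget notice as an assistant message, and leave the thread idle.

```typescript
type GenerationStatus = 'idle' | 'generating';

interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

interface Thread {
  generationStatus: GenerationStatus;
  messages: ChatMessage[];
}

interface BudgetCheck {
  allowed: boolean;
  reason?: string;
}

function startChat(thread: Thread, userMessage: string, budget: BudgetCheck): 'denied' | 'started' {
  // The user's message is preserved in both outcomes.
  thread.messages.push({ role: 'user', content: userMessage });

  if (!budget.allowed) {
    // Reply in the conversation instead of throwing, so no error toast is shown.
    thread.messages.push({
      role: 'assistant',
      content: budget.reason ?? 'Budget limit exceeded',
    });
    // Without this reset the UI would stay in a permanent "Thinking..." state.
    thread.generationStatus = 'idle';
    return 'denied';
  }

  thread.generationStatus = 'generating';
  return 'started';
}
```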

…toolsUsage
- Add costEstimateCents to toolUsageItemValidator schema
- Compute cost in extractToolCallsFromSteps using estimateCostCents
- Add costEstimateCents to frontend ToolUsage interface
- Display cost in ToolCallCard in message info dialog
- Now rag_search, web, and delegation tools show their individual cost

…get editor Use type guard functions (isScopeValue, isPeriodValue) instead of unsafe `as` casts in Select onValueChange handlers to satisfy no-unsafe-type-assertion lint rule.
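
A sketch of what such type guards might look like. The function names isScopeValue and isPeriodValue come from the commit message; the concrete scope and period values, and the `handleScopeChange` wrapper, are assumptions for illustration.

```typescript
// Assumed option values; the editor's real lists are not shown in this thread.
const SCOPE_VALUES = ['default', 'user', 'team', 'role'] as const;
type ScopeValue = (typeof SCOPE_VALUES)[number];

const PERIOD_VALUES = ['daily', 'monthly'] as const;
type PeriodValue = (typeof PERIOD_VALUES)[number];

function isScopeValue(value: string): value is ScopeValue {
  return SCOPE_VALUES.some((candidate) => candidate === value);
}

function isPeriodValue(value: string): value is PeriodValue {
  return PERIOD_VALUES.some((candidate) => candidate === value);
}

// Hypothetical handler: narrows the raw Select value instead of casting with `as`.
function handleScopeChange(value: string, setScope: (scope: ScopeValue) => void): void {
  if (isScopeValue(value)) {
    setScope(value); // `value` is typed as ScopeValue inside this branch
  }
}
```

Unlike an `as ScopeValue` cast, the guard also rejects unexpected strings at runtime, so a stale or malformed option value never reaches state.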

… definition The schema.ts had inline tool usage object definitions separate from the validators.ts toolUsageItemValidator. The new costEstimateCents field was added to the validator but not the schema, causing Convex to reject documents with the extra field at write time.

The search service now calls embed_query_with_usage instead of embed_query. Update the test helper and assertion to match.

coderabbitai · 2026-04-10T11:15:28Z

📝 Walkthrough

Walkthrough

This pull request introduces comprehensive token usage and cost tracking infrastructure across multiple services. It adds usage dataclasses and result types to the embedding service, extends model definitions with cost metadata (input/output cents per million tokens), implements budget enforcement checks before agent generation, tracks aggregated usage in a ledger indexed by organization/user/period, updates governance UIs to display costs and usage dashboards, and propagates usage information through generation, extraction, and search pipelines. The changes touch embedding services, crawler extraction flows, RAG search/generation, agent completion handlers, governance enforcement, and chat platform UI components.
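
As a rough picture of a ledger keyed by organization, user, and period, the sketch below derives the period key from a caller-supplied timestamp (so the function is deterministic) and accumulates totals in a plain object. The function names buildPeriodKey and incrementUsageLedger appear in the commit messages, but their signatures, the key format, and the in-memory `Ledger` are assumptions made here.

```typescript
type Period = 'daily' | 'monthly';

// Deterministic: the result depends only on the arguments, never on the current clock.
function buildPeriodKey(period: Period, timestampMs: number): string {
  const iso = new Date(timestampMs).toISOString(); // always UTC, e.g. 2026-04-10T10:53:39.000Z
  return period === 'daily' ? iso.slice(0, 10) : iso.slice(0, 7);
}

interface LedgerRow {
  tokens: number;
  costCents: number;
  requests: number;
}

interface LedgerKey {
  organizationId: string;
  userId: string;
  periodKey: string;
}

// In-memory stand-in for the ledger table.
type Ledger = Record<string, LedgerRow>;

function incrementUsageLedger(
  ledger: Ledger,
  key: LedgerKey,
  delta: { tokens: number; costCents: number },
): LedgerRow {
  const id = [key.organizationId, key.userId, key.periodKey].join(':');
  const prev = ledger[id] ?? { tokens: 0, costCents: 0, requests: 0 };
  const next: LedgerRow = {
    tokens: prev.tokens + delta.tokens,
    costCents: prev.costCents + delta.costCents,
    requests: prev.requests + 1, // one completed generation per call
  };
  ledger[id] = next;
  return next;
}
```

Because the period key is a pure function of the timestamp, a budget check and the later ledger write can be given the same timestamp and are guaranteed to land in the same period row.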

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

feat(platform): add AI image generation integration with approval flow #1114: Directly modifies triggerCompletionResponseHandler in approval_shared.ts to add budget enforcement checks before agent scheduling, which overlaps with the same completion flow modifications in this PR.
feat(platform): structured AI responses with section markers and timeout budget chain #496: Both PRs modify generateAgentResponse and internal_actions to pass request-scoped metadata (providerCost, budget context) and integrate budget/cost tracking into the agent response pipeline.
feat: migrate website search and embeddings to crawler service #595: Both PRs implement embedding and usage tracking propagation through the crawler/embedding pipeline, with the main PR building on the embedding/usage foundation established in the retrieved PR.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(platform): budget limits per user/group' clearly and concisely summarizes the main feature addition—budget enforcement system with per-user and per-group scoping—which is the primary focus of this substantial changeset across multiple services.


coderabbitai

Actionable comments posted: 21

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (8)

services/crawler/app/routers/web.py (1)

123-132: ⚠️ Potential issue | 🟠 Major

Preserve parser usage when a file parse fails.

The success path forwards result.get("usage"), but the failure return drops it. If parse_file_with_vision() spends OCR/LLM tokens and then returns success=False, the budget ledger will record zero cost for a non-zero spend.

Suggested fix

     if not result.get("success"):
         return WebFetchExtractResponse(
             success=False,
             url=url_str,
             content="",
             content_type=actual_ct or content_type or "",
             word_count=0,
             page_count=0,
             error=result.get("error", "Failed to extract content from file"),
+            usage=result.get("usage"),
         )

Also applies to: 143-152

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/crawler/app/routers/web.py` around lines 123 - 132, The failure
branches that return a WebFetchExtractResponse omit the parser usage, causing
billing/ledger underreporting; update the negative-return paths (the one inside
the block that checks if not result.get("success") and the analogous failure
block later) to include usage=result.get("usage", {}) so the response preserves
parse_file_with_vision() usage info; locate the WebFetchExtractResponse
constructors in this router and add the usage field populated from
result.get("usage", {}) to each failure return.

services/platform/convex/agent_tools/rag/rag_search_tool.ts (1)

151-156: ⚠️ Potential issue | 🟠 Major

Normalize list_indexed to the structured tool response shape.

This branch still returns AgentIndexedDocumentListResult directly, while the search branch now returns { success, response, output, usage? }. Any caller consuming the new structured shape will handle list_indexed differently or drop it entirely.

Suggested fix

       if (args.operation === 'list_indexed') {
-        return listIndexedDocuments(ctx, {
+        const result = await listIndexedDocuments(ctx, {
           limit: args.limit,
           cursor: args.cursor,
         });
+        return {
+          success: true,
+          response: JSON.stringify(result, null, 2),
+          output: result,
+        };
       }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/rag/rag_search_tool.ts` around lines 151
- 156, The 'list_indexed' branch in the execute function returns an
AgentIndexedDocumentListResult directly rather than the standardized tool
response shape; update the execute handler so when args.operation ===
'list_indexed' it calls listIndexedDocuments(ctx, {...}) and then returns an
object matching the search branch shape (e.g. { success: true, response: <the
list result>, output: <optional human-readable summary>, usage: <optional usage
info> }) instead of the raw AgentIndexedDocumentListResult so callers receive
the consistent { success, response, output, usage? } structure.

services/platform/convex/agent_tools/web/web_tool.ts (1)

77-84: ⚠️ Potential issue | 🟠 Major

Forward the extraction instruction for webpage and inline-URL fetches.

This condition only passes instruction for explicit file URLs. Fetches like query: "Extract pricing from https://example.com/pricing" or explicit HTML URL fetches will ignore the user's prompt and fall back to raw extraction, even though the crawler supports instruction for those paths too.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/web/web_tool.ts` around lines 77 - 84,
The code only sets instruction for file URLs; update the instruction calculation
so user-provided extraction prompts (args.query) are forwarded for webpage and
inline-URL fetches as well: compute instruction = args.query when present (not
just when isFileUrl(targetUrl)) and pass that instruction into the
fetchAndExtract call (function fetchAndExtract in this file), preserving any
existing file-URL logic that may still apply.

services/platform/convex/agent_tools/approval_shared.ts (1)

59-90: ⚠️ Potential issue | 🟠 Major

Handle budget denial as a normal return, not a thrown mutation error.

This helper is running inside a mutation path. Throwing on !budgetResult.allowed rolls back the completion message and bubbles an error to the caller, which regresses the “assistant message, not error” budget UX. Check the budget before createStream()/generationStatus, then save the budget notice and return.

💡 Suggested fix

-  const streamId = await persistentStreaming.createStream(ctx);
-
-  if (threadMeta) {
-    await ctx.db.patch(threadMeta._id, {
-      generationStatus: 'generating' as const,
-      streamId,
-      updatedAt: Date.now(),
-    });
-  }
-
   // Budget enforcement — block before scheduling if limits are exceeded
   const userId = thread?.userId;
   if (userId) {
     const budgetResult = await checkBudget(
       ctx,
@@
       userId,
       agentConfig.agentTeamId ? [agentConfig.agentTeamId] : [],
     );
     if (!budgetResult.allowed) {
-      throw new Error(budgetResult.reason ?? 'Budget limit exceeded');
+      await saveMessage(ctx, components.agent, {
+        threadId,
+        message: {
+          role: 'assistant',
+          content: budgetResult.reason ?? 'Budget limit exceeded',
+        },
+      });
+      return;
     }
   }
+
+  const streamId = await persistentStreaming.createStream(ctx);
+
+  if (threadMeta) {
+    await ctx.db.patch(threadMeta._id, {
+      generationStatus: 'generating' as const,
+      streamId,
+      updatedAt: Date.now(),
+    });
+  }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/approval_shared.ts` around lines 59 -
90, The budget check currently throws an error after creating the stream and
setting generationStatus, causing a rollback; move the call to checkBudget (the
check of checkBudget(...) result for userId) to before
persistentStreaming.createStream() and before patching
threadMeta.generationStatus, and when budgetResult.allowed is false do NOT
throw—use saveMessage(...) to save a user-visible budget denial/notice message
(similar to how promptMessageId is created) and return early so no stream is
created and no generationStatus is set; reference
persistentStreaming.createStream, saveMessage, checkBudget, threadMeta and
generationStatus when making the change.

services/platform/convex/lib/agent_response/generate_response.ts (1)

1807-1834: ⚠️ Potential issue | 🟠 Major

Propagate providerCost on the failure metadata path too.

Success and cancel completions now use resolved provider pricing, but the error path still omits it. Any failed generation that reports token usage will fall back to the hardcoded model map here, so costEstimateCents and ledger totals diverge from the success path.

💡 Suggested fix

         await onAgentComplete(ctx, {
           threadId,
           agentType,
           result: {
             threadId,
@@
           organizationId,
           userId,
           teamIds,
           agentSlug,
+          providerCost,
         });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/lib/agent_response/generate_response.ts` around
lines 1807 - 1834, The error completion path in generate_response.ts calls
onAgentComplete without providerCost, causing divergence from the
success/cancel paths; update the arguments passed to onAgentComplete (the
object constructed in the await onAgentComplete(...) call) to include
providerCost, populated the same way as in the success/cancel completion paths,
so failed generations that report token usage use the resolved provider
pricing; locate the onAgentComplete invocation in generate_response.ts and add
providerCost: providerCost (or the exact variable name used where resolved
pricing is computed) next to organizationId, userId, teamIds, and agentSlug,
as in the suggested fix.

services/platform/convex/agent_tools/human_input/mutations.ts (1)

169-205: ⚠️ Potential issue | 🟠 Major

Don’t throw on budget denial after marking the approval complete.

This runs inside the same mutation after the approval has already been patched to completed. Throwing here aborts the whole transaction, so the human response is lost and the caller gets an error instead of the intended assistant budget-exceeded message. The budget gate needs to run before createStream()/generationStatus, then save the budget notice and return cleanly.

💡 Suggested fix

-  const streamId = await persistentStreaming.createStream(ctx);
-
-  // Set generationStatus so the frontend shows loading indicator
-  const threadMeta = await ctx.db
-    .query('threadMetadata')
-    .withIndex('by_threadId', (q) => q.eq('threadId', threadId))
-    .first();
-  if (threadMeta) {
-    await ctx.db.patch(threadMeta._id, {
-      generationStatus: 'generating' as const,
-      streamId,
-      updatedAt: Date.now(),
-    });
-  }
-
   const thread = await ctx.runQuery(components.agent.threads.getThread, {
     threadId,
   });
@@
   if (userId) {
     const budgetResult = await checkBudget(
       ctx,
       organizationId,
       userId,
       agentConfig.agentTeamId ? [agentConfig.agentTeamId] : [],
     );
     if (!budgetResult.allowed) {
-      throw new Error(budgetResult.reason ?? 'Budget limit exceeded');
+      await saveMessage(ctx, components.agent, {
+        threadId,
+        message: {
+          role: 'assistant',
+          content: budgetResult.reason ?? 'Budget limit exceeded',
+        },
+      });
+      return { success: true, threadId };
     }
   }
+
+  const streamId = await persistentStreaming.createStream(ctx);
+
+  // Set generationStatus so the frontend shows loading indicator
+  const threadMeta = await ctx.db
+    .query('threadMetadata')
+    .withIndex('by_threadId', (q) => q.eq('threadId', threadId))
+    .first();
+  if (threadMeta) {
+    await ctx.db.patch(threadMeta._id, {
+      generationStatus: 'generating' as const,
+      streamId,
+      updatedAt: Date.now(),
+    });
+  }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/agent_tools/human_input/mutations.ts` around lines
169 - 205, The budget check is happening after createStream()/setting
generationStatus so throwing on denial aborts the transaction and loses the
completed approval; move the budget enforcement (the checkBudget call and its
handling) to run before calling persistentStreaming.createStream and before
patching threadMetadata.generationStatus/streamId, and if not allowed do not
throw—persist a budget-notice (e.g. save a record or patch the thread/response
with a budgetExceeded flag/message) and return cleanly with the assistant
budget-exceeded response instead of raising an error; update references to
checkBudget, persistentStreaming.createStream, and the threadMetadata patch to
reflect the new order.

services/platform/convex/lib/agent_chat/start_agent_chat.ts (1)

132-154: ⚠️ Potential issue | 🟠 Major

Budget denials still leave an orphan generation session behind.

createStream() and the threadMetadata transition to generating happen before the budget check. When the check fails, we only flip generationStatus back to idle, so the thread still points at a stream/generation start that will never produce output. That can leave the client subscribed to a dead stream and pollute thread state.

Either move the budget check ahead of stream creation, or explicitly clear the stream-related fields on the deny path.

Also applies to: 217-240

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/platform/convex/lib/agent_chat/start_agent_chat.ts` around lines 132
- 154, The code creates a persistent stream via persistentStreaming.createStream
and patches threadMetadata.generationStatus to "generating" before performing
the budget check, which leaves an orphaned stream on denial; fix by moving the
budget validation to run before calling persistentStreaming.createStream and
before setting threadMetadata to generating, or if you prefer to keep the
current order, ensure the deny path explicitly clears the stream-related fields
(e.g., set streamId = undefined, generationStartTime = undefined, and reset any
stream/generation markers) by calling ctx.db.patch on the same threadMetadata
record whenever budget is denied; apply the same change to the other occurrence
that updates threadMetadata and creates streams (the block around the second
createStream/patch).

services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx (1)

523-603: ⚠️ Potential issue | 🟠 Major

Add a <caption> to the models table.

This table still has no caption, so screen readers get the columns without any summary of what the table represents.

As per coding guidelines "ENSURE data tables have a <caption> (can be visually hidden with sr-only)".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In
`@services/platform/app/routes/dashboard/`$id/settings/providers/$providerName.tsx
around lines 523 - 603, The models table rendered by the Table component (the
block starting with <Table> and the config.models.map loop) lacks a caption for
screen readers; add a <caption> as the first child of <Table> (just below the
opening <Table>) with a descriptive string (e.g., use the i18n helper
t('providers.modelsTableCaption') or a literal like "Provider models and
settings") and make it visually hidden with the sr-only class if you don't want
it visible; ensure the caption text clearly summarizes the table's purpose so
screen readers receive context for the columns.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/providers/openrouter.json`:
- Around line 16-137: The example contains non-existent/unverified OpenRouter
model entries (e.g. ids "openai/gpt-5.2", "openai/gpt-5.2-pro",
"anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6",
"google/gemini-3.1-pro-preview", "deepseek/deepseek-v3.2",
"qwen/qwen3-next-80b-a3b-instruct", "meta-llama/llama-4-maverick", etc.) with
hardcoded prices; either replace each of these model objects with actual
verified OpenRouter model ids and current pricing from the OpenRouter catalog,
or mark the entire file as a template by adding a clear machine-readable flag
(e.g. add a top-level "template": true or add "verified": false to each model
object) and update displayName/description to indicate placeholder pricing so
production budget enforcement won’t use these values.

In `@services/platform/app/features/chat/components/message-info-dialog.tsx`:
- Around line 232-248: The "Cost" label is hardcoded in the metadata stat block
(the object created when metadata?.costEstimateCents != null) and must be
localized; replace the literal 'Cost' with the translation call from the
existing hook (e.g., useT or t) used elsewhere in this component (call
t('messageInfo.cost') or the appropriate key) so the label uses translated text;
update the object for the cost stat to use t('messageInfo.cost') (or the
component's existing translation function) instead of the hardcoded string.

In
`@services/platform/app/features/settings/governance/components/budget-editor.tsx`:
- Around line 469-495: The table in the BudgetEditor component is missing a
<caption> and column semantics on headers; update the table markup in
budget-editor.tsx (the table inside the overflow-x-auto div) to add a caption
(use an sr-only class if it should be visually hidden) and add scope="col"
attributes to each <th> (e.g., the Scope, Target, Period, Max Tokens, Max Cost,
Max Requests, Actions headers) so screen readers get correct column semantics.
- Around line 34-52: The UI strings in SCOPE_OPTIONS, ROLE_OPTIONS,
PERIOD_OPTIONS and all other user-facing text and aria-labels in
budget-editor.tsx are hardcoded English; replace them with calls to the
translation hook (useT / t) and translation keys (e.g.,
t('governance.budget.scope.default')) instead of raw strings, and ensure
aria-label attributes use translation keys as well (e.g.,
aria-label={t('governance.budget.scope.aria')}); scan the rest of the component
(lines referenced ~118-297, 455-572) and convert placeholders, helper copy,
button text, and ARIA values to use the t() function with meaningful keys and
update/add corresponding entries in the locale files.
- Around line 357-418: The editor mutates local enabled/rules state immediately
(in handleToggleEnabled, removeRule, handleDialogSave,
openAddDialog/openEditDialog flows) but doesn’t revert on upsertMutation
failure, leaving UI out of sync; change these handlers to either (A) perform the
upsertMutation.mutateAsync first and only call setEnabled/setRules after it
resolves successfully, or (B) perform an optimistic update but capture previous
state, call upsertMutation.mutateAsync, and on catch revert to the captured
previous state and show the toast; reference
saveConfig/upsertMutation.mutateAsync and the local setters setEnabled/setRules
and handlers handleToggleEnabled, removeRule, handleDialogSave (and
openAddDialog/openEditDialog flows) so you update or revert state consistently
when the mutation fails.

In
`@services/platform/app/features/settings/governance/components/usage-dashboard.tsx`:
- Around line 149-150: The component in usage-dashboard.tsx contains hardcoded
UI strings; replace them with i18n keys by importing and using the existing
translation hook (e.g. useTranslation or t) in the component that renders the
<Text as="h3" variant="label"> and the usage stat labels. Swap "Usage" for a
translation key like t('settings.usage.title'), replace "Total Tokens", "Total
Cost", "Total Requests", "Active Users" with keys such as
t('settings.usage.totalTokens'), t('settings.usage.totalCost'),
t('settings.usage.totalRequests'), t('settings.usage.activeUsers'), and replace
the empty state strings "No usage data" and "No usage has been recorded for this
period." with t('settings.usage.empty.title') and
t('settings.usage.empty.description'); ensure the translation hook is imported
at the top of the component and all uses reference the t(...) function instead
of raw strings so the component (e.g. UsageDashboard or the exported component
in this file) is fully internationalized.
- Around line 82-141: Replace hardcoded English column headers and inline labels
in the columns array with translated strings using the project's translation
hook (e.g., call useTranslation()/t) inside the UsageDashboard component: import
and call the hook at the top of the component, then change header values
('User', 'Team') and header render functions (e.g., () => <div
className="text-right">Input Tokens</div>) to use t('...') keys (e.g.,
t('governance.user'), t('governance.inputTokens'), t('governance.outputTokens'),
t('governance.cost'), t('governance.requests')) and replace the inline team
empty label '\u2014' if it should be localized; update any label variants (Text
variant props remain) so all user-facing strings use t(...) consistently.
- Around line 45-48: Replace the direct use of Date.prototype.toLocaleDateString
for the label with the shared formatDate utility: import formatDate from
'lib/utils/date/format', then change the assignment of label (the variable named
label that formats the date variable d) to call formatDate(d, ...) using the
equivalent options for long month and numeric year so the output matches the
previous behavior; ensure the import is added at the top of the file and remove
the toLocaleDateString call.

In `@services/platform/app/routes/dashboard/`$id/settings/governance.tsx:
- Around line 14-16: Change searchSchema so the tab field is constrained to
allowed values instead of any string: replace z.string().optional() with
z.enum(['system-prompt','budgets','usage']).optional() (or whatever the exact
values of the System Prompt, Budgets, and Usage tabs are) and update
validateSearch to fall back to 'system-prompt' when tab is missing or invalid;
locate the schema named searchSchema and the validateSearch logic in this file
and ensure the parsed value uses the enum result or defaults to 'system-prompt'.
- Around line 45-64: tabItems currently hardcodes English labels inside the
useMemo array (values 'System Prompt', 'Budgets', 'Usage'); replace those
hardcoded strings with translated labels using the app's translation hook (e.g.,
call const { t } = useTranslation() or useTranslations() at top of the
component) and use t('settings.governance.systemPrompt'),
t('settings.governance.budgets'), t('settings.governance.usage') (or the
project's preferred keys) for the label fields so SystemPromptEditor,
BudgetEditor and UsageDashboard render localized tab labels; keep the value and
content entries the same and ensure the translation hook is included in the
useMemo dependency array.
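
For the `searchSchema` item above, the fallback behaviour can be sketched in plain TypeScript without Zod. This is a minimal illustration only: the tab ids (`system-prompt`, `budgets`, `usage`) are assumptions inferred from the tab labels, and `parseTab` is a hypothetical helper, not a function from the PR.

```typescript
// Hypothetical tab ids, inferred from the tab labels mentioned in the review;
// the real values in governance.tsx may differ.
const GOVERNANCE_TABS = ['system-prompt', 'budgets', 'usage'] as const;
type GovernanceTab = (typeof GOVERNANCE_TABS)[number];

const DEFAULT_TAB: GovernanceTab = 'system-prompt';

// Narrow an untrusted search-param value to a known tab, else fall back.
function parseTab(raw: unknown): GovernanceTab {
  const match = GOVERNANCE_TABS.find((tab) => tab === raw);
  return match ?? DEFAULT_TAB;
}
```

With a Zod enum the same effect would come from the schema itself; the point of the sketch is only that an unknown or missing `tab` value never reaches the tab component.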

In
`@services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx`:
- Around line 462-484: In handleSubmitModel, validate form.inputCostPerMillion
and form.outputCostPerMillion before converting them into the cost object: parse
each value to a Number, ensure it is finite and >= 0, and reject or surface a
validation error (prevent submission) if not; only then compute
inputCentsPerMillion and outputCentsPerMillion via Math.round and include cost
in model (otherwise set cost to undefined or omit the invalid field). Reference
the form fields (form.inputCostPerMillion, form.outputCostPerMillion), the cost
construction logic and the model variable to locate and update the serialization
and add the validation check.
- Around line 535-537: The new provider-cost column and related UI strings are
hardcoded English; replace all user-facing text in TableHead ("Cost / 1M
tokens"), the field labels/placeholders/helper texts and the rendered cost copy
referenced around the TableHead and the ranges noted (including the blocks
currently around lines 579-587 and 687-719) with i18n translations: import and
use the existing translation hook/function (e.g., useTranslations or t) in this
component, replace string literals with t('...') calls using meaningful keys
(e.g., "providers.costPerMillion", "providers.costPlaceholder",
"providers.costHelper", etc.), and add corresponding keys to the locale resource
files so the UI reads from translations rather than hardcoded English.
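
For the `handleSubmitModel` item above, the validation step might look roughly like the following. It is a sketch under stated assumptions: the form values are taken to be strings holding dollars per 1M tokens, and `toCentsPerMillion`, `buildCost`, and `CostResult` are hypothetical names introduced here for illustration.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

interface CostResult {
  ok: boolean;
  cost?: ModelCost;
  error?: string;
}

// Convert one form value (assumed to be dollars per 1M tokens) into cents.
// Returns null when the value is not a finite, non-negative number.
function toCentsPerMillion(raw: string): number | null {
  const trimmed = raw.trim();
  if (trimmed === '') return null;
  const dollars = Number(trimmed);
  if (!Number.isFinite(dollars) || dollars < 0) return null;
  return Math.round(dollars * 100);
}

function buildCost(inputRaw: string, outputRaw: string): CostResult {
  // Both fields empty: pricing is simply not configured, so cost is omitted.
  if (inputRaw.trim() === '' && outputRaw.trim() === '') return { ok: true };
  const input = toCentsPerMillion(inputRaw);
  const output = toCentsPerMillion(outputRaw);
  if (input === null || output === null) {
    return { ok: false, error: 'Cost must be a finite number >= 0' };
  }
  return {
    ok: true,
    cost: { inputCentsPerMillion: input, outputCentsPerMillion: output },
  };
}
```

In this sketch a half-filled pair (one field empty, one set) is treated as invalid; whether that is the desired behaviour is a product decision the PR would need to make.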

In `@services/platform/convex/agent_tools/location/mutations.ts`:
- Around line 159-171: The budget check (checkBudget) can throw after the
thread's stream is created and generationStatus is set to 'generating', leaving
the thread stuck; move the call to checkBudget (using ctx, organizationId,
userId, and agentConfig.agentTeamId) to run before creating the stream and
before updating thread.generationStatus to 'generating', or if you prefer to
keep ordering, catch the budget failure and explicitly revert the thread
record's generationStatus back to 'idle' and persist the budget-exceeded
assistant message before rethrowing; ensure the fix touches the flow that
creates the stream and sets generationStatus so no path can leave the thread in
'generating' without scheduling a generation.
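
The ordering described above can be sketched with an in-memory thread object. This is an illustration only: `Thread`, `BudgetResult`, and `resumeGeneration` are hypothetical stand-ins, not the Convex mutation or the real `checkBudget` signature.

```typescript
type GenerationStatus = 'idle' | 'generating';

interface Thread {
  generationStatus: GenerationStatus;
  streamId?: string;
  messages: string[];
}

interface BudgetResult {
  allowed: boolean;
  reason?: string;
}

// Returns true when generation was started, false when the budget denied it.
function resumeGeneration(
  thread: Thread,
  budget: BudgetResult,
  createStream: () => string,
): boolean {
  // Check first: nothing has been mutated yet, so a denial cannot strand
  // the thread in 'generating' or leave an orphaned stream behind.
  if (budget.allowed === false) {
    thread.messages.push(budget.reason ?? 'Budget exceeded');
    return false;
  }
  thread.streamId = createStream();
  thread.generationStatus = 'generating';
  return true;
}
```

Because the denial path returns instead of throwing, the budget-exceeded message is kept and the thread stays `idle`.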

In `@services/platform/convex/governance/budget_enforcement.ts`:
- Around line 92-112: The code currently calls
ctx.db.query('usageLedger').withIndex('by_org_team_period', ...) .collect() into
entries and then loops to sum into totals; change this to iterate the query
result with a for await loop instead of .collect(): obtain the async iterator
from ctx.db.query(...).withIndex(...) and use for await (const entry of ...) to
increment totals.totalTokens, totals.costEstimate, and totals.requestCount (the
UsageTotals object), preserving the same filter keys (organizationId, teamId,
periodKey) used in the withIndex call.
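
A rough sketch of the streaming sum follows. The async generator `ledgerRows` is a hypothetical stand-in for the indexed query result, which this comment assumes is async-iterable; the field names mirror the `UsageTotals` fields named above.

```typescript
interface LedgerEntry {
  totalTokens: number;
  costEstimate: number;
  requestCount: number;
}

type UsageTotals = LedgerEntry;

// Hypothetical stand-in for an async-iterable query result.
async function* ledgerRows(rows: LedgerEntry[]): AsyncGenerator<LedgerEntry> {
  for (const row of rows) {
    yield row;
  }
}

// Sum rows one at a time instead of materializing the full array first.
async function sumUsage(entries: AsyncIterable<LedgerEntry>): Promise<UsageTotals> {
  const totals: UsageTotals = { totalTokens: 0, costEstimate: 0, requestCount: 0 };
  for await (const entry of entries) {
    totals.totalTokens += entry.totalTokens;
    totals.costEstimate += entry.costEstimate;
    totals.requestCount += entry.requestCount;
  }
  return totals;
}
```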

In `@services/platform/convex/governance/internal_mutations.ts`:
- Around line 68-84: The reconciliation loop uses the original keep object's
totals for every patch so merging multiple duplicates loses intermediate counts;
update the running totals before each patch (or compute cumulative sums first)
so each subsequent merge adds to the previously-merged value—e.g., use mutable
accumulators (like let accInput = keep.inputTokens, accOutput =
keep.outputTokens, accTotal = keep.totalTokens, accCost = keep.costEstimate,
accCount = keep.requestCount) and inside the for-loop add dup's values to these
accumulators and call ctx.db.patch(keep._id, { inputTokens: accInput,
outputTokens: accOutput, totalTokens: accTotal, costEstimate: accCost,
requestCount: accCount }), then ctx.db.delete(dup._id); alternatively refresh
keep after each patch by re-reading it from ctx.db before the next iteration.
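
The "compute cumulative sums first" variant can be sketched as a pure function. `LedgerRow` and `mergeDuplicates` are hypothetical names for illustration; the field names are the ones listed in the comment above.

```typescript
interface LedgerRow {
  id: string;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  costEstimate: number;
  requestCount: number;
}

type Totals = Omit<LedgerRow, 'id'>;

// Fold every duplicate into one running total so no intermediate count is
// lost. Patching `keep.x + dup.x` per iteration would keep only the last
// duplicate's contribution, because `keep` is never refreshed.
function mergeDuplicates(keep: LedgerRow, duplicates: LedgerRow[]): Totals {
  const acc: Totals = {
    inputTokens: keep.inputTokens,
    outputTokens: keep.outputTokens,
    totalTokens: keep.totalTokens,
    costEstimate: keep.costEstimate,
    requestCount: keep.requestCount,
  };
  for (const dup of duplicates) {
    acc.inputTokens += dup.inputTokens;
    acc.outputTokens += dup.outputTokens;
    acc.totalTokens += dup.totalTokens;
    acc.costEstimate += dup.costEstimate;
    acc.requestCount += dup.requestCount;
  }
  return acc;
}
```

With the totals computed up front, the mutation would need a single patch on the kept row followed by the deletes, rather than one patch per duplicate.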

In `@services/platform/convex/governance/mutations.ts`:
- Around line 35-42: The code validates args.config using
budgetConfigSchema.safeParse but then continues to use the raw args.config;
change the logic in the mutation so that after calling
budgetConfigSchema.safeParse(args.config) you capture parsed.data (e.g., const
normalizedConfig = parsed.data) and replace all later uses of args.config with
normalizedConfig — including the ctx.db.patch(...) and ctx.db.insert(...) calls
and the audit log entry — so the persisted config and audit record use the
validated/normalized data rather than the original input.
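
To illustrate why the parsed value matters, here is a minimal sketch with a hand-rolled parser standing in for `budgetConfigSchema.safeParse`. Everything in it (`safeParseBudgetConfig`, `upsertBudgetConfig`, the array-backed store and audit log) is hypothetical and only mimics the `{ success, data }` result shape.

```typescript
interface BudgetConfig {
  enabled: boolean;
  rules: unknown[];
}

type ParseResult =
  | { success: true; data: BudgetConfig }
  | { success: false; error: string };

// Hypothetical hand-rolled stand-in for budgetConfigSchema.safeParse.
function safeParseBudgetConfig(raw: unknown): ParseResult {
  if (typeof raw !== 'object' || raw === null) {
    return { success: false, error: 'config must be an object' };
  }
  const { enabled, rules } = raw as Record<string, unknown>;
  if (typeof enabled !== 'boolean' || !Array.isArray(rules)) {
    return { success: false, error: 'invalid budget config' };
  }
  // Only known fields are copied, so unknown keys never reach storage.
  return { success: true, data: { enabled, rules } };
}

// Persist and audit the parsed value, never the raw argument.
function upsertBudgetConfig(
  rawConfig: unknown,
  store: BudgetConfig[],
  audit: BudgetConfig[],
): void {
  const parsed = safeParseBudgetConfig(rawConfig);
  if (parsed.success === false) throw new Error(parsed.error);
  const normalizedConfig = parsed.data;
  store.push(normalizedConfig);
  audit.push(normalizedConfig);
}
```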

In `@services/platform/convex/openai_compat/internal_actions.ts`:
- Around line 384-405: The ledger entry uses the original modelId to estimate
cost, but resolveLanguageModelWithFallback() may return a different
resolved.modelData with provider-specific pricing; change the usage-cost
calculation to use the resolved model info (resolved.modelData or
resolved.modelId/pricing) when calling estimateCostCents so the
costEstimateCents reflects the actual provider/model chosen, and ensure the
mutation call to internal.governance.internal_mutations.incrementUsageLedger
uses those resolved values (reference resolveLanguageModelWithFallback(),
resolved.modelData, estimateCostCents, incrementUsageLedger, and result.usage to
locate the code).
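
A small worked example shows how far the estimate can drift when the requested model's pricing is used instead of the resolved one. `estimateCostCentsSketch` is a hypothetical formula (tokens times cents per million, divided by one million), not the PR's `estimateCostCents`; the prices are the Sonnet and Haiku entries from the example provider file in this PR.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical cost formula: tokens * (cents per 1M tokens) / 1M.
function estimateCostCentsSketch(usage: Usage, cost: ModelCost): number {
  return (
    (usage.inputTokens * cost.inputCentsPerMillion +
      usage.outputTokens * cost.outputCentsPerMillion) /
    1e6
  );
}

// Pricing of the model that was requested vs. the one a fallback resolved to.
const requestedCost: ModelCost = { inputCentsPerMillion: 300, outputCentsPerMillion: 1500 };
const resolvedCost: ModelCost = { inputCentsPerMillion: 100, outputCentsPerMillion: 500 };

const usage: Usage = { inputTokens: 2000, outputTokens: 1000 };

// The ledger should record the second figure, since that model actually ran.
const wrongEstimate = estimateCostCentsSketch(usage, requestedCost);
const ledgerEstimate = estimateCostCentsSketch(usage, resolvedCost);
```

Under these assumed numbers the requested-model estimate is three times the resolved-model estimate, which would consume a cost budget three times faster than the real spend.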

In `@services/platform/lib/shared/schemas/providers.ts`:
- Around line 14-19: The cost schema currently allows negative values for
inputCentsPerMillion and outputCentsPerMillion; update the cost object schema
(the cost definition) so both inputCentsPerMillion and outputCentsPerMillion are
constrained to non-negative numbers (e.g., use z.number().nonnegative() or
z.number().min(0)) to reject negative pricing at the schema boundary before it
reaches cost estimation.

In `@services/rag/app/services/rag_service.py`:
- Around line 281-284: The service currently stores per-request embedding usage
on the singleton via self.last_search_usage (and similar at lines ~301-302);
change the search (and affected generate) methods to stop writing to
self.last_search_usage and instead return the usage as part of the method return
value (e.g., include a usage field in the search result tuple/dict) so callers
receive request-scoped usage directly; update callers of search (and the
generate-related method) to read the returned usage rather than reading
self.last_search_usage.

In `@services/rag/app/services/search_service.py`:
- Around line 61-68: The code currently races _fts_search and
embed_query_with_usage with asyncio.gather so if _fts_search raises the
embedding task's usage result can be lost and last_search_usage remains the
zero-token placeholder; change the flow to always await or retrieve the
embedding result (from _embedding.embed_query_with_usage) independently of
_fts_search failures and assign self.last_search_usage = query_result.usage
before handling BM25/vector-only fallback (the code path that calls
embed_query()); specifically ensure the embedding_task started in this block is
awaited (or its result retrieved) even if the fts_task raises, so
last_search_usage reflects the actual usage for later fallback logic that relies
on self.last_search_usage.

---

Outside diff comments:
In `@services/crawler/app/routers/web.py`:
- Around line 123-132: The failure branches that return a
WebFetchExtractResponse omit the parser usage, causing billing/ledger
underreporting; update the negative-return paths (the one inside the block that
checks if not result.get("success") and the analogous failure block later) to
include usage=result.get("usage", {}) so the response preserves
parse_file_with_vision() usage info; locate the WebFetchExtractResponse
constructors in this router and add the usage field populated from
result.get("usage", {}) to each failure return.

In
`@services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx`:
- Around line 523-603: The models table rendered by the Table component (the
block starting with <Table> and the config.models.map loop) lacks a caption for
screen readers; add a <caption> as the first child of <Table> (just below the
opening <Table>) with a descriptive string (e.g., use the i18n helper
t('providers.modelsTableCaption') or a literal like "Provider models and
settings") and make it visually hidden with the sr-only class if you don't want
it visible; ensure the caption text clearly summarizes the table's purpose so
screen readers receive context for the columns.

In `@services/platform/convex/agent_tools/approval_shared.ts`:
- Around line 59-90: The budget check currently throws an error after creating
the stream and setting generationStatus, causing a rollback; move the call to
checkBudget (the check of checkBudget(...) result for userId) to before
persistentStreaming.createStream() and before patching
threadMeta.generationStatus, and when budgetResult.allowed is false do NOT
throw—use saveMessage(...) to save a user-visible budget denial/notice message
(similar to how promptMessageId is created) and return early so no stream is
created and no generationStatus is set; reference
persistentStreaming.createStream, saveMessage, checkBudget, threadMeta and
generationStatus when making the change.

In `@services/platform/convex/agent_tools/human_input/mutations.ts`:
- Around line 169-205: The budget check is happening after
createStream()/setting generationStatus so throwing on denial aborts the
transaction and loses the completed approval; move the budget enforcement (the
checkBudget call and its handling) to run before calling
persistentStreaming.createStream and before patching
threadMetadata.generationStatus/streamId, and if not allowed do not
throw—persist a budget-notice (e.g. save a record or patch the thread/response
with a budgetExceeded flag/message) and return cleanly with the assistant
budget-exceeded response instead of raising an error; update references to
checkBudget, persistentStreaming.createStream, and the threadMetadata patch to
reflect the new order.

In `@services/platform/convex/agent_tools/rag/rag_search_tool.ts`:
- Around line 151-156: The 'list_indexed' branch in the execute function returns
an AgentIndexedDocumentListResult directly rather than the standardized tool
response shape; update the execute handler so when args.operation ===
'list_indexed' it calls listIndexedDocuments(ctx, {...}) and then returns an
object matching the search branch shape (e.g. { success: true, response: <the
list result>, output: <optional human-readable summary>, usage: <optional usage
info> }) instead of the raw AgentIndexedDocumentListResult so callers receive
the consistent { success, response, output, usage? } structure.

In `@services/platform/convex/agent_tools/web/web_tool.ts`:
- Around line 77-84: The code only sets instruction for file URLs; update the
instruction calculation so user-provided extraction prompts (args.query) are
forwarded for webpage and inline-URL fetches as well: compute instruction =
args.query when present (not just when isFileUrl(targetUrl)) and pass that
instruction into the fetchAndExtract call (function fetchAndExtract in this
file), preserving any existing file-URL logic that may still apply.

In `@services/platform/convex/lib/agent_chat/start_agent_chat.ts`:
- Around line 132-154: The code creates a persistent stream via
persistentStreaming.createStream and patches threadMetadata.generationStatus to
"generating" before performing the budget check, which leaves an orphaned stream
on denial; fix by moving the budget validation to run before calling
persistentStreaming.createStream and before setting threadMetadata to
generating, or if you prefer to keep the current order, ensure the deny path
explicitly clears the stream-related fields (e.g., set streamId = undefined,
generationStartTime = undefined, and reset any stream/generation markers) by
calling ctx.db.patch on the same threadMetadata record whenever budget is
denied; apply the same change to the other occurrence that updates
threadMetadata and creates streams (the block around the second
createStream/patch).

In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 1807-1834: The error completion path in generate_response.ts calls
onAgentComplete with a result object that omits providerCost, causing divergence
from success/cancel paths; update the result passed to onAgentComplete (the
object constructed in the await onAgentComplete(...) call) to include
providerCost (and/or costEstimateCents if you use that symbol) populated the
same way as the success/cancel completion paths so failed generations that have
token usage use the resolved provider pricing; locate the onAgentComplete
invocation in generate_response.ts and add providerCost: providerCost (or the
exact variable name used where resolved pricing is computed) to the result
object along with any existing usage/duration fields.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 542660b9-0cc0-4eea-a4b0-96e5bcc2558d

📥 Commits

Reviewing files that changed from the base of the PR and between 76038fd and 982ebcf.

📒 Files selected for processing (43)

examples/providers/openrouter.json
packages/tale_knowledge/src/tale_knowledge/embedding/__init__.py
packages/tale_knowledge/src/tale_knowledge/embedding/service.py
services/crawler/app/models.py
services/crawler/app/routers/web.py
services/crawler/app/services/web_image_extractor.py
services/platform/app/features/chat/components/message-info-dialog.tsx
services/platform/app/features/chat/hooks/queries.ts
services/platform/app/features/settings/governance/components/budget-editor.tsx
services/platform/app/features/settings/governance/components/usage-dashboard.tsx
services/platform/app/routes/dashboard/$id/settings/governance.tsx
services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx
services/platform/convex/agent_tools/approval_shared.ts
services/platform/convex/agent_tools/human_input/mutations.ts
services/platform/convex/agent_tools/location/mutations.ts
services/platform/convex/agent_tools/rag/format_search_results.ts
services/platform/convex/agent_tools/rag/rag_search_tool.ts
services/platform/convex/agent_tools/web/helpers/fetch_and_extract.ts
services/platform/convex/agent_tools/web/helpers/types.ts
services/platform/convex/agent_tools/web/web_tool.ts
services/platform/convex/governance/budget_enforcement.ts
services/platform/convex/governance/cost_estimation.ts
services/platform/convex/governance/helpers.ts
services/platform/convex/governance/internal_mutations.ts
services/platform/convex/governance/mutations.ts
services/platform/convex/governance/queries.ts
services/platform/convex/governance/schema.ts
services/platform/convex/lib/agent_chat/internal_actions.ts
services/platform/convex/lib/agent_chat/start_agent_chat.ts
services/platform/convex/lib/agent_completion/on_agent_complete.ts
services/platform/convex/lib/agent_response/generate_response.ts
services/platform/convex/lib/agent_response/types.ts
services/platform/convex/message_metadata/internal_mutations.ts
services/platform/convex/openai_compat/internal_actions.ts
services/platform/convex/providers/file_actions.ts
services/platform/convex/providers/resolve_model.ts
services/platform/convex/streaming/schema.ts
services/platform/convex/streaming/validators.ts
services/platform/lib/shared/schemas/providers.ts
services/rag/app/models.py
services/rag/app/routers/search.py
services/rag/app/services/rag_service.py
services/rag/app/services/search_service.py

💤 Files with no reviewable changes (1)

services/platform/convex/governance/schema.ts

coderabbitai · 2026-04-10T11:15:31Z

+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 38, "outputCentsPerMillion": 172 }
    },
    {
      "id": "deepseek/deepseek-v3.2",
      "displayName": "DeepSeek V3.2",
      "description": "Strong reasoning and general capabilities",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 26, "outputCentsPerMillion": 38 }
    },
    {
      "id": "qwen/qwen3-next-80b-a3b-instruct",
      "displayName": "Qwen3 Next 80B",
      "description": "Fast and efficient instruction-following model",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 9, "outputCentsPerMillion": 110 }
    },
    {
      "id": "qwen/qwen3.5-35b-a3b",
      "displayName": "Qwen3.5 35B",
      "description": "Compact and fast model",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 16, "outputCentsPerMillion": 130 }
    },
    {
      "id": "anthropic/claude-opus-4.6",
      "displayName": "Claude Opus 4.6",
      "description": "Most capable model for complex reasoning and coding",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 500, "outputCentsPerMillion": 2500 }
    },
    {
      "id": "openai/gpt-5.2",
      "displayName": "GPT-5.2",
      "description": "OpenAI's latest flagship model",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 175, "outputCentsPerMillion": 1400 }
    },
    {
      "id": "qwen/qwen3-vl-32b-instruct",
      "displayName": "Qwen3 VL 32B",
      "description": "Vision-language model for image understanding",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 10, "outputCentsPerMillion": 42 }
    },
    {
      "id": "qwen/qwen3-embedding-8b",
      "displayName": "Qwen3 Embedding 8B",
      "description": "Text embedding model for semantic search",
-      "tags": [
-        "embedding"
-      ],
-      "dimensions": 1536
+      "tags": ["embedding"],
+      "dimensions": 1536,
+      "cost": { "inputCentsPerMillion": 1, "outputCentsPerMillion": 0 }
    },
    {
      "id": "openai/gpt-5.2-chat",
      "displayName": "GPT-5.2 Instant",
      "description": "Fast, low-latency variant of GPT-5.2 optimized for chat",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 175, "outputCentsPerMillion": 1400 }
    },
    {
      "id": "openai/gpt-5.2-pro",
      "displayName": "GPT-5.2 Pro",
      "description": "Most advanced GPT-5.2 variant for complex reasoning tasks",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 2100, "outputCentsPerMillion": 16800 }
    },
    {
      "id": "anthropic/claude-sonnet-4.6",
      "displayName": "Claude Sonnet 4.6",
      "description": "Balanced performance and speed for everyday tasks",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 300, "outputCentsPerMillion": 1500 }
    },
    {
      "id": "anthropic/claude-haiku-4.5",
      "displayName": "Claude Haiku 4.5",
      "description": "Fast and compact model for lightweight tasks",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 100, "outputCentsPerMillion": 500 }
    },
    {
      "id": "google/gemini-3.1-pro-preview",
      "displayName": "Gemini 3 Pro",
      "description": "Google's most capable Gemini model",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 200, "outputCentsPerMillion": 1200 }
    },
    {
      "id": "google/gemini-3-flash-preview",
      "displayName": "Gemini 3 Flash",
      "description": "Fast and efficient Gemini model",
-      "tags": [
-        "chat",
-        "vision"
-      ]
+      "tags": ["chat", "vision"],
+      "cost": { "inputCentsPerMillion": 50, "outputCentsPerMillion": 300 }
    },
    {
      "id": "mistralai/mistral-large-2512",
      "displayName": "Mistral Large 3",
      "description": "Mistral AI's flagship large language model",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 50, "outputCentsPerMillion": 150 }
    },
    {
      "id": "mistralai/mistral-medium-3",
      "displayName": "Mistral Medium 3",
      "description": "Balanced Mistral model for general tasks",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 40, "outputCentsPerMillion": 200 }
    },
    {
      "id": "meta-llama/llama-4-maverick",
      "displayName": "LLaMA 4 Maverick",
      "description": "Meta's powerful open-source large language model",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 15, "outputCentsPerMillion": 60 }
    },
    {
      "id": "meta-llama/llama-4-scout",
      "displayName": "LLaMA 4 Scout",
      "description": "Meta's efficient open-source language model",
-      "tags": [
-        "chat"
-      ]
+      "tags": ["chat"],
+      "cost": { "inputCentsPerMillion": 8, "outputCentsPerMillion": 30 }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

What are the current OpenRouter input and output prices per 1M tokens for the models listed in examples/providers/openrouter.json?

💡 Result:

Web search failed: Server error: no LLM provider could handle the message

🏁 Script executed:

# First, let's look at the full file to understand the structure
fd openrouter.json

Repository: tale-project/tale

Length of output: 95

🏁 Script executed:

# Read the file to see all models and their pricing
cat examples/providers/openrouter.json

Repository: tale-project/tale

Length of output: 5454

🌐 Web query:

OpenRouter API pricing 2025 current rates

💡 Result:

As of April 10, 2026, OpenRouter’s “current rates” break down into (a) model token rates + (b) OpenRouter fees:

Model (token) pricing: You pay the posted per‑model rates per 1M input tokens and per 1M output tokens shown in the OpenRouter model catalog; OpenRouter says it does not mark up provider pricing. [1]

Example (one model): GPT‑4.1 shows $2 / 1M input tokens and $8 / 1M output tokens (effective pricing page). [2]

Pay‑as‑you‑go platform fee (when buying credits): 5.5% of the order amount for non‑crypto payments ($0.80 minimum fee); 5.0% flat for crypto (no minimum). [3]

BYOK (Bring Your Own Key): 1M free BYOK requests per month, then 5% fee after that. [1][4]

Sources: OpenRouter Pricing page + FAQ + fee announcement + model pricing page. [1][4][3][2]

🌐 Web query:

OpenRouter documentation model pricing rates per million tokens

💡 Result:

OpenRouter exposes per-token USD pricing for each model in the Models API response under pricing.prompt (input tokens) and pricing.completion (output tokens). To get rates per 1M tokens, multiply those values by 1,000,000. [1]

Endpoint

GET https://openrouter.ai/api/v1/models (returns model list + pricing) [1]

Examples (from the API response)

anthropic/claude-opus-4.6-fast: prompt: 0.00003, completion: 0.00015 ⇒ $30 / 1M input, $150 / 1M output [1]

z-ai/glm-5.1: prompt: 0.00000126, completion: 0.00000396 ⇒ $1.26 / 1M input, $3.96 / 1M output [1]

google/gemma-4-26b-a4b-it: prompt: 0.00000013, completion: 0.0000004 ⇒ $0.13 / 1M input, $0.40 / 1M output [1]

OpenRouter’s docs also note that pricing is shown per million tokens (often separate prompt vs completion) on the Models page, and that prices are passed through from underlying providers (no inference markup). [2]

Sources: [1] [2]
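
The unit conversion described in the result above, from per-token USD to the `inputCentsPerMillion` / `outputCentsPerMillion` fields used in the provider file, can be sketched as follows. `perTokenUsdToCentsPerMillion` is a hypothetical helper; it accepts a number or a numeric string as a precaution, since the exact type of the API values is not confirmed here.

```typescript
// Convert a per-token USD price into integer cents per 1M tokens.
// Rounding absorbs floating-point noise from the multiplication.
function perTokenUsdToCentsPerMillion(usdPerToken: number | string): number {
  const value = Number(usdPerToken);
  if (!Number.isFinite(value) || value < 0) {
    throw new Error(`invalid per-token price: ${String(usdPerToken)}`);
  }
  return Math.round(value * 1e6 * 100);
}

// Values quoted in the query result above:
// prompt 0.00003     => $30 / 1M   => 3000 cents per million
// completion 0.00015 => $150 / 1M  => 15000 cents per million
const exampleInput = perTokenUsdToCentsPerMillion(0.00003);
const exampleOutput = perTokenUsdToCentsPerMillion('0.00015');
```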

Several models in this example file do not appear to exist on OpenRouter, making the pricing values unverifiable and unsuitable for production budget enforcement.

The file lists future-dated or fictional model versions (GPT-5.2, Claude 4.6, LLaMA 4, Gemini 3, DeepSeek V3.2, Qwen3 variants) that are not available in the current OpenRouter catalog. Using prices tied to non-existent models will produce incorrect cost estimates and budget calculations. Either use verified models that exist on OpenRouter with their actual current pricing, or clearly document this as a template/placeholder example.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/providers/openrouter.json` around lines 16 - 137, The example contains non-existent/unverified OpenRouter model entries (e.g. ids "openai/gpt-5.2", "openai/gpt-5.2-pro", "anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "google/gemini-3.1-pro-preview", "deepseek/deepseek-v3.2", "qwen/qwen3-next-80b-a3b-instruct", "meta-llama/llama-4-maverick", etc.) with hardcoded prices; either replace each of these model objects with actual verified OpenRouter model ids and current pricing from the OpenRouter catalog, or mark the entire file as a template by adding a clear machine-readable flag (e.g. add a top-level "template": true or add "verified": false to each model object) and update displayName/description to indicate placeholder pricing so production budget enforcement won’t use these values.

coderabbitai · 2026-04-10T11:15:31Z

+      ...(metadata?.costEstimateCents != null
+        ? [
+            {
+              label: 'Cost',
+              value: (
+                <Text className="font-mono">
+                  {(() => {
+                    const dollars = metadata.costEstimateCents / 100;
+                    if (dollars === 0) return '$0.00';
+                    if (dollars >= 1) return `$${dollars.toFixed(2)}`;
+                    return `$${dollars.toPrecision(3)}`;
+                  })()}
+                </Text>
+              ),
+            },
+          ]
+        : []),


⚠️ Potential issue | 🟡 Minor

Localize the new cost label.

'Cost' is new user-facing text but it bypasses useT, so this stat will stay English in localized UIs. Please move it to a translation key, e.g. t('messageInfo.cost').

As per coding guidelines **/*.{tsx,jsx}: Do NOT hardcode text, use the translation hooks/functions instead for user-facing UI.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/chat/components/message-info-dialog.tsx` around lines 232 - 248, The "Cost" label is hardcoded in the metadata stat block (the object created when metadata?.costEstimateCents != null) and must be localized; replace the literal 'Cost' with the translation call from the existing hook (e.g., useT or t) used elsewhere in this component (call t('messageInfo.cost') or the appropriate key) so the label uses translated text; update the object for the cost stat to use t('messageInfo.cost') (or the component's existing translation function) instead of the hardcoded string.

coderabbitai · 2026-04-10T11:15:31Z

+const SCOPE_OPTIONS = [
+  { value: 'default', label: 'Default' },
+  { value: 'user', label: 'User' },
+  { value: 'team', label: 'Team' },
+  { value: 'role', label: 'Role' },
+];
+
+const ROLE_OPTIONS = [
+  { value: 'admin', label: 'Admin' },
+  { value: 'developer', label: 'Developer' },
+  { value: 'editor', label: 'Editor' },
+  { value: 'member', label: 'Member' },
+];
+
+const PERIOD_OPTIONS = [
+  { value: 'monthly', label: 'Monthly' },
+  { value: 'weekly', label: 'Weekly', disabled: true },
+  { value: 'daily', label: 'Daily', disabled: true },
+];


⚠️ Potential issue | 🟠 Major

This new governance UI is hardcoded English.

The added labels, placeholders, helper copy, button text, and ARIA labels are not routed through useT, so the feature won’t localize and the ARIA text also violates the repo rule.

As per coding guidelines "Do NOT hardcode text, use the translation hooks/functions instead for user-facing UI" and "ALWAYS use translation keys for aria-label values. Never hardcode English in ARIA attributes".

Also applies to: 118-297, 455-572

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/settings/governance/components/budget-editor.tsx` around lines 34 - 52, The UI strings in SCOPE_OPTIONS, ROLE_OPTIONS, PERIOD_OPTIONS and all other user-facing text and aria-labels in budget-editor.tsx are hardcoded English; replace them with calls to the translation hook (useT / t) and translation keys (e.g., t('governance.budget.scope.default')) instead of raw strings, and ensure aria-label attributes use translation keys as well (e.g., aria-label={t('governance.budget.scope.aria')}); scan the rest of the component (lines referenced ~118-297, 455-572) and convert placeholders, helper copy, button text, and ARIA values to use the t() function with meaningful keys and update/add corresponding entries in the locale files.

coderabbitai · 2026-04-10T11:15:31Z

+  const saveConfig = useCallback(
+    async (configToSave: { enabled: boolean; rules: BudgetRule[] }) => {
+      try {
+        await upsertMutation.mutateAsync({
+          organizationId,
+          policyType: 'budgets',
+          config: configToSave,
+        });
+        toast({ title: 'Budget configuration saved' });
+      } catch (error: unknown) {
+        const message =
+          error instanceof Error ? error.message : 'Failed to save';
+        toast({ title: message, variant: 'destructive' });
+      }
+    },
+    [organizationId, upsertMutation, toast],
+  );
+
+  const handleToggleEnabled = useCallback(
+    (checked: boolean) => {
+      setEnabled(checked);
+      void saveConfig({ enabled: checked, rules });
+    },
+    [saveConfig, rules],
+  );
+
+  const removeRule = useCallback(
+    (index: number) => {
+      const newRules = rules.filter((_, i) => i !== index);
+      setRules(newRules);
+      void saveConfig({ enabled, rules: newRules });
+    },
+    [rules, enabled, saveConfig],
+  );
+
+  const openAddDialog = useCallback(() => {
+    setEditingIndex(null);
+    setDialogRule(emptyRule());
+    setDialogOpen(true);
+  }, []);
+
+  const openEditDialog = useCallback(
+    (index: number) => {
+      setEditingIndex(index);
+      setDialogRule(rules[index]);
+      setDialogOpen(true);
+    },
+    [rules],
+  );
+
+  const handleDialogSave = useCallback(
+    (rule: BudgetRule) => {
+      let newRules: BudgetRule[];
+      if (editingIndex === null) {
+        newRules = [...rules, rule];
+      } else {
+        newRules = rules.map((r, i) => (i === editingIndex ? rule : r));
+      }
+      setRules(newRules);
+      void saveConfig({ enabled, rules: newRules });
+    },
+    [editingIndex, rules, enabled, saveConfig],


⚠️ Potential issue | 🟠 Major

Failed autosaves leave the editor out of sync with the server.

Toggle/add/edit/delete all update local enabled/rules state before the mutation resolves, and saveConfig only shows a toast on failure. If the mutation is rejected, the screen keeps showing an unsaved budget policy until some later refetch overwrites it.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/settings/governance/components/budget-editor.tsx` around lines 357 - 418, The editor mutates local enabled/rules state immediately (in handleToggleEnabled, removeRule, handleDialogSave, openAddDialog/openEditDialog flows) but doesn’t revert on upsertMutation failure, leaving UI out of sync; change these handlers to either (A) perform the upsertMutation.mutateAsync first and only call setEnabled/setRules after it resolves successfully, or (B) perform an optimistic update but capture previous state, call upsertMutation.mutateAsync, and on catch revert to the captured previous state and show the toast; reference saveConfig/upsertMutation.mutateAsync and the local setters setEnabled/setRules and handlers handleToggleEnabled, removeRule, handleDialogSave (and openAddDialog/openEditDialog flows) so you update or revert state consistently when the mutation fails.
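
Option (B) from the prompt above, optimistic update with rollback, might be sketched like this. It is a framework-free illustration: `BudgetState`, `applyWithRollback`, and the getter/setter pair are hypothetical stand-ins for the component's `enabled`/`rules` state and `upsertMutation.mutateAsync`.

```typescript
interface BudgetState {
  enabled: boolean;
  rules: string[];
}

// Apply `next` optimistically, then restore the captured previous state if
// the save rejects. Resolves to true when the save succeeded.
async function applyWithRollback(
  getState: () => BudgetState,
  setState: (state: BudgetState) => void,
  next: BudgetState,
  save: (state: BudgetState) => Promise<void>,
): Promise<boolean> {
  const previous = getState();
  setState(next);
  try {
    await save(next);
    return true;
  } catch {
    // The server never accepted `next`, so the UI must not keep showing it.
    setState(previous);
    return false;
  }
}
```

The caller can still show the destructive toast when the function resolves to `false`; the difference from the current handlers is only that local state is restored as well.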

coderabbitai · 2026-04-10T11:15:31Z

+            <div className="overflow-x-auto">
+              <table className="w-full text-sm">
+                <thead>
+                  <tr className="border-border border-b">
+                    <th className="text-muted-foreground px-3 py-2 text-left font-medium">
+                      Scope
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-left font-medium">
+                      Target
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-left font-medium">
+                      Period
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-right font-medium">
+                      Max Tokens
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-right font-medium">
+                      Max Cost
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-right font-medium">
+                      Max Requests
+                    </th>
+                    <th className="text-muted-foreground px-3 py-2 text-right font-medium">
+                      Actions
+                    </th>
+                  </tr>
+                </thead>


⚠️ Potential issue | 🟠 Major

The rules table is missing required accessibility semantics.

The raw table markup still needs a caption, and the headers should explicitly preserve column semantics on the generated <th> elements.

As per coding guidelines "ENSURE data tables have a <caption> (can be visually hidden with sr-only)" and "USE scope="col" on <th> elements in tables".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/app/features/settings/governance/components/budget-editor.tsx` around lines 469 - 495, The table in the BudgetEditor component is missing a <caption> and column semantics on headers; update the table markup in budget-editor.tsx (the table inside the overflow-x-auto div) to add a caption (use an sr-only class if it should be visually hidden) and add scope="col" attributes to each <th> (e.g., the Scope, Target, Period, Max Tokens, Max Cost, Max Requests, Actions headers) so screen readers get correct column semantics.
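
For reference, the target markup can be sketched without any framework. The helper below is hypothetical (`renderRulesTable` and the caption text "Budget rules" are not in the PR); it only shows where the caption and the `scope="col"` attributes go. In the actual TSX the same structure would use `className="sr-only"` on the caption.

```typescript
// Column labels taken from the table header in the diff above.
const COLUMNS = [
  "Scope",
  "Target",
  "Period",
  "Max Tokens",
  "Max Cost",
  "Max Requests",
  "Actions",
] as const;

// Builds an empty rules table whose caption is visually hidden and whose
// header cells each declare column scope for assistive technology.
// `caption` is assumed to be a trusted literal; no HTML escaping is done here.
function renderRulesTable(caption: string): string {
  const headers = COLUMNS.map(
    (label) => `      <th scope="col">${label}</th>`,
  ).join("\n");
  return [
    "<table>",
    `  <caption class="sr-only">${caption}</caption>`,
    "  <thead>",
    "    <tr>",
    headers,
    "    </tr>",
    "  </thead>",
    "  <tbody></tbody>",
    "</table>",
  ].join("\n");
}
```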

coderabbitai · 2026-04-10T11:15:32Z

+    // Track usage for client tool mode (this path bypasses onAgentComplete)
+    const usage = await result.usage;
+    if (usage && args.organizationId) {
+      const { estimateCostCents } =
+        await import('../governance/cost_estimation');
+      const inputTokens = usage.inputTokens ?? 0;
+      const outputTokens = usage.outputTokens ?? 0;
+      if (inputTokens > 0 || outputTokens > 0) {
+        await ctx
+          .runMutation(
+            internal.governance.internal_mutations.incrementUsageLedger,
+            {
+              organizationId: args.organizationId,
+              userId: args.userId ?? 'system',
+              inputTokens,
+              outputTokens,
+              costEstimateCents: estimateCostCents(
+                modelId,
+                inputTokens,
+                outputTokens,
+              ),
+              timestamp: Date.now(),


⚠️ Potential issue | 🟠 Major

Use the resolved model and provider pricing for this ledger entry.

This path estimates cost from modelId, but resolveLanguageModelWithFallback() can pick a different model/provider and already exposes provider-specific pricing on resolved.modelData. That makes client-tool chats drift from the main cost ledger whenever default resolution, failover, or provider overrides are involved.

💡 Suggested fix

     // Track usage for client tool mode (this path bypasses onAgentComplete)
     const usage = await result.usage;
     if (usage && args.organizationId) {
       const { estimateCostCents } =
         await import('../governance/cost_estimation');
+      const providerCost =
+        resolved.modelData.inputCentsPerMillion != null
+          ? {
+              inputCentsPerMillion:
+                resolved.modelData.inputCentsPerMillion,
+              outputCentsPerMillion:
+                resolved.modelData.outputCentsPerMillion ?? 0,
+            }
+          : undefined;
       const inputTokens = usage.inputTokens ?? 0;
       const outputTokens = usage.outputTokens ?? 0;
       if (inputTokens > 0 || outputTokens > 0) {
         await ctx
           .runMutation(
@@
               organizationId: args.organizationId,
               userId: args.userId ?? 'system',
               inputTokens,
               outputTokens,
               costEstimateCents: estimateCostCents(
-                modelId,
+                resolved.modelData.modelId,
                 inputTokens,
                 outputTokens,
+                providerCost,
               ),
               timestamp: Date.now(),
             },
           )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/openai_compat/internal_actions.ts` around lines 384 - 405, The ledger entry uses the original modelId to estimate cost, but resolveLanguageModelWithFallback() may return a different resolved.modelData with provider-specific pricing; change the usage-cost calculation to use the resolved model info (resolved.modelData or resolved.modelId/pricing) when calling estimateCostCents so the costEstimateCents reflects the actual provider/model chosen, and ensure the mutation call to internal.governance.internal_mutations.incrementUsageLedger uses those resolved values (reference resolveLanguageModelWithFallback(), resolved.modelData, estimateCostCents, incrementUsageLedger, and result.usage to locate the code).
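
To make the pricing precedence concrete, here is a small sketch of an estimator that prefers provider-supplied pricing and falls back to a static map. It is an assumption about the shape of the logic, not the contents of `cost_estimation.ts`: `estimateCostCentsSketch`, `FALLBACK_COSTS`, and the model names and prices are all made up for illustration.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

// Hypothetical fallback prices, for illustration only (not real provider rates).
const FALLBACK_COSTS: Record<string, ModelCost> = {
  "example-large": { inputCentsPerMillion: 250, outputCentsPerMillion: 1000 },
  "example-large-mini": { inputCentsPerMillion: 15, outputCentsPerMillion: 60 },
};

// Prefers pricing resolved from provider config; otherwise looks the model up
// by exact key, so "example-large-mini" can never match "example-large".
function estimateCostCentsSketch(
  modelId: string,
  inputTokens: number,
  outputTokens: number,
  providerCost?: ModelCost,
): number {
  const cost = providerCost ?? FALLBACK_COSTS[modelId];
  if (!cost) return 0; // unknown model: this sketch records no cost
  return (
    (inputTokens * cost.inputCentsPerMillion +
      outputTokens * cost.outputCentsPerMillion) /
    1_000_000
  );
}
```

With these made-up numbers, 1M input and 500K output tokens on `example-large` come to 750 cents from the fallback map, but 200 cents if the resolved provider advertises 100/200 cents per million. That gap is the kind of ledger drift the comment describes.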

coderabbitai · 2026-04-10T11:15:32Z

+  cost: z
+    .object({
+      inputCentsPerMillion: z.number(),
+      outputCentsPerMillion: z.number(),
+    })
+    .optional(),


⚠️ Potential issue | 🟠 Major

Reject negative model pricing at the schema boundary.

The inputCentsPerMillion and outputCentsPerMillion fields accept negative numbers, which propagate through cost estimation calculations and would effectively credit usage against budgets instead of charging. These fields should be constrained to non-negative values.

Proposed fix

   cost: z
     .object({
-      inputCentsPerMillion: z.number(),
-      outputCentsPerMillion: z.number(),
+      inputCentsPerMillion: z.number().nonnegative(),
+      outputCentsPerMillion: z.number().nonnegative(),
     })
     .optional(),

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/platform/lib/shared/schemas/providers.ts` around lines 14 - 19, The cost schema currently allows negative values for inputCentsPerMillion and outputCentsPerMillion; update the cost object schema (the cost definition) so both inputCentsPerMillion and outputCentsPerMillion are constrained to non-negative numbers (e.g., use z.number().nonnegative() or z.number().min(0)) to reject negative pricing at the schema boundary before it reaches cost estimation.
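
The arithmetic behind this finding is easy to check by hand. The sketch below is dependency-free and hypothetical (`costCents` and `assertNonNegativeCost` are illustrative names, and the cost formula is assumed to be a plain per-million rate); it shows a negative price turning usage into a credit, plus a guard equivalent in spirit to the proposed `.nonnegative()` constraint.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

// Assumed formula: tokens times the per-million rate, summed over both directions.
function costCents(
  cost: ModelCost,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens * cost.inputCentsPerMillion +
      outputTokens * cost.outputCentsPerMillion) /
    1_000_000
  );
}

// Rejects negative (and NaN) prices before they reach the ledger.
function assertNonNegativeCost(cost: ModelCost): ModelCost {
  const fields: Array<keyof ModelCost> = [
    "inputCentsPerMillion",
    "outputCentsPerMillion",
  ];
  for (const field of fields) {
    if (!(cost[field] >= 0)) {
      throw new RangeError(`${field} must be a non-negative number`);
    }
  }
  return cost;
}
```

For example, a misconfigured input price of -100 cents per million applied to 2M input tokens yields -200 cents, which would reduce the recorded spend instead of increasing it.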

coderabbitai · 2026-04-10T11:15:32Z

+class UsageInfo(BaseModel):
+    """Token usage metadata for AI API calls."""
+
+    input_tokens: int = Field(default=0, description="Input/prompt tokens consumed")
+    output_tokens: int = Field(default=0, description="Output/completion tokens consumed")
+    total_tokens: int = Field(default=0, description="Total tokens consumed")
+    model: str | None = Field(default=None, description="Model used for this operation")
+


⚠️ Potential issue | 🟠 Major

Split generation usage by operation/model instead of flattening it.

UsageInfo can only describe one model, but GenerateResponse now carries combined embedding + LLM tokens. services/rag/app/services/rag_service.py sums embedding tokens into input_tokens and then stamps the whole payload with the chat model, so any downstream pricing keyed by model will bill the embedding portion at the wrong rate. Return separate usage blocks per operation/model here instead of a single flattened record.

Also applies to: 261-261
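
One possible shape for per-operation usage, sketched in TypeScript for consistency with the platform side (the RAG service itself is Python). Everything here is hypothetical: the `UsageEntry` type, the `totalCostCents` helper, and the model names and prices are illustrative only.

```typescript
interface UsageEntry {
  operation: "embedding" | "generation";
  model: string;
  inputTokens: number;
  outputTokens: number;
}

interface Price {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

// Prices each entry at its own model's rate instead of stamping one model
// on the combined token count.
function totalCostCents(
  entries: UsageEntry[],
  prices: Record<string, Price>,
): number {
  let total = 0;
  for (const entry of entries) {
    const price = prices[entry.model];
    if (!price) continue; // unknown model: priced at zero in this sketch
    total +=
      (entry.inputTokens * price.inputCentsPerMillion +
        entry.outputTokens * price.outputCentsPerMillion) /
      1_000_000;
  }
  return total;
}

// Made-up rates: a cheap embedding model and a more expensive chat model.
const EXAMPLE_PRICES: Record<string, Price> = {
  "example-embedding": { inputCentsPerMillion: 2, outputCentsPerMillion: 0 },
  "example-chat": { inputCentsPerMillion: 300, outputCentsPerMillion: 1200 },
};

const EXAMPLE_USAGE: UsageEntry[] = [
  { operation: "embedding", model: "example-embedding", inputTokens: 1_000_000, outputTokens: 0 },
  { operation: "generation", model: "example-chat", inputTokens: 1_000_000, outputTokens: 500_000 },
];
```

With these numbers the split record costs 2 + 900 = 902 cents. Flattening the same tokens into a single `example-chat` record (2M input, 500K output) would cost 600 + 600 = 1200 cents, so the embedding portion is overbilled.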

coderabbitai · 2026-04-10T11:15:32Z

+        """Search the knowledge base using hybrid BM25 + vector search.
+
+        Embedding token usage available via `self.last_search_usage` after call.
+        """


⚠️ Potential issue | 🔴 Critical

Don’t cache request usage on the shared rag_service singleton.

last_search_usage is request-scoped, but rag_service is module-level shared state. Another request can overwrite this value between the await here and the caller reading or combining it, which can return the wrong usage in /search and misbill /generate. Thread the usage back in the return value instead of storing it on self.

Also applies to: 301-302

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/rag/app/services/rag_service.py` around lines 281 - 284, The service currently stores per-request embedding usage on the singleton via self.last_search_usage (and similar at lines ~301-302); change the search (and affected generate) methods to stop writing to self.last_search_usage and instead return the usage as part of the method return value (e.g., include a usage field in the search result tuple/dict) so callers receive request-scoped usage directly; update callers of search (and the generate-related method) to read the returned usage rather than reading self.last_search_usage.
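
The same idea in a language-neutral form, sketched in TypeScript: the shared service keeps no per-request field, and each call returns its own usage alongside its results. `SearchService`, `SearchResult`, and the word-count "tokenizer" are stand-ins invented for this example, not the RAG service's API.

```typescript
interface EmbeddingUsage {
  model: string;
  inputTokens: number;
}

interface SearchResult {
  hits: string[];
  usage: EmbeddingUsage;
}

class SearchService {
  // Deliberately no `lastSearchUsage` field: nothing request-scoped is
  // stored on the shared instance.
  async search(query: string): Promise<SearchResult> {
    const usage = await this.embed(query);
    return { hits: [`result for ${query}`], usage };
  }

  // Stand-in for a real embedding call: counts whitespace-separated words.
  private async embed(query: string): Promise<EmbeddingUsage> {
    await Promise.resolve(); // yield, so concurrent calls can interleave
    const words = query.split(/\s+/).filter((w) => w.length > 0);
    return { model: "example-embedding", inputTokens: words.length };
  }
}
```

Because usage travels with the return value, two overlapping requests on one shared instance cannot read each other's counts.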

coderabbitai · 2026-04-10T11:15:32Z

+        self.last_search_usage = EmbeddingUsage(model=self._embedding._model)
        try:
-            embedding_task = asyncio.create_task(self._embedding.embed_query(query))
+            embedding_task = asyncio.create_task(self._embedding.embed_query_with_usage(query))
            fts_task = asyncio.create_task(self._fts_search(query, file_ids, top_k * 3))

-            query_embedding, fts_results = await asyncio.gather(embedding_task, fts_task)
+            query_result, fts_results = await asyncio.gather(embedding_task, fts_task)
+            query_embedding = query_result.embedding
+            self.last_search_usage = query_result.usage


⚠️ Potential issue | 🟠 Major

Use the usage-aware embedding path in the BM25 fallback.

If asyncio.gather(...) fails because _fts_search() raises, the handler later recomputes the embedding via embed_query() at Line 153. last_search_usage then stays at the zero-token placeholder initialized here, so vector-only fallback searches undercount spend even though an embedding request was made.

Suggested fix

-        self.last_search_usage = EmbeddingUsage(model=self._embedding._model)
+        self.last_search_usage = None
         try:
             embedding_task = asyncio.create_task(self._embedding.embed_query_with_usage(query))
             fts_task = asyncio.create_task(self._fts_search(query, file_ids, top_k * 3))
@@
-            if query_embedding is None:
-                query_embedding = await self._embedding.embed_query(query)
+            if query_embedding is None:
+                query_result = await self._embedding.embed_query_with_usage(query)
+                query_embedding = query_result.embedding
+                self.last_search_usage = query_result.usage

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@services/rag/app/services/search_service.py` around lines 61 - 68, The code currently races _fts_search and embed_query_with_usage with asyncio.gather so if _fts_search raises the embedding task's usage result can be lost and last_search_usage remains the zero-token placeholder; change the flow to always await or retrieve the embedding result (from _embedding.embed_query_with_usage) independently of _fts_search failures and assign self.last_search_usage = query_result.usage before handling BM25/vector-only fallback (the code path that calls embed_query()); specifically ensure the embedding_task started in this block is awaited (or its result retrieved) even if the fts_task raises, so last_search_usage reflects the actual usage for later fallback logic that relies on self.last_search_usage.
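
A sketch of the "embedding result survives a full-text failure" pattern, written in TypeScript for consistency with the other examples (in Python, `asyncio.gather(..., return_exceptions=True)` offers a comparable way to keep one task's result when the other raises). `hybridSearch` and its callback parameters are hypothetical.

```typescript
interface EmbeddingResult {
  embedding: number[];
  usage: { inputTokens: number };
}

interface HybridResult {
  embedding: number[];
  ftsHits: string[];
  usage: { inputTokens: number };
}

// Runs both lookups concurrently. A full-text failure degrades to an empty
// hit list, so the embedding is computed once and its usage is never lost.
async function hybridSearch(
  embed: () => Promise<EmbeddingResult>,
  fullText: () => Promise<string[]>,
): Promise<HybridResult> {
  const [embedded, ftsHits] = await Promise.all([
    embed(),
    fullText().catch((): string[] => []), // fall back to vector-only search
  ]);
  return {
    embedding: embedded.embedding,
    ftsHits,
    usage: embedded.usage,
  };
}
```

This avoids both problems named above: there is no second, untracked embedding request in the fallback path, and the reported usage reflects the call that was actually made.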

The embedding service now reads response.usage for token tracking. Add the missing usage field to the test mock SimpleNamespace.

larryro added 18 commits April 10, 2026 17:40

fix(platform): replace raw ID inputs with searchable user/team pickers in budget editor

1bdfdd6

Use SearchableSelect with useMembers/useOrgTeams hooks so admins can search by name/email instead of manually entering IDs.

refactor(platform): budget editor from inline cards to table + dialog

5a6f584

Rules displayed in a clean table with scope, target, period, limits. Add/edit via FormDialog with searchable user/team pickers and helpful placeholder hints. More consistent with other settings pages.

fix(platform): auto-save on all budget changes, remove Save button, full-width table

033216c

- Toggle, add/edit rule, and delete all auto-save immediately
- Remove page-level Save button (no longer needed)
- Remove max-w-4xl constraint so table fills available width

fix(platform): replace totals footer with summary stat cards above usage table

d40dc62

Show Total Tokens, Total Cost, Total Requests, Active Users as compact stat cards between the heading and the per-user table.

fix(platform): show full precision for small cost values in usage dashboard

e30d3e4

Use toPrecision(3) for costs < $1 to preserve significant digits, e.g. $0.0893 instead of $0.09, $0.000682 instead of $0.00. Costs >= $1 still show 2 decimal places.

fix(platform): add costEstimateCents to messageMetadata return validator

f70a87c

The query validator was missing the new field, causing ReturnsValidationError when reading metadata for messages that have cost data.

feat(platform): show estimated cost in message info dialog

9cf48b7

Display cost with full precision (toPrecision(3) for small values) in the message information popup, between Performance and Tool Calls.

fix(platform): move cost display into Token usage stat grid instead of separate field

e6ee25e

fix(platform): map costEstimateCents field in useMessageMetadata hook

505117f

The field was returned by the query but not mapped in the hook's explicit field selection, so it was always undefined in the UI.

greptile-apps Bot reviewed Apr 10, 2026

larryro added 4 commits April 10, 2026 19:06

fix(platform): replace unsafe type assertions with type guards in budget editor

46d34ae

Use type guard functions (isScopeValue, isPeriodValue) instead of unsafe `as` casts in Select onValueChange handlers to satisfy no-unsafe-type-assertion lint rule.

fix(rag): update search test mock to use embed_query_with_usage

4be04f2

The search service now calls embed_query_with_usage instead of embed_query. Update the test helper and assertion to match.

coderabbitai Bot requested changes Apr 10, 2026

fix(crawler): add usage field to embedding response mock

61dfd96

The embedding service now reads response.usage for token tracking. Add the missing usage field to the test mock SimpleNamespace.

larryro merged commit 0a7b077 into main Apr 10, 2026
9 checks passed

larryro deleted the feat/budget-limits-per-user-group branch April 10, 2026 11:19

larryro linked an issue Apr 10, 2026 that may be closed by this pull request

Unable to configure budgets and limits per user/group #1166

Closed

coderabbitai Bot mentioned this pull request Apr 19, 2026

feat(platform): time-based usage analytics in governance #1575

Merged

8 tasks

coderabbitai Bot mentioned this pull request May 4, 2026

feat(platform): governance feedback analytics page #1662

Merged

13 tasks
