feat(platform): budget limits per user/group#1311
…1166) Wire the existing (but previously unused) budget enforcement system into the platform. This enables admins to configure token budgets, cost limits, and request caps per user, team, role, or as org-wide defaults.
Key changes:
- Security: add admin role checks to governance mutations and usage queries
- Fix getModelCost substring collision (gpt-4o-mini matched gpt-4o)
- Remove dead estimatedCostEur/estimatedCostUsd schema fields
- Make buildPeriodKey deterministic (accept timestamp, no Date() in mutations)
- Zero out usage for cached LLM responses to prevent double-counting
- Embedding service: add embed_texts_with_usage() capturing response.usage
- RAG service: return embedding + LLM token usage in search/generate responses
- Crawler service: pass UsageAccumulator to all Vision API calls
- Add cost field to provider JSON schema, resolve at model resolution time
- Wire incrementUsageLedger into onAgentComplete (per-team ledger rows)
- Rewrite checkBudget: independent rule checking, team shared caps, structured errors
- Add budget checks to startAgentChat, approval/human_input/location resume paths
- Add usage tracking to OpenAI-compat client tool mode
- Governance settings page with tabs: System Prompt, Budgets, Usage
- Budget editor UI with scope/period/limits configuration
- Usage dashboard with per-user token/cost/request breakdown
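The substring collision mentioned above (gpt-4o-mini matching gpt-4o) can be avoided by preferring an exact match and then the longest matching prefix. The following is only a sketch of that idea, not the PR's actual getModelCost: the function name `lookupModelCost`, the table contents, and the prices are hypothetical placeholders, and only the 200/800 default comes from the commit messages.

```typescript
// Hypothetical cost table in cents per 1M tokens; the values are placeholders.
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

const COST_TABLE: Record<string, ModelCost> = {
  'gpt-4o': { inputCentsPerMillion: 250, outputCentsPerMillion: 1000 },
  'gpt-4o-mini': { inputCentsPerMillion: 15, outputCentsPerMillion: 60 },
};

// Fallback when nothing matches (the PR mentions a mid-range 200/800 default).
const DEFAULT_COST: ModelCost = {
  inputCentsPerMillion: 200,
  outputCentsPerMillion: 800,
};

function lookupModelCost(modelId: string): ModelCost {
  // Drop an optional "provider/" prefix such as "openai/".
  const name = modelId.toLowerCase().split('/').pop() ?? '';

  // 1. An exact match always wins.
  const exact = COST_TABLE[name];
  if (exact !== undefined) return exact;

  // 2. Otherwise take the longest key that prefixes the name, so a dated
  //    variant of "gpt-4o-mini" never falls through to "gpt-4o".
  const keys = Object.keys(COST_TABLE).sort((a, b) => b.length - a.length);
  const match = keys.find((key) => name.startsWith(key));
  const hit = match !== undefined ? COST_TABLE[match] : undefined;
  return hit ?? DEFAULT_COST;
}
```

Sorting keys by length before the prefix test is what keeps the more specific model name from being shadowed by the shorter one.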
…s in budget editor Use SearchableSelect with useMembers/useOrgTeams hooks so admins can search by name/email instead of manually entering IDs.
…fields
- Max Tokens: placeholder "e.g. 1000000", hint "1M tokens ≈ 750K words"
- Max Cost: changed from cents to USD input with auto-conversion, hint "GPT-4o ≈ $10/1M tokens"
- Max Requests: placeholder "e.g. 500", hint about monthly quota
Rules displayed in a clean table with scope, target, period, limits. Add/edit via FormDialog with searchable user/team pickers and helpful placeholder hints. More consistent with other settings pages.
…ull-width table
- Toggle, add/edit rule, and delete all auto-save immediately
- Remove page-level Save button (no longer needed)
- Remove max-w-4xl constraint so table fills available width
…age table Show Total Tokens, Total Cost, Total Requests, Active Users as compact stat cards between the heading and the per-user table.
…rovider cost to usage tracking
- Add Cost column to models table in AI Providers settings (input/output USD per 1M tokens)
- Add cost input fields to model add/edit dialog with helpful placeholders
- Thread providerCost from resolved model data through generateAgentResponse to onAgentComplete
- estimateCostCents now uses provider cost when available, falls back to hardcoded map
- Revert DEFAULT_COST to mid-range (200/800) since provider config is the intended source of truth
…o cost tracking
- Add cost data to all 18 models in openrouter.json (from OpenRouter pricing)
- Add Cost column and input fields to AI Providers models table/dialog
- Thread providerCost from model resolution through to onAgentComplete
- estimateCostCents uses provider cost when available, hardcoded map as fallback
- Revert DEFAULT_COST to mid-range since provider config is the source of truth
Pricing sources: openrouter.ai model pages (April 2026)
…ssage cost tracking
- Add costEstimateCents field to messageMetadata schema
- Compute cost once in onAgentComplete, use for both metadata save and ledger increment
- Enables per-message cost display in message info dialog
…hboard Use toPrecision(3) for costs < $1 to preserve significant digits. e.g. $0.0893 instead of $0.09, $0.000682 instead of $0.00. Costs >= $1 still show 2 decimal places.
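The display rule described here can be sketched as a small formatter. This is an illustration under assumptions, not the dashboard's code: the name `formatCostUsd` and the explicit zero case are hypothetical.

```typescript
// Hypothetical helper: format a non-negative USD amount for display.
function formatCostUsd(usd: number): string {
  // Assumed special case so that zero does not render as "$0.00" with extra digits.
  if (usd === 0) return '$0.00';
  // Costs of $1 or more keep the usual two decimal places.
  if (usd >= 1) return `$${usd.toFixed(2)}`;
  // Below $1, keep three significant digits. Note that toPrecision switches
  // to exponent notation for values below 1e-6.
  return `$${usd.toPrecision(3)}`;
}

console.log(formatCostUsd(0.0893)); // $0.0893
console.log(formatCostUsd(0.000682)); // $0.000682
console.log(formatCostUsd(2.5)); // $2.50
```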
The query validator was missing the new field, causing ReturnsValidationError when reading metadata for messages that have cost data.
Display cost with full precision (toPrecision(3) for small values) in the message information popup, between Performance and Tool Calls.
The field was returned by the query but not mapped in the hook's explicit field selection, so it was always undefined in the UI.
… places in cents Round to 4 decimal places instead of 2 to avoid losing significant digits on small costs. e.g. 0.40275 cents now stored as 0.4028 instead of 0.40, matching OpenRouter's displayed cost more closely.
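To make the cost arithmetic concrete, here is a simplified sketch of a per-message estimate that prefers provider pricing and rounds to 4 decimal places in cents. It is not the PR's estimateCostCents: the name `estimateCostCentsSketch` and the type names are hypothetical, and the real function is described as falling back to a hardcoded per-model map before any default, which this sketch collapses into a single 200/800 fallback.

```typescript
interface ProviderCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Assumed fallback; the commits mention a mid-range 200/800 default.
const FALLBACK_COST: ProviderCost = {
  inputCentsPerMillion: 200,
  outputCentsPerMillion: 800,
};

// Round to 4 decimal places so sub-cent costs keep their significant digits.
function roundTo4(value: number): number {
  return Math.round(value * 10000) / 10000;
}

function estimateCostCentsSketch(
  usage: TokenUsage,
  providerCost?: ProviderCost,
): number {
  // Provider-configured pricing wins when it is available.
  const cost = providerCost ?? FALLBACK_COST;
  const cents =
    (usage.inputTokens * cost.inputCentsPerMillion +
      usage.outputTokens * cost.outputCentsPerMillion) /
    1000000;
  return roundTo4(cents);
}

// 1000 input tokens at 250 cents/1M plus 500 output tokens at 1000 cents/1M.
console.log(
  estimateCostCentsSketch(
    { inputTokens: 1000, outputTokens: 500 },
    { inputCentsPerMillion: 250, outputCentsPerMillion: 1000 },
  ),
); // 0.75
```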
RAG search and web tools now return { success, response, usage, model }
objects instead of plain strings. This allows extractToolCallsFromSteps
to capture embedding/vision token usage in toolsUsage, which then gets
recorded in the usage ledger via onAgentComplete.
- rag_search_tool: includes embedding token count and model from RAG service
- web_tool: includes vision/LLM token usage from crawler service
- WebFetchUrlResult/WebFetchExtractApiResponse: add usage field
- fetch_and_extract: pass through usage from crawler response
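The shape change described above might be consumed roughly as follows. This sketch assumes the `{ success, response, usage, model }` shape named in the commit message. The names `StructuredToolResult`, `collectToolUsage`, and the field layout of `usage` are hypothetical, and this is not the actual extractToolCallsFromSteps.

```typescript
interface ToolUsage {
  inputTokens?: number;
  outputTokens?: number;
  totalTokens?: number;
}

interface StructuredToolResult {
  success: boolean;
  response: string;
  usage?: ToolUsage;
  model?: string;
}

interface ToolUsageItem {
  toolName: string;
  model: string | undefined;
  totalTokens: number;
}

// Distinguish the structured shape from legacy plain-string tool outputs.
function isStructuredToolResult(value: unknown): value is StructuredToolResult {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.success === 'boolean' &&
    typeof candidate.response === 'string'
  );
}

// Collect usage from tool outputs; outputs without usage are skipped.
function collectToolUsage(
  calls: Array<{ toolName: string; output: unknown }>,
): ToolUsageItem[] {
  const items: ToolUsageItem[] = [];
  for (const call of calls) {
    const output = call.output;
    if (!isStructuredToolResult(output)) continue;
    const usage = output.usage;
    if (usage === undefined) continue;
    const total =
      usage.totalTokens ?? ((usage.inputTokens ?? 0) + (usage.outputTokens ?? 0));
    items.push({ toolName: call.toolName, model: output.model, totalTokens: total });
  }
  return items;
}
```

Plain strings carry no usage, which is why the tools had to switch to objects before their embedding or vision tokens could reach the ledger.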
…rror toast When budget is exceeded, save the limit message as an assistant reply in the thread instead of throwing an error that causes "Failed to send message". The user message is preserved and they see a clear explanation of the budget limit in the conversation.
…xceeded The thread was set to 'generating' before the budget check, causing a permanent "Thinking..." state when the budget message was saved. Now properly resets to 'idle' so the UI shows the budget message and allows further interaction.
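The two behaviours described in these commits (save the limit message as an assistant reply, and leave the thread idle) can be modelled with a small in-memory sketch. Everything here is hypothetical and synchronous for illustration: the real code runs inside Convex mutations, and `ThreadState`, `BudgetResult`, and `sendMessage` are stand-in names.

```typescript
type GenerationStatus = 'idle' | 'generating';

interface ThreadState {
  status: GenerationStatus;
  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
}

interface BudgetResult {
  allowed: boolean;
  reason?: string;
}

function sendMessage(
  thread: ThreadState,
  userText: string,
  budget: BudgetResult,
): 'scheduled' | 'budget_exceeded' {
  // The user message is always preserved.
  thread.messages.push({ role: 'user', content: userText });

  if (!budget.allowed) {
    // Reply in the conversation instead of throwing, and make sure the
    // thread is idle so the UI does not stay on "Thinking...".
    thread.messages.push({
      role: 'assistant',
      content: budget.reason ?? 'Budget limit exceeded',
    });
    thread.status = 'idle';
    return 'budget_exceeded';
  }

  // Only mark the thread as generating once the budget check has passed.
  thread.status = 'generating';
  return 'scheduled';
}
```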
…toolsUsage
- Add costEstimateCents to toolUsageItemValidator schema
- Compute cost in extractToolCallsFromSteps using estimateCostCents
- Add costEstimateCents to frontend ToolUsage interface
- Display cost in ToolCallCard in message info dialog
- Now rag_search, web, and delegation tools show their individual cost
…get editor Use type guard functions (isScopeValue, isPeriodValue) instead of unsafe `as` casts in Select onValueChange handlers to satisfy no-unsafe-type-assertion lint rule.
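A type guard of the kind named here can narrow a Select's string value without an `as` cast on the value itself. The sketch below is an assumption about how such guards might look: the scope list is inferred from the PR description (default, user, team, role), and `handleScopeChange` is a hypothetical handler.

```typescript
// Assumed scope values, based on "per user, team, role, or as org-wide defaults".
const SCOPE_VALUES = ['default', 'user', 'team', 'role'] as const;
type Scope = (typeof SCOPE_VALUES)[number];

// Narrow an arbitrary string to Scope by comparing against the known values.
function isScopeValue(value: string): value is Scope {
  return SCOPE_VALUES.some((candidate) => candidate === value);
}

// Hypothetical onValueChange handler: unknown values are ignored rather than cast.
function handleScopeChange(
  value: string,
  setScope: (scope: Scope) => void,
): void {
  if (isScopeValue(value)) {
    setScope(value);
  }
}
```

Using `some` with an equality check sidesteps the typing problem of calling `includes` on a readonly tuple with a plain `string` argument.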
… definition The schema.ts had inline tool usage object definitions separate from the validators.ts toolUsageItemValidator. The new costEstimateCents field was added to the validator but not the schema, causing Convex to reject documents with the extra field at write time.
The search service now calls embed_query_with_usage instead of embed_query. Update the test helper and assertion to match.
📝 Walkthrough
This pull request introduces comprehensive token usage and cost tracking infrastructure across multiple services. It adds usage dataclasses and result types to the embedding service, extends model definitions with cost metadata (input/output cents per million tokens), implements budget enforcement checks before agent generation, tracks aggregated usage in a ledger indexed by organization/user/period, updates governance UIs to display costs and usage dashboards, and propagates usage information through generation, extraction, and search pipelines. The changes touch embedding services, crawler extraction flows, RAG search/generation, agent completion handlers, governance enforcement, and chat platform UI components.
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~70 minutes
Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 21
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (8)
services/crawler/app/routers/web.py (1)
123-132: ⚠️ Potential issue | 🟠 Major
Preserve parser `usage` when a file parse fails.
The success path forwards `result.get("usage")`, but the failure return drops it. If `parse_file_with_vision()` spends OCR/LLM tokens and then returns `success=False`, the budget ledger will record zero cost for a non-zero spend.
Suggested fix
     if not result.get("success"):
         return WebFetchExtractResponse(
             success=False,
             url=url_str,
             content="",
             content_type=actual_ct or content_type or "",
             word_count=0,
             page_count=0,
             error=result.get("error", "Failed to extract content from file"),
+            usage=result.get("usage"),
         )
Also applies to: 143-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/crawler/app/routers/web.py` around lines 123 - 132, The failure branches that return a WebFetchExtractResponse omit the parser usage, causing billing/ledger underreporting; update the negative-return paths (the one inside the block that checks if not result.get("success") and the analogous failure block later) to include usage=result.get("usage", {}) so the response preserves parse_file_with_vision() usage info; locate the WebFetchExtractResponse constructors in this router and add the usage field populated from result.get("usage", {}) to each failure return.
services/platform/convex/agent_tools/rag/rag_search_tool.ts (1)
151-156: ⚠️ Potential issue | 🟠 Major
Normalize `list_indexed` to the structured tool response shape.
This branch still returns `AgentIndexedDocumentListResult` directly, while the `search` branch now returns `{ success, response, output, usage? }`. Any caller consuming the new structured shape will handle `list_indexed` differently or drop it entirely.
Suggested fix
     if (args.operation === 'list_indexed') {
-      return listIndexedDocuments(ctx, {
+      const result = await listIndexedDocuments(ctx, {
         limit: args.limit,
         cursor: args.cursor,
       });
+      return {
+        success: true,
+        response: JSON.stringify(result, null, 2),
+        output: result,
+      };
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/rag/rag_search_tool.ts` around lines 151 - 156, The 'list_indexed' branch in the execute function returns an AgentIndexedDocumentListResult directly rather than the standardized tool response shape; update the execute handler so when args.operation === 'list_indexed' it calls listIndexedDocuments(ctx, {...}) and then returns an object matching the search branch shape (e.g. { success: true, response: <the list result>, output: <optional human-readable summary>, usage: <optional usage info> }) instead of the raw AgentIndexedDocumentListResult so callers receive the consistent { success, response, output, usage? } structure.
services/platform/convex/agent_tools/web/web_tool.ts (1)
77-84: ⚠️ Potential issue | 🟠 Major
Forward the extraction instruction for webpage and inline-URL fetches.
This condition only passes `instruction` for explicit file URLs. Fetches like `query: "Extract pricing from https://example.com/pricing"` or explicit HTML URL fetches will ignore the user's prompt and fall back to raw extraction, even though the crawler supports `instruction` for those paths too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/web/web_tool.ts` around lines 77 - 84, The code only sets instruction for file URLs; update the instruction calculation so user-provided extraction prompts (args.query) are forwarded for webpage and inline-URL fetches as well: compute instruction = args.query when present (not just when isFileUrl(targetUrl)) and pass that instruction into the fetchAndExtract call (function fetchAndExtract in this file), preserving any existing file-URL logic that may still apply.
services/platform/convex/agent_tools/approval_shared.ts (1)
59-90: ⚠️ Potential issue | 🟠 Major
Handle budget denial as a normal return, not a thrown mutation error.
This helper is running inside a mutation path. Throwing on `!budgetResult.allowed` rolls back the completion message and bubbles an error to the caller, which regresses the “assistant message, not error” budget UX. Check the budget before `createStream()`/`generationStatus`, then save the budget notice and return.
💡 Suggested fix
-  const streamId = await persistentStreaming.createStream(ctx);
-
-  if (threadMeta) {
-    await ctx.db.patch(threadMeta._id, {
-      generationStatus: 'generating' as const,
-      streamId,
-      updatedAt: Date.now(),
-    });
-  }
-
   // Budget enforcement — block before scheduling if limits are exceeded
   const userId = thread?.userId;
   if (userId) {
     const budgetResult = await checkBudget(
       ctx,
@@
       userId,
       agentConfig.agentTeamId ? [agentConfig.agentTeamId] : [],
     );
     if (!budgetResult.allowed) {
-      throw new Error(budgetResult.reason ?? 'Budget limit exceeded');
+      await saveMessage(ctx, components.agent, {
+        threadId,
+        message: {
+          role: 'assistant',
+          content: budgetResult.reason ?? 'Budget limit exceeded',
+        },
+      });
+      return;
     }
   }
+
+  const streamId = await persistentStreaming.createStream(ctx);
+
+  if (threadMeta) {
+    await ctx.db.patch(threadMeta._id, {
+      generationStatus: 'generating' as const,
+      streamId,
+      updatedAt: Date.now(),
+    });
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/approval_shared.ts` around lines 59 - 90, The budget check currently throws an error after creating the stream and setting generationStatus, causing a rollback; move the call to checkBudget (the check of checkBudget(...) result for userId) to before persistentStreaming.createStream() and before patching threadMeta.generationStatus, and when budgetResult.allowed is false do NOT throw; use saveMessage(...) to save a user-visible budget denial/notice message (similar to how promptMessageId is created) and return early so no stream is created and no generationStatus is set; reference persistentStreaming.createStream, saveMessage, checkBudget, threadMeta and generationStatus when making the change.
services/platform/convex/lib/agent_response/generate_response.ts (1)
1807-1834: ⚠️ Potential issue | 🟠 Major
Propagate `providerCost` on the failure metadata path too.
Success and cancel completions now use resolved provider pricing, but the error path still omits it. Any failed generation that reports token usage will fall back to the hardcoded model map here, so `costEstimateCents` and ledger totals diverge from the success path.
💡 Suggested fix
     await onAgentComplete(ctx, {
       threadId,
       agentType,
       result: {
         threadId,
@@
       organizationId,
       userId,
       teamIds,
       agentSlug,
+      providerCost,
     });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/agent_response/generate_response.ts` around lines 1807 - 1834, The error completion path in generate_response.ts calls onAgentComplete with a result object that omits providerCost, causing divergence from success/cancel paths; update the result passed to onAgentComplete (the object constructed in the await onAgentComplete(...) call) to include providerCost (and/or costEstimateCents if you use that symbol) populated the same way as the success/cancel completion paths so failed generations that have token usage use the resolved provider pricing; locate the onAgentComplete invocation in generate_response.ts and add providerCost: providerCost (or the exact variable name used where resolved pricing is computed) to the result object along with any existing usage/duration fields.
services/platform/convex/agent_tools/human_input/mutations.ts (1)
169-205: ⚠️ Potential issue | 🟠 Major
Don’t throw on budget denial after marking the approval complete.
This runs inside the same mutation after the approval has already been patched to `completed`. Throwing here aborts the whole transaction, so the human response is lost and the caller gets an error instead of the intended assistant budget-exceeded message. The budget gate needs to run before `createStream()`/`generationStatus`, then save the budget notice and return cleanly.
💡 Suggested fix
-    const streamId = await persistentStreaming.createStream(ctx);
-
-    // Set generationStatus so the frontend shows loading indicator
-    const threadMeta = await ctx.db
-      .query('threadMetadata')
-      .withIndex('by_threadId', (q) => q.eq('threadId', threadId))
-      .first();
-    if (threadMeta) {
-      await ctx.db.patch(threadMeta._id, {
-        generationStatus: 'generating' as const,
-        streamId,
-        updatedAt: Date.now(),
-      });
-    }
-
     const thread = await ctx.runQuery(components.agent.threads.getThread, {
       threadId,
     });
@@
     if (userId) {
       const budgetResult = await checkBudget(
         ctx,
         organizationId,
         userId,
         agentConfig.agentTeamId ? [agentConfig.agentTeamId] : [],
       );
       if (!budgetResult.allowed) {
-        throw new Error(budgetResult.reason ?? 'Budget limit exceeded');
+        await saveMessage(ctx, components.agent, {
+          threadId,
+          message: {
+            role: 'assistant',
+            content: budgetResult.reason ?? 'Budget limit exceeded',
+          },
+        });
+        return { success: true, threadId };
       }
     }
+
+    const streamId = await persistentStreaming.createStream(ctx);
+
+    // Set generationStatus so the frontend shows loading indicator
+    const threadMeta = await ctx.db
+      .query('threadMetadata')
+      .withIndex('by_threadId', (q) => q.eq('threadId', threadId))
+      .first();
+    if (threadMeta) {
+      await ctx.db.patch(threadMeta._id, {
+        generationStatus: 'generating' as const,
+        streamId,
+        updatedAt: Date.now(),
+      });
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/agent_tools/human_input/mutations.ts` around lines 169 - 205, The budget check is happening after createStream()/setting generationStatus so throwing on denial aborts the transaction and loses the completed approval; move the budget enforcement (the checkBudget call and its handling) to run before calling persistentStreaming.createStream and before patching threadMetadata.generationStatus/streamId, and if not allowed do not throw; persist a budget-notice (e.g. save a record or patch the thread/response with a budgetExceeded flag/message) and return cleanly with the assistant budget-exceeded response instead of raising an error; update references to checkBudget, persistentStreaming.createStream, and the threadMetadata patch to reflect the new order.
services/platform/convex/lib/agent_chat/start_agent_chat.ts (1)
132-154: ⚠️ Potential issue | 🟠 Major
Budget denials still leave an orphan generation session behind.
`createStream()` and the `threadMetadata` transition to `generating` happen before the budget check. When the check fails, we only flip `generationStatus` back to `idle`, so the thread still points at a stream/generation start that will never produce output. That can leave the client subscribed to a dead stream and pollute thread state.
Either move the budget check ahead of stream creation, or explicitly clear the stream-related fields on the deny path.
Also applies to: 217-240
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/convex/lib/agent_chat/start_agent_chat.ts` around lines 132 - 154, The code creates a persistent stream via persistentStreaming.createStream and patches threadMetadata.generationStatus to "generating" before performing the budget check, which leaves an orphaned stream on denial; fix by moving the budget validation to run before calling persistentStreaming.createStream and before setting threadMetadata to generating, or if you prefer to keep the current order, ensure the deny path explicitly clears the stream-related fields (e.g., set streamId = undefined, generationStartTime = undefined, and reset any stream/generation markers) by calling ctx.db.patch on the same threadMetadata record whenever budget is denied; apply the same change to the other occurrence that updates threadMetadata and creates streams (the block around the second createStream/patch).
services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx (1)
523-603: ⚠️ Potential issue | 🟠 Major
Add a `<caption>` to the models table.
This table still has no caption, so screen readers get the columns without any summary of what the table represents.
As per coding guidelines "ENSURE data tables have a `<caption>` (can be visually hidden with `sr-only`)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/platform/app/routes/dashboard/`$id/settings/providers/$providerName.tsx around lines 523 - 603, The models table rendered by the Table component (the block starting with <Table> and the config.models.map loop) lacks a caption for screen readers; add a <caption> as the first child of <Table> (just below the opening <Table>) with a descriptive string (e.g., use the i18n helper t('providers.modelsTableCaption') or a literal like "Provider models and settings") and make it visually hidden with the sr-only class if you don't want it visible; ensure the caption text clearly summarizes the table's purpose so screen readers receive context for the columns.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/providers/openrouter.json`:
- Around line 16-137: The example contains non-existent/unverified OpenRouter
model entries (e.g. ids "openai/gpt-5.2", "openai/gpt-5.2-pro",
"anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6",
"google/gemini-3.1-pro-preview", "deepseek/deepseek-v3.2",
"qwen/qwen3-next-80b-a3b-instruct", "meta-llama/llama-4-maverick", etc.) with
hardcoded prices; either replace each of these model objects with actual
verified OpenRouter model ids and current pricing from the OpenRouter catalog,
or mark the entire file as a template by adding a clear machine-readable flag
(e.g. add a top-level "template": true or add "verified": false to each model
object) and update displayName/description to indicate placeholder pricing so
production budget enforcement won’t use these values.
In `@services/platform/app/features/chat/components/message-info-dialog.tsx`:
- Around line 232-248: The "Cost" label is hardcoded in the metadata stat block
(the object created when metadata?.costEstimateCents != null) and must be
localized; replace the literal 'Cost' with the translation call from the
existing hook (e.g., useT or t) used elsewhere in this component (call
t('messageInfo.cost') or the appropriate key) so the label uses translated text;
update the object for the cost stat to use t('messageInfo.cost') (or the
component's existing translation function) instead of the hardcoded string.
In
`@services/platform/app/features/settings/governance/components/budget-editor.tsx`:
- Around line 469-495: The table in the BudgetEditor component is missing a
<caption> and column semantics on headers; update the table markup in
budget-editor.tsx (the table inside the overflow-x-auto div) to add a caption
(use an sr-only class if it should be visually hidden) and add scope="col"
attributes to each <th> (e.g., the Scope, Target, Period, Max Tokens, Max Cost,
Max Requests, Actions headers) so screen readers get correct column semantics.
- Around line 34-52: The UI strings in SCOPE_OPTIONS, ROLE_OPTIONS,
PERIOD_OPTIONS and all other user-facing text and aria-labels in
budget-editor.tsx are hardcoded English; replace them with calls to the
translation hook (useT / t) and translation keys (e.g.,
t('governance.budget.scope.default')) instead of raw strings, and ensure
aria-label attributes use translation keys as well (e.g.,
aria-label={t('governance.budget.scope.aria')}); scan the rest of the component
(lines referenced ~118-297, 455-572) and convert placeholders, helper copy,
button text, and ARIA values to use the t() function with meaningful keys and
update/add corresponding entries in the locale files.
- Around line 357-418: The editor mutates local enabled/rules state immediately
(in handleToggleEnabled, removeRule, handleDialogSave,
openAddDialog/openEditDialog flows) but doesn’t revert on upsertMutation
failure, leaving UI out of sync; change these handlers to either (A) perform the
upsertMutation.mutateAsync first and only call setEnabled/setRules after it
resolves successfully, or (B) perform an optimistic update but capture previous
state, call upsertMutation.mutateAsync, and on catch revert to the captured
previous state and show the toast; reference
saveConfig/upsertMutation.mutateAsync and the local setters setEnabled/setRules
and handlers handleToggleEnabled, removeRule, handleDialogSave (and
openAddDialog/openEditDialog flows) so you update or revert state consistently
when the mutation fails.
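Option (B) from the item above, an optimistic update that reverts on failure, could be sketched as a small generic helper. This is a hypothetical illustration, not the component's code: `updateWithRollback` and its parameters are invented names, and `persist` stands in for something like upsertMutation.mutateAsync.

```typescript
// Hypothetical helper: apply a state change optimistically and revert it
// if persisting the change fails.
async function updateWithRollback<T>(
  getState: () => T,
  setState: (next: T) => void,
  next: T,
  persist: (value: T) => Promise<void>,
  onError: (err: unknown) => void,
): Promise<boolean> {
  const previous = getState(); // capture the state before touching it
  setState(next); // optimistic update so the UI responds immediately
  try {
    await persist(next);
    return true;
  } catch (err) {
    setState(previous); // roll back so the UI matches what is stored
    onError(err); // e.g. show a toast
    return false;
  }
}
```

Option (A) is simpler still: await `persist` first and call `setState` only after it resolves, at the cost of a less responsive toggle.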
In
`@services/platform/app/features/settings/governance/components/usage-dashboard.tsx`:
- Around line 149-150: The component in usage-dashboard.tsx contains hardcoded
UI strings; replace them with i18n keys by importing and using the existing
translation hook (e.g. useTranslation or t) in the component that renders the
<Text as="h3" variant="label"> and the usage stat labels. Swap "Usage" for a
translation key like t('settings.usage.title'), replace "Total Tokens", "Total
Cost", "Total Requests", "Active Users" with keys such as
t('settings.usage.totalTokens'), t('settings.usage.totalCost'),
t('settings.usage.totalRequests'), t('settings.usage.activeUsers'), and replace
the empty state strings "No usage data" and "No usage has been recorded for this
period." with t('settings.usage.empty.title') and
t('settings.usage.empty.description'); ensure the translation hook is imported
at the top of the component and all uses reference the t(...) function instead
of raw strings so the component (e.g. UsageDashboard or the exported component
in this file) is fully internationalized.
- Around line 82-141: Replace hardcoded English column headers and inline labels
in the columns array with translated strings using the project's translation
hook (e.g., call useTranslation()/t) inside the UsageDashboard component: import
and call the hook at the top of the component, then change header values
('User', 'Team') and header render functions (e.g., () => <div
className="text-right">Input Tokens</div>) to use t('...') keys (e.g.,
t('governance.user'), t('governance.inputTokens'), t('governance.outputTokens'),
t('governance.cost'), t('governance.requests')) and replace the inline team
empty label '\u2014' if it should be localized; update any label variants (Text
variant props remain) so all user-facing strings use t(...) consistently.
- Around line 45-48: Replace the direct use of Date.prototype.toLocaleDateString
for the label with the shared formatDate utility: import formatDate from
'lib/utils/date/format', then change the assignment of label (the variable named
label that formats the date variable d) to call formatDate(d, ...) using the
equivalent options for long month and numeric year so the output matches the
previous behavior; ensure the import is added at the top of the file and remove
the toLocaleDateString call.
In `@services/platform/app/routes/dashboard/`$id/settings/governance.tsx:
- Around line 14-16: Change searchSchema so the tab field is constrained to
allowed values instead of any string: replace z.string().optional() with
z.enum(['system-prompt','roles','policies']).optional() (or whatever exact
supported tabs are) and update validateSearch to fallback to 'system-prompt'
when tab is missing or invalid; locate the schema named searchSchema and the
validateSearch logic in this file and ensure the parsed value uses the enum
result or defaults to 'system-prompt'.
- Around line 45-64: tabItems currently hardcodes English labels inside the
useMemo array (values 'System Prompt', 'Budgets', 'Usage'); replace those
hardcoded strings with translated labels using the app's translation hook (e.g.,
call const { t } = useTranslation() or useTranslations() at top of the
component) and use t('settings.governance.systemPrompt'),
t('settings.governance.budgets'), t('settings.governance.usage') (or the
project's preferred keys) for the label fields so SystemPromptEditor,
BudgetEditor and UsageDashboard render localized tab labels; keep the value and
content entries the same and ensure the translation hook is included in the
useMemo dependency array.
In
`@services/platform/app/routes/dashboard/`$id/settings/providers/$providerName.tsx:
- Around line 462-484: In handleSubmitModel, validate form.inputCostPerMillion
and form.outputCostPerMillion before converting them into the cost object: parse
each value to a Number, ensure it is finite and >= 0, and reject or surface a
validation error (prevent submission) if not; only then compute
inputCentsPerMillion and outputCentsPerMillion via Math.round and include cost
in model (otherwise set cost to undefined or omit the invalid field). Reference
the form fields (form.inputCostPerMillion, form.outputCostPerMillion), the cost
construction logic and the model variable to locate and update the serialization
and add the validation check.
- Around line 535-537: The new provider-cost column and related UI strings are
hardcoded English; replace all user-facing text in TableHead ("Cost / 1M
tokens"), the field labels/placeholders/helper texts and the rendered cost copy
referenced around the TableHead and the ranges noted (including the blocks
currently around lines 579-587 and 687-719) with i18n translations: import and
use the existing translation hook/function (e.g., useTranslations or t) in this
component, replace string literals with t('...') calls using meaningful keys
(e.g., "providers.costPerMillion", "providers.costPlaceholder",
"providers.costHelper", etc.), and add corresponding keys to the locale resource
files so the UI reads from translations rather than hardcoded English.
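The cost validation requested for handleSubmitModel might look roughly like the sketch below. It is an assumption-laden illustration: `parseUsdToCents`, `buildCost`, and the result type are hypothetical names, and the form is assumed to hold USD-per-1M-token strings that are converted to cents per million.

```typescript
interface ModelCost {
  inputCentsPerMillion: number;
  outputCentsPerMillion: number;
}

type CostParseResult =
  | { ok: true; cost: ModelCost | undefined }
  | { ok: false; error: string };

// Parse a USD amount from a form field into whole cents, or null if invalid.
function parseUsdToCents(raw: string): number | null {
  const trimmed = raw.trim();
  if (trimmed === '') return null;
  const value = Number(trimmed);
  // Reject NaN, Infinity, and negative prices.
  if (!Number.isFinite(value) || value < 0) return null;
  return Math.round(value * 100);
}

function buildCost(inputUsd: string, outputUsd: string): CostParseResult {
  // Both fields empty: the model simply has no configured cost.
  if (inputUsd.trim() === '' && outputUsd.trim() === '') {
    return { ok: true, cost: undefined };
  }
  const input = parseUsdToCents(inputUsd);
  const output = parseUsdToCents(outputUsd);
  if (input === null || output === null) {
    return { ok: false, error: 'Costs must be finite, non-negative numbers' };
  }
  return {
    ok: true,
    cost: { inputCentsPerMillion: input, outputCentsPerMillion: output },
  };
}
```

Returning a result object instead of throwing lets the submit handler block the save and surface the message next to the fields.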
In `@services/platform/convex/agent_tools/location/mutations.ts`:
- Around line 159-171: The budget check (checkBudget) can throw after the
thread's stream is created and generationStatus is set to 'generating', leaving
the thread stuck; move the call to checkBudget (using ctx, organizationId,
userId, and agentConfig.agentTeamId) to run before creating the stream and
before updating thread.generationStatus to 'generating', or if you prefer to
keep ordering, catch the budget failure and explicitly revert the thread
record's generationStatus back to 'idle' and persist the budget-exceeded
assistant message before rethrowing; ensure the fix touches the flow that
creates the stream and sets generationStatus so no path can leave the thread in
'generating' without scheduling a generation.
In `@services/platform/convex/governance/budget_enforcement.ts`:
- Around line 92-112: The code currently calls
ctx.db.query('usageLedger').withIndex('by_org_team_period', ...) .collect() into
entries and then loops to sum into totals; change this to iterate the query
result with a for await loop instead of .collect(): obtain the async iterator
from ctx.db.query(...).withIndex(...) and use for await (const entry of ...) to
increment totals.totalTokens, totals.costEstimate, and totals.requestCount (the
UsageTotals object), preserving the same filter keys (organizationId, teamId,
periodKey) used in the withIndex call.
In `@services/platform/convex/governance/internal_mutations.ts`:
- Around line 68-84: The reconciliation loop uses the original keep object's
totals for every patch so merging multiple duplicates loses intermediate counts;
update the running totals before each patch (or compute cumulative sums first)
so each subsequent merge adds to the previously-merged value—e.g., use mutable
accumulators (like let accInput = keep.inputTokens, accOutput =
keep.outputTokens, accTotal = keep.totalTokens, accCost = keep.costEstimate,
accCount = keep.requestCount) and inside the for-loop add dup's values to these
accumulators and call ctx.db.patch(keep._id, { inputTokens: accInput,
outputTokens: accOutput, totalTokens: accTotal, costEstimate: accCost,
requestCount: accCount }), then ctx.db.delete(dup._id); alternatively refresh
keep after each patch by re-reading it from ctx.db before the next iteration.
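The accumulator approach described above can be reduced to a pure function that sums every duplicate into a copy of the kept row, after which a single patch and the deletes can follow. This is a sketch over plain objects rather than Convex documents: `LedgerRow` and `mergeDuplicates` are hypothetical names, and only the field names come from the comment.

```typescript
interface LedgerRow {
  id: string;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  costEstimate: number;
  requestCount: number;
}

// Fold every duplicate into a running total that starts from the kept row,
// so each merge builds on the previous one instead of the original values.
function mergeDuplicates(keep: LedgerRow, duplicates: LedgerRow[]): LedgerRow {
  const merged: LedgerRow = { ...keep };
  for (const dup of duplicates) {
    merged.inputTokens += dup.inputTokens;
    merged.outputTokens += dup.outputTokens;
    merged.totalTokens += dup.totalTokens;
    merged.costEstimate += dup.costEstimate;
    merged.requestCount += dup.requestCount;
  }
  return merged;
}

const keep: LedgerRow = {
  id: 'a', inputTokens: 10, outputTokens: 5, totalTokens: 15, costEstimate: 2, requestCount: 1,
};
const dups: LedgerRow[] = [
  { id: 'b', inputTokens: 20, outputTokens: 10, totalTokens: 30, costEstimate: 4, requestCount: 2 },
  { id: 'c', inputTokens: 30, outputTokens: 15, totalTokens: 45, costEstimate: 6, requestCount: 3 },
];
console.log(mergeDuplicates(keep, dups).requestCount); // 6
```

With the buggy pattern, the second patch would have written 1 + 3 = 4 requests, silently discarding the first duplicate's contribution.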
In `@services/platform/convex/governance/mutations.ts`:
- Around line 35-42: The code validates args.config using
budgetConfigSchema.safeParse but then continues to use the raw args.config;
change the logic in the mutation so that after calling
budgetConfigSchema.safeParse(args.config) you capture parsed.data (e.g., const
normalizedConfig = parsed.data) and replace all later uses of args.config with
normalizedConfig — including the ctx.db.patch(...) and ctx.db.insert(...) calls
and the audit log entry — so the persisted config and audit record use the
validated/normalized data rather than the original input.
In `@services/platform/convex/openai_compat/internal_actions.ts`:
- Around line 384-405: The ledger entry uses the original modelId to estimate
cost, but resolveLanguageModelWithFallback() may return a different
resolved.modelData with provider-specific pricing; change the usage-cost
calculation to use the resolved model info (resolved.modelData or
resolved.modelId/pricing) when calling estimateCostCents so the
costEstimateCents reflects the actual provider/model chosen, and ensure the
mutation call to internal.governance.internal_mutations.incrementUsageLedger
uses those resolved values (reference resolveLanguageModelWithFallback(),
resolved.modelData, estimateCostCents, incrementUsageLedger, and result.usage to
locate the code).
In `@services/platform/lib/shared/schemas/providers.ts`:
- Around line 14-19: The cost schema currently allows negative values for
inputCentsPerMillion and outputCentsPerMillion; update the cost object schema
(the cost definition) so both inputCentsPerMillion and outputCentsPerMillion are
constrained to non-negative numbers (e.g., use z.number().nonnegative() or
z.number().min(0)) to reject negative pricing at the schema boundary before it
reaches cost estimation.
In `@services/rag/app/services/rag_service.py`:
- Around line 281-284: The service currently stores per-request embedding usage
on the singleton via self.last_search_usage (and similar at lines ~301-302);
change the search (and affected generate) methods to stop writing to
self.last_search_usage and instead return the usage as part of the method return
value (e.g., include a usage field in the search result tuple/dict) so callers
receive request-scoped usage directly; update callers of search (and the
generate-related method) to read the returned usage rather than reading
self.last_search_usage.
In `@services/rag/app/services/search_service.py`:
- Around line 61-68: The code currently races _fts_search and
embed_query_with_usage with asyncio.gather so if _fts_search raises the
embedding task's usage result can be lost and last_search_usage remains the
zero-token placeholder; change the flow to always await or retrieve the
embedding result (from _embedding.embed_query_with_usage) independently of
_fts_search failures and assign self.last_search_usage = query_result.usage
before handling BM25/vector-only fallback (the code path that calls
embed_query()); specifically ensure the embedding_task started in this block is
awaited (or its result retrieved) even if the fts_task raises, so
last_search_usage reflects the actual usage for later fallback logic that relies
on self.last_search_usage.
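The duplicate-reconciliation finding for `internal_mutations.ts` above comes down to keeping running totals while folding duplicates into the surviving row. The following is a minimal sketch of that idea, not the repository's code: the `LedgerRow` shape is simplified and `mergeDuplicates` is a hypothetical helper. In the real mutation the accumulated totals would be written with `ctx.db.patch(keep._id, ...)` and each duplicate removed with `ctx.db.delete(dup._id)`.

```typescript
// Hypothetical, simplified ledger row; the real table likely has more fields.
interface LedgerRow {
  id: string;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  costEstimate: number;
  requestCount: number;
}

type Totals = Omit<LedgerRow, 'id'>;

// Fold every duplicate into accumulators seeded from `keep`.
// Each returned element is the patch that would be written after merging
// one more duplicate, so the cumulative behaviour is easy to inspect.
function mergeDuplicates(keep: LedgerRow, dups: LedgerRow[]): Totals[] {
  const acc: Totals = {
    inputTokens: keep.inputTokens,
    outputTokens: keep.outputTokens,
    totalTokens: keep.totalTokens,
    costEstimate: keep.costEstimate,
    requestCount: keep.requestCount,
  };
  const patches: Totals[] = [];
  for (const dup of dups) {
    // Add to the running totals, never to the original `keep` values.
    acc.inputTokens += dup.inputTokens;
    acc.outputTokens += dup.outputTokens;
    acc.totalTokens += dup.totalTokens;
    acc.costEstimate += dup.costEstimate;
    acc.requestCount += dup.requestCount;
    // Real code: patch the kept row with `acc`, then delete `dup`.
    patches.push({ ...acc });
  }
  return patches;
}

const keepRow: LedgerRow = {
  id: 'keep',
  inputTokens: 100,
  outputTokens: 50,
  totalTokens: 150,
  costEstimate: 3,
  requestCount: 1,
};
const duplicateRows: LedgerRow[] = [
  { id: 'dup1', inputTokens: 10, outputTokens: 5, totalTokens: 15, costEstimate: 1, requestCount: 1 },
  { id: 'dup2', inputTokens: 20, outputTokens: 10, totalTokens: 30, costEstimate: 2, requestCount: 1 },
];
const mergePatches = mergeDuplicates(keepRow, duplicateRows);
const finalTotals = mergePatches[mergePatches.length - 1];
```

With the buggy pattern (adding each duplicate to the original `keep` values), the last patch would carry only `keep + dup2`; with accumulators it carries the sum of all three rows.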
---
Outside diff comments:
In `@services/crawler/app/routers/web.py`:
- Around line 123-132: The failure branches that return a
WebFetchExtractResponse omit the parser usage, causing billing/ledger
underreporting; update the negative-return paths (the one inside the block that
checks if not result.get("success") and the analogous failure block later) to
include usage=result.get("usage", {}) so the response preserves
parse_file_with_vision() usage info; locate the WebFetchExtractResponse
constructors in this router and add the usage field populated from
result.get("usage", {}) to each failure return.
In `@services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx`:
- Around line 523-603: The models table rendered by the Table component (the
block starting with <Table> and the config.models.map loop) lacks a caption for
screen readers; add a <caption> as the first child of <Table> (just below the
opening <Table>) with a descriptive string (e.g., use the i18n helper
t('providers.modelsTableCaption') or a literal like "Provider models and
settings") and make it visually hidden with the sr-only class if you don't want
it visible; ensure the caption text clearly summarizes the table's purpose so
screen readers receive context for the columns.
In `@services/platform/convex/agent_tools/approval_shared.ts`:
- Around line 59-90: The budget check currently throws an error after creating
the stream and setting generationStatus, causing a rollback; move the call to
checkBudget (the check of checkBudget(...) result for userId) to before
persistentStreaming.createStream() and before patching
threadMeta.generationStatus, and when budgetResult.allowed is false do NOT
throw—use saveMessage(...) to save a user-visible budget denial/notice message
(similar to how promptMessageId is created) and return early so no stream is
created and no generationStatus is set; reference
persistentStreaming.createStream, saveMessage, checkBudget, threadMeta and
generationStatus when making the change.
In `@services/platform/convex/agent_tools/human_input/mutations.ts`:
- Around line 169-205: The budget check is happening after
createStream()/setting generationStatus so throwing on denial aborts the
transaction and loses the completed approval; move the budget enforcement (the
checkBudget call and its handling) to run before calling
persistentStreaming.createStream and before patching
threadMetadata.generationStatus/streamId, and if not allowed do not
throw—persist a budget-notice (e.g. save a record or patch the thread/response
with a budgetExceeded flag/message) and return cleanly with the assistant
budget-exceeded response instead of raising an error; update references to
checkBudget, persistentStreaming.createStream, and the threadMetadata patch to
reflect the new order.
In `@services/platform/convex/agent_tools/rag/rag_search_tool.ts`:
- Around line 151-156: The 'list_indexed' branch in the execute function returns
an AgentIndexedDocumentListResult directly rather than the standardized tool
response shape; update the execute handler so when args.operation ===
'list_indexed' it calls listIndexedDocuments(ctx, {...}) and then returns an
object matching the search branch shape (e.g. { success: true, response: <the
list result>, output: <optional human-readable summary>, usage: <optional usage
info> }) instead of the raw AgentIndexedDocumentListResult so callers receive
the consistent { success, response, output, usage? } structure.
In `@services/platform/convex/agent_tools/web/web_tool.ts`:
- Around line 77-84: The code only sets instruction for file URLs; update the
instruction calculation so user-provided extraction prompts (args.query) are
forwarded for webpage and inline-URL fetches as well: compute instruction =
args.query when present (not just when isFileUrl(targetUrl)) and pass that
instruction into the fetchAndExtract call (function fetchAndExtract in this
file), preserving any existing file-URL logic that may still apply.
In `@services/platform/convex/lib/agent_chat/start_agent_chat.ts`:
- Around line 132-154: The code creates a persistent stream via
persistentStreaming.createStream and patches threadMetadata.generationStatus to
"generating" before performing the budget check, which leaves an orphaned stream
on denial; fix by moving the budget validation to run before calling
persistentStreaming.createStream and before setting threadMetadata to
generating, or if you prefer to keep the current order, ensure the deny path
explicitly clears the stream-related fields (e.g., set streamId = undefined,
generationStartTime = undefined, and reset any stream/generation markers) by
calling ctx.db.patch on the same threadMetadata record whenever budget is
denied; apply the same change to the other occurrence that updates
threadMetadata and creates streams (the block around the second
createStream/patch).
In `@services/platform/convex/lib/agent_response/generate_response.ts`:
- Around line 1807-1834: The error completion path in generate_response.ts calls
onAgentComplete with a result object that omits providerCost, causing divergence
from success/cancel paths; update the result passed to onAgentComplete (the
object constructed in the await onAgentComplete(...) call) to include
providerCost (and/or costEstimateCents if you use that symbol) populated the
same way as the success/cancel completion paths so failed generations that have
token usage use the resolved provider pricing; locate the onAgentComplete
invocation in generate_response.ts and add providerCost: providerCost (or the
exact variable name used where resolved pricing is computed) to the result
object along with any existing usage/duration fields.
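Several of the findings above (`approval_shared.ts`, `human_input/mutations.ts`, `start_agent_chat.ts`) ask for the same ordering: check the budget first, and on denial save a notice and return rather than throw. A minimal sketch of that control flow follows. Everything in it is a stand-in: `Deps`, `startGeneration`, and the in-memory `ThreadMeta` are hypothetical and only mimic the roles of `checkBudget`, `persistentStreaming.createStream`, `saveMessage`, and the `threadMetadata` patch named in the review.

```typescript
// Hypothetical in-memory stand-in for the thread metadata record.
interface ThreadMeta {
  generationStatus: 'idle' | 'generating';
  streamId?: string;
}

interface BudgetResult {
  allowed: boolean;
  reason?: string;
}

// Hypothetical dependency bundle standing in for the Convex-backed helpers.
interface Deps {
  checkBudget: (userId: string) => Promise<BudgetResult>;
  createStream: () => Promise<string>;
  saveMessage: (text: string) => Promise<void>;
}

type StartResult =
  | { started: true; streamId: string }
  | { started: false; reason: string };

async function startGeneration(
  thread: ThreadMeta,
  userId: string,
  deps: Deps,
): Promise<StartResult> {
  // 1. Budget first: nothing has been created yet, so a denial leaves
  //    no stream and no 'generating' status behind.
  const budget = await deps.checkBudget(userId);
  if (!budget.allowed) {
    const reason = budget.reason ?? 'Budget exceeded';
    // Persist a user-visible notice and return instead of throwing,
    // so writes made earlier in the same transaction are kept.
    await deps.saveMessage(reason);
    return { started: false, reason };
  }

  // 2. Only now create the stream and mark the thread as generating.
  const streamId = await deps.createStream();
  thread.streamId = streamId;
  thread.generationStatus = 'generating';
  return { started: true, streamId };
}
```

Returning a tagged result instead of throwing lets each resume path (approval, human input, location) decide how to surface the denial while sharing the same ordering.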
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 542660b9-0cc0-4eea-a4b0-96e5bcc2558d
📒 Files selected for processing (43)
- examples/providers/openrouter.json
- packages/tale_knowledge/src/tale_knowledge/embedding/__init__.py
- packages/tale_knowledge/src/tale_knowledge/embedding/service.py
- services/crawler/app/models.py
- services/crawler/app/routers/web.py
- services/crawler/app/services/web_image_extractor.py
- services/platform/app/features/chat/components/message-info-dialog.tsx
- services/platform/app/features/chat/hooks/queries.ts
- services/platform/app/features/settings/governance/components/budget-editor.tsx
- services/platform/app/features/settings/governance/components/usage-dashboard.tsx
- services/platform/app/routes/dashboard/$id/settings/governance.tsx
- services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx
- services/platform/convex/agent_tools/approval_shared.ts
- services/platform/convex/agent_tools/human_input/mutations.ts
- services/platform/convex/agent_tools/location/mutations.ts
- services/platform/convex/agent_tools/rag/format_search_results.ts
- services/platform/convex/agent_tools/rag/rag_search_tool.ts
- services/platform/convex/agent_tools/web/helpers/fetch_and_extract.ts
- services/platform/convex/agent_tools/web/helpers/types.ts
- services/platform/convex/agent_tools/web/web_tool.ts
- services/platform/convex/governance/budget_enforcement.ts
- services/platform/convex/governance/cost_estimation.ts
- services/platform/convex/governance/helpers.ts
- services/platform/convex/governance/internal_mutations.ts
- services/platform/convex/governance/mutations.ts
- services/platform/convex/governance/queries.ts
- services/platform/convex/governance/schema.ts
- services/platform/convex/lib/agent_chat/internal_actions.ts
- services/platform/convex/lib/agent_chat/start_agent_chat.ts
- services/platform/convex/lib/agent_completion/on_agent_complete.ts
- services/platform/convex/lib/agent_response/generate_response.ts
- services/platform/convex/lib/agent_response/types.ts
- services/platform/convex/message_metadata/internal_mutations.ts
- services/platform/convex/openai_compat/internal_actions.ts
- services/platform/convex/providers/file_actions.ts
- services/platform/convex/providers/resolve_model.ts
- services/platform/convex/streaming/schema.ts
- services/platform/convex/streaming/validators.ts
- services/platform/lib/shared/schemas/providers.ts
- services/rag/app/models.py
- services/rag/app/routers/search.py
- services/rag/app/services/rag_service.py
- services/rag/app/services/search_service.py
💤 Files with no reviewable changes (1)
- services/platform/convex/governance/schema.ts
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 38, "outputCentsPerMillion": 172 }
      },
      {
        "id": "deepseek/deepseek-v3.2",
        "displayName": "DeepSeek V3.2",
        "description": "Strong reasoning and general capabilities",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 26, "outputCentsPerMillion": 38 }
      },
      {
        "id": "qwen/qwen3-next-80b-a3b-instruct",
        "displayName": "Qwen3 Next 80B",
        "description": "Fast and efficient instruction-following model",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 9, "outputCentsPerMillion": 110 }
      },
      {
        "id": "qwen/qwen3.5-35b-a3b",
        "displayName": "Qwen3.5 35B",
        "description": "Compact and fast model",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 16, "outputCentsPerMillion": 130 }
      },
      {
        "id": "anthropic/claude-opus-4.6",
        "displayName": "Claude Opus 4.6",
        "description": "Most capable model for complex reasoning and coding",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 500, "outputCentsPerMillion": 2500 }
      },
      {
        "id": "openai/gpt-5.2",
        "displayName": "GPT-5.2",
        "description": "OpenAI's latest flagship model",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 175, "outputCentsPerMillion": 1400 }
      },
      {
        "id": "qwen/qwen3-vl-32b-instruct",
        "displayName": "Qwen3 VL 32B",
        "description": "Vision-language model for image understanding",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 10, "outputCentsPerMillion": 42 }
      },
      {
        "id": "qwen/qwen3-embedding-8b",
        "displayName": "Qwen3 Embedding 8B",
        "description": "Text embedding model for semantic search",
-       "tags": [
-         "embedding"
-       ],
-       "dimensions": 1536
+       "tags": ["embedding"],
+       "dimensions": 1536,
+       "cost": { "inputCentsPerMillion": 1, "outputCentsPerMillion": 0 }
      },
      {
        "id": "openai/gpt-5.2-chat",
        "displayName": "GPT-5.2 Instant",
        "description": "Fast, low-latency variant of GPT-5.2 optimized for chat",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 175, "outputCentsPerMillion": 1400 }
      },
      {
        "id": "openai/gpt-5.2-pro",
        "displayName": "GPT-5.2 Pro",
        "description": "Most advanced GPT-5.2 variant for complex reasoning tasks",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 2100, "outputCentsPerMillion": 16800 }
      },
      {
        "id": "anthropic/claude-sonnet-4.6",
        "displayName": "Claude Sonnet 4.6",
        "description": "Balanced performance and speed for everyday tasks",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 300, "outputCentsPerMillion": 1500 }
      },
      {
        "id": "anthropic/claude-haiku-4.5",
        "displayName": "Claude Haiku 4.5",
        "description": "Fast and compact model for lightweight tasks",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 100, "outputCentsPerMillion": 500 }
      },
      {
        "id": "google/gemini-3.1-pro-preview",
        "displayName": "Gemini 3 Pro",
        "description": "Google's most capable Gemini model",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 200, "outputCentsPerMillion": 1200 }
      },
      {
        "id": "google/gemini-3-flash-preview",
        "displayName": "Gemini 3 Flash",
        "description": "Fast and efficient Gemini model",
-       "tags": [
-         "chat",
-         "vision"
-       ]
+       "tags": ["chat", "vision"],
+       "cost": { "inputCentsPerMillion": 50, "outputCentsPerMillion": 300 }
      },
      {
        "id": "mistralai/mistral-large-2512",
        "displayName": "Mistral Large 3",
        "description": "Mistral AI's flagship large language model",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 50, "outputCentsPerMillion": 150 }
      },
      {
        "id": "mistralai/mistral-medium-3",
        "displayName": "Mistral Medium 3",
        "description": "Balanced Mistral model for general tasks",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 40, "outputCentsPerMillion": 200 }
      },
      {
        "id": "meta-llama/llama-4-maverick",
        "displayName": "LLaMA 4 Maverick",
        "description": "Meta's powerful open-source large language model",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 15, "outputCentsPerMillion": 60 }
      },
      {
        "id": "meta-llama/llama-4-scout",
        "displayName": "LLaMA 4 Scout",
        "description": "Meta's efficient open-source language model",
-       "tags": [
-         "chat"
-       ]
+       "tags": ["chat"],
+       "cost": { "inputCentsPerMillion": 8, "outputCentsPerMillion": 30 }
🧩 Analysis chain
🏁 Script executed:
# First, let's look at the full file to understand the structure
fd openrouter.json
Repository: tale-project/tale
Length of output: 95
🏁 Script executed:
# Read the file to see all models and their pricing
cat examples/providers/openrouter.json
Repository: tale-project/tale
Length of output: 5454
🌐 Web query:
OpenRouter API pricing 2025 current rates
💡 Result:
As of April 10, 2026, OpenRouter’s “current rates” break down into (a) model token rates + (b) OpenRouter fees:
- Model (token) pricing: You pay the posted per‑model rates per 1M input tokens and per 1M output tokens shown in the OpenRouter model catalog; OpenRouter says it does not mark up provider pricing. [1]
  - Example (one model): GPT‑4.1 shows $2 / 1M input tokens and $8 / 1M output tokens (effective pricing page). [2]
- Pay‑as‑you‑go platform fee (when buying credits): 5.5% of the order amount for non‑crypto payments ($0.80 minimum fee); 5.0% flat for crypto (no minimum). [3]
- BYOK (Bring Your Own Key): 1M free BYOK requests per month, then 5% fee after that. [1][4]
Sources: OpenRouter Pricing page + FAQ + fee announcement + model pricing page. [1][4][3][2]
🌐 Web query:
OpenRouter documentation model pricing rates per million tokens
💡 Result:
OpenRouter exposes per-token USD pricing for each model in the Models API response under pricing.prompt (input tokens) and pricing.completion (output tokens). To get rates per 1M tokens, multiply those values by 1,000,000. [1]
Endpoint
GET https://openrouter.ai/api/v1/models (returns model list + pricing) [1]
Examples (from the API response)
- anthropic/claude-opus-4.6-fast: prompt: 0.00003, completion: 0.00015 ⇒ $30 / 1M input, $150 / 1M output [1]
- z-ai/glm-5.1: prompt: 0.00000126, completion: 0.00000396 ⇒ $1.26 / 1M input, $3.96 / 1M output [1]
- google/gemma-4-26b-a4b-it: prompt: 0.00000013, completion: 0.0000004 ⇒ $0.13 / 1M input, $0.40 / 1M output [1]
OpenRouter’s docs also note that pricing is shown per million tokens (often separate prompt vs completion) on the Models page, and that prices are passed through from underlying providers (no inference markup). [2]
Sources: [1] [2]
Several models in this example file do not appear to exist on OpenRouter, making the pricing values unverifiable and unsuitable for production budget enforcement.
The file lists future-dated or fictional model versions (GPT-5.2, Claude 4.6, LLaMA 4, Gemini 3, DeepSeek V3.2, Qwen3 variants) that are not available in the current OpenRouter catalog. Using prices tied to non-existent models will produce incorrect cost estimates and budget calculations. Either use verified models that exist on OpenRouter with their actual current pricing, or clearly document this as a template/placeholder example.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/providers/openrouter.json` around lines 16 - 137, The example
contains non-existent/unverified OpenRouter model entries (e.g. ids
"openai/gpt-5.2", "openai/gpt-5.2-pro", "anthropic/claude-opus-4.6",
"anthropic/claude-sonnet-4.6", "google/gemini-3.1-pro-preview",
"deepseek/deepseek-v3.2", "qwen/qwen3-next-80b-a3b-instruct",
"meta-llama/llama-4-maverick", etc.) with hardcoded prices; either replace each
of these model objects with actual verified OpenRouter model ids and current
pricing from the OpenRouter catalog, or mark the entire file as a template by
adding a clear machine-readable flag (e.g. add a top-level "template": true or
add "verified": false to each model object) and update displayName/description
to indicate placeholder pricing so production budget enforcement won’t use these
values.
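If the example file is refreshed from the OpenRouter catalog, the per-token USD prices quoted in the analysis above have to be converted into the `inputCentsPerMillion` / `outputCentsPerMillion` unit used by the `cost` field. A minimal sketch of that arithmetic follows; `toCentsPerMillion` is a hypothetical helper, and accepting both numbers and numeric strings is an assumption about how the API value arrives.

```typescript
// Convert a per-token USD price into cents per one million tokens.
// Hypothetical helper: not part of the repository or of any OpenRouter SDK.
function toCentsPerMillion(perTokenUsd: number | string): number {
  const usd = Number(perTokenUsd);
  if (!Number.isFinite(usd) || usd < 0) {
    throw new RangeError(`Invalid per-token price: ${String(perTokenUsd)}`);
  }
  // USD per token * 1,000,000 tokens * 100 cents per USD.
  const cents = usd * 1e6 * 100;
  // Round to four decimals to absorb floating-point noise.
  return Math.round(cents * 1e4) / 1e4;
}

// Figures taken from the API examples quoted in the analysis above.
const glmInput = toCentsPerMillion(0.00000126); // $1.26 / 1M input tokens
const glmOutput = toCentsPerMillion(0.00000396); // $3.96 / 1M output tokens
const gemmaInput = toCentsPerMillion('0.00000013'); // $0.13 / 1M input tokens
```

Rejecting negative or non-numeric prices here mirrors the non-negative constraint requested for the `cost` schema, so a bad catalog value fails at import time rather than during cost estimation.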
| ...(metadata?.costEstimateCents != null | ||
| ? [ | ||
| { | ||
| label: 'Cost', | ||
| value: ( | ||
| <Text className="font-mono"> | ||
| {(() => { | ||
| const dollars = metadata.costEstimateCents / 100; | ||
| if (dollars === 0) return '$0.00'; | ||
| if (dollars >= 1) return `$${dollars.toFixed(2)}`; | ||
| return `$${dollars.toPrecision(3)}`; | ||
| })()} | ||
| </Text> | ||
| ), | ||
| }, | ||
| ] | ||
| : []), |
Localize the new cost label.
'Cost' is new user-facing text but it bypasses useT, so this stat will stay English in localized UIs. Please move it to a translation key, e.g. t('messageInfo.cost').
As per coding guidelines **/*.{tsx,jsx}: Do NOT hardcode text, use the translation hooks/functions instead for user-facing UI.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/app/features/chat/components/message-info-dialog.tsx`
around lines 232 - 248, The "Cost" label is hardcoded in the metadata stat block
(the object created when metadata?.costEstimateCents != null) and must be
localized; replace the literal 'Cost' with the translation call from the
existing hook (e.g., useT or t) used elsewhere in this component (call
t('messageInfo.cost') or the appropriate key) so the label uses translated text;
update the object for the cost stat to use t('messageInfo.cost') (or the
component's existing translation function) instead of the hardcoded string.
| const SCOPE_OPTIONS = [ | ||
| { value: 'default', label: 'Default' }, | ||
| { value: 'user', label: 'User' }, | ||
| { value: 'team', label: 'Team' }, | ||
| { value: 'role', label: 'Role' }, | ||
| ]; | ||
|
|
||
| const ROLE_OPTIONS = [ | ||
| { value: 'admin', label: 'Admin' }, | ||
| { value: 'developer', label: 'Developer' }, | ||
| { value: 'editor', label: 'Editor' }, | ||
| { value: 'member', label: 'Member' }, | ||
| ]; | ||
|
|
||
| const PERIOD_OPTIONS = [ | ||
| { value: 'monthly', label: 'Monthly' }, | ||
| { value: 'weekly', label: 'Weekly', disabled: true }, | ||
| { value: 'daily', label: 'Daily', disabled: true }, | ||
| ]; |
This new governance UI is hardcoded English.
The added labels, placeholders, helper copy, button text, and ARIA labels are not routed through useT, so the feature won’t localize and the ARIA text also violates the repo rule.
As per coding guidelines "Do NOT hardcode text, use the translation hooks/functions instead for user-facing UI" and "ALWAYS use translation keys for aria-label values. Never hardcode English in ARIA attributes".
Also applies to: 118-297, 455-572
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/settings/governance/components/budget-editor.tsx`
around lines 34 - 52, The UI strings in SCOPE_OPTIONS, ROLE_OPTIONS,
PERIOD_OPTIONS and all other user-facing text and aria-labels in
budget-editor.tsx are hardcoded English; replace them with calls to the
translation hook (useT / t) and translation keys (e.g.,
t('governance.budget.scope.default')) instead of raw strings, and ensure
aria-label attributes use translation keys as well (e.g.,
aria-label={t('governance.budget.scope.aria')}); scan the rest of the component
(lines referenced ~118-297, 455-572) and convert placeholders, helper copy,
button text, and ARIA values to use the t() function with meaningful keys and
update/add corresponding entries in the locale files.
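One way to localize module-level option arrays such as `SCOPE_OPTIONS` is to keep only the values static and build the labels from the translator at render time. The sketch below assumes a translator of the form `(key) => string` and the key pattern `governance.budget.scope.<value>` suggested in the prompt above; `buildScopeOptions`, the `Translate` type, and the stub messages are hypothetical and stand in for the project's `useT` hook and locale files.

```typescript
// Assumed translator signature; the real `t` would come from the useT hook.
type Translate = (key: string) => string;

interface SelectOption {
  value: string;
  label: string;
  disabled?: boolean;
}

// Only the machine-readable values stay at module level.
const SCOPE_VALUES = ['default', 'user', 'team', 'role'] as const;

// Labels are resolved through the translator instead of being hardcoded.
function buildScopeOptions(t: Translate): SelectOption[] {
  return SCOPE_VALUES.map((value) => ({
    value,
    label: t(`governance.budget.scope.${value}`),
  }));
}

// Stub translator for illustration only (German labels, invented here).
const stubMessages: Record<string, string> = {
  'governance.budget.scope.default': 'Standard',
  'governance.budget.scope.user': 'Benutzer',
  'governance.budget.scope.team': 'Team',
};
const stubT: Translate = (key) =>
  key in stubMessages ? stubMessages[key] : key;

const scopeOptions = buildScopeOptions(stubT);
```

Inside the component the call would typically be wrapped in `useMemo` keyed on `t`, and the same pattern extends to `ROLE_OPTIONS`, `PERIOD_OPTIONS`, and the `aria-label` values.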
| const saveConfig = useCallback( | ||
| async (configToSave: { enabled: boolean; rules: BudgetRule[] }) => { | ||
| try { | ||
| await upsertMutation.mutateAsync({ | ||
| organizationId, | ||
| policyType: 'budgets', | ||
| config: configToSave, | ||
| }); | ||
| toast({ title: 'Budget configuration saved' }); | ||
| } catch (error: unknown) { | ||
| const message = | ||
| error instanceof Error ? error.message : 'Failed to save'; | ||
| toast({ title: message, variant: 'destructive' }); | ||
| } | ||
| }, | ||
| [organizationId, upsertMutation, toast], | ||
| ); | ||
|
|
||
| const handleToggleEnabled = useCallback( | ||
| (checked: boolean) => { | ||
| setEnabled(checked); | ||
| void saveConfig({ enabled: checked, rules }); | ||
| }, | ||
| [saveConfig, rules], | ||
| ); | ||
|
|
||
| const removeRule = useCallback( | ||
| (index: number) => { | ||
| const newRules = rules.filter((_, i) => i !== index); | ||
| setRules(newRules); | ||
| void saveConfig({ enabled, rules: newRules }); | ||
| }, | ||
| [rules, enabled, saveConfig], | ||
| ); | ||
|
|
||
| const openAddDialog = useCallback(() => { | ||
| setEditingIndex(null); | ||
| setDialogRule(emptyRule()); | ||
| setDialogOpen(true); | ||
| }, []); | ||
|
|
||
| const openEditDialog = useCallback( | ||
| (index: number) => { | ||
| setEditingIndex(index); | ||
| setDialogRule(rules[index]); | ||
| setDialogOpen(true); | ||
| }, | ||
| [rules], | ||
| ); | ||
|
|
||
| const handleDialogSave = useCallback( | ||
| (rule: BudgetRule) => { | ||
| let newRules: BudgetRule[]; | ||
| if (editingIndex === null) { | ||
| newRules = [...rules, rule]; | ||
| } else { | ||
| newRules = rules.map((r, i) => (i === editingIndex ? rule : r)); | ||
| } | ||
| setRules(newRules); | ||
| void saveConfig({ enabled, rules: newRules }); | ||
| }, | ||
| [editingIndex, rules, enabled, saveConfig], |
Failed autosaves leave the editor out of sync with the server.
Toggle/add/edit/delete all update local enabled/rules state before the mutation resolves, and saveConfig only shows a toast on failure. If the mutation is rejected, the screen keeps showing an unsaved budget policy until some later refetch overwrites it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/settings/governance/components/budget-editor.tsx`
around lines 357 - 418, The editor mutates local enabled/rules state immediately
(in handleToggleEnabled, removeRule, handleDialogSave,
openAddDialog/openEditDialog flows) but doesn’t revert on upsertMutation
failure, leaving UI out of sync; change these handlers to either (A) perform the
upsertMutation.mutateAsync first and only call setEnabled/setRules after it
resolves successfully, or (B) perform an optimistic update but capture previous
state, call upsertMutation.mutateAsync, and on catch revert to the captured
previous state and show the toast; reference
saveConfig/upsertMutation.mutateAsync and the local setters setEnabled/setRules
and handlers handleToggleEnabled, removeRule, handleDialogSave (and
openAddDialog/openEditDialog flows) so you update or revert state consistently
when the mutation fails.
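Option (B) above, an optimistic update that reverts on failure, can be sketched independently of React. In the sketch below `applyWithRollback` is a hypothetical helper; `setState` stands in for `setEnabled`/`setRules`, `save` for `upsertMutation.mutateAsync`, and `onError` for the destructive toast. The `rules` field is simplified to strings.

```typescript
// Simplified stand-in for the editor state; real rules are BudgetRule objects.
interface BudgetConfigState {
  enabled: boolean;
  rules: string[];
}

// Apply `next` optimistically, persist it, and restore `previous` if the
// save rejects. Returns true when the save succeeded.
async function applyWithRollback<T>(
  previous: T,
  next: T,
  setState: (value: T) => void,
  save: (value: T) => Promise<void>,
  onError: (message: string) => void,
): Promise<boolean> {
  setState(next); // optimistic update so the UI responds immediately
  try {
    await save(next);
    return true;
  } catch (error: unknown) {
    setState(previous); // revert so the screen matches the server again
    onError(error instanceof Error ? error.message : 'Failed to save');
    return false;
  }
}
```

A handler such as `handleToggleEnabled` would capture the current `{ enabled, rules }` before calling the helper. This sketch does not address overlapping saves; if two edits are in flight at once, reverting to a captured snapshot can discard the other edit, so awaiting the mutation before updating state (option A) may be the simpler choice.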
| <div className="overflow-x-auto"> | ||
| <table className="w-full text-sm"> | ||
| <thead> | ||
| <tr className="border-border border-b"> | ||
| <th className="text-muted-foreground px-3 py-2 text-left font-medium"> | ||
| Scope | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-left font-medium"> | ||
| Target | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-left font-medium"> | ||
| Period | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-right font-medium"> | ||
| Max Tokens | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-right font-medium"> | ||
| Max Cost | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-right font-medium"> | ||
| Max Requests | ||
| </th> | ||
| <th className="text-muted-foreground px-3 py-2 text-right font-medium"> | ||
| Actions | ||
| </th> | ||
| </tr> | ||
| </thead> |
The rules table is missing required accessibility semantics.
The raw table markup still needs a caption, and the headers should explicitly preserve column semantics on the generated <th> elements.
As per coding guidelines "ENSURE data tables have a <caption> (can be visually hidden with sr-only)" and "USE `scope="col"` on <th> elements in tables".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@services/platform/app/features/settings/governance/components/budget-editor.tsx`
around lines 469 - 495, The table in the BudgetEditor component is missing a
<caption> and column semantics on headers; update the table markup in
budget-editor.tsx (the table inside the overflow-x-auto div) to add a caption
(use an sr-only class if it should be visually hidden) and add scope="col"
attributes to each <th> (e.g., the Scope, Target, Period, Max Tokens, Max Cost,
Max Requests, Actions headers) so screen readers get correct column semantics.
| // Track usage for client tool mode (this path bypasses onAgentComplete) | ||
| const usage = await result.usage; | ||
| if (usage && args.organizationId) { | ||
| const { estimateCostCents } = | ||
| await import('../governance/cost_estimation'); | ||
| const inputTokens = usage.inputTokens ?? 0; | ||
| const outputTokens = usage.outputTokens ?? 0; | ||
| if (inputTokens > 0 || outputTokens > 0) { | ||
| await ctx | ||
| .runMutation( | ||
| internal.governance.internal_mutations.incrementUsageLedger, | ||
| { | ||
| organizationId: args.organizationId, | ||
| userId: args.userId ?? 'system', | ||
| inputTokens, | ||
| outputTokens, | ||
| costEstimateCents: estimateCostCents( | ||
| modelId, | ||
| inputTokens, | ||
| outputTokens, | ||
| ), | ||
| timestamp: Date.now(), |
Use the resolved model and provider pricing for this ledger entry.
This path estimates cost from modelId, but resolveLanguageModelWithFallback() can pick a different model/provider and already exposes provider-specific pricing on resolved.modelData. That makes client-tool chats drift from the main cost ledger whenever default resolution, failover, or provider overrides are involved.
💡 Suggested fix
// Track usage for client tool mode (this path bypasses onAgentComplete)
const usage = await result.usage;
if (usage && args.organizationId) {
const { estimateCostCents } =
await import('../governance/cost_estimation');
+ const providerCost =
+ resolved.modelData.inputCentsPerMillion != null
+ ? {
+ inputCentsPerMillion:
+ resolved.modelData.inputCentsPerMillion,
+ outputCentsPerMillion:
+ resolved.modelData.outputCentsPerMillion ?? 0,
+ }
+ : undefined;
const inputTokens = usage.inputTokens ?? 0;
const outputTokens = usage.outputTokens ?? 0;
if (inputTokens > 0 || outputTokens > 0) {
await ctx
.runMutation(
@@
organizationId: args.organizationId,
userId: args.userId ?? 'system',
inputTokens,
outputTokens,
costEstimateCents: estimateCostCents(
- modelId,
+ resolved.modelData.modelId,
inputTokens,
outputTokens,
+ providerCost,
),
timestamp: Date.now(),
},
)
📝 Committable suggestion
| // Track usage for client tool mode (this path bypasses onAgentComplete) | |
| const usage = await result.usage; | |
| if (usage && args.organizationId) { | |
| const { estimateCostCents } = | |
| await import('../governance/cost_estimation'); | |
| const inputTokens = usage.inputTokens ?? 0; | |
| const outputTokens = usage.outputTokens ?? 0; | |
| if (inputTokens > 0 || outputTokens > 0) { | |
| await ctx | |
| .runMutation( | |
| internal.governance.internal_mutations.incrementUsageLedger, | |
| { | |
| organizationId: args.organizationId, | |
| userId: args.userId ?? 'system', | |
| inputTokens, | |
| outputTokens, | |
| costEstimateCents: estimateCostCents( | |
| modelId, | |
| inputTokens, | |
| outputTokens, | |
| ), | |
| timestamp: Date.now(), | |
| // Track usage for client tool mode (this path bypasses onAgentComplete) | |
| const usage = await result.usage; | |
| if (usage && args.organizationId) { | |
| const { estimateCostCents } = | |
| await import('../governance/cost_estimation'); | |
| const providerCost = | |
| resolved.modelData.inputCentsPerMillion != null | |
| ? { | |
| inputCentsPerMillion: | |
| resolved.modelData.inputCentsPerMillion, | |
| outputCentsPerMillion: | |
| resolved.modelData.outputCentsPerMillion ?? 0, | |
| } | |
| : undefined; | |
| const inputTokens = usage.inputTokens ?? 0; | |
| const outputTokens = usage.outputTokens ?? 0; | |
| if (inputTokens > 0 || outputTokens > 0) { | |
| await ctx | |
| .runMutation( | |
| internal.governance.internal_mutations.incrementUsageLedger, | |
| { | |
| organizationId: args.organizationId, | |
| userId: args.userId ?? 'system', | |
| inputTokens, | |
| outputTokens, | |
| costEstimateCents: estimateCostCents( | |
| resolved.modelData.modelId, | |
| inputTokens, | |
| outputTokens, | |
| providerCost, | |
| ), | |
| timestamp: Date.now(), |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/convex/openai_compat/internal_actions.ts` around lines 384
- 405, The ledger entry uses the original modelId to estimate cost, but
resolveLanguageModelWithFallback() may return a different resolved.modelData
with provider-specific pricing; change the usage-cost calculation to use the
resolved model info (resolved.modelData or resolved.modelId/pricing) when
calling estimateCostCents so the costEstimateCents reflects the actual
provider/model chosen, and ensure the mutation call to
internal.governance.internal_mutations.incrementUsageLedger uses those resolved
values (reference resolveLanguageModelWithFallback(), resolved.modelData,
estimateCostCents, incrementUsageLedger, and result.usage to locate the code).
| cost: z | ||
| .object({ | ||
| inputCentsPerMillion: z.number(), | ||
| outputCentsPerMillion: z.number(), | ||
| }) | ||
| .optional(), |
🧩 Analysis chain
🏁 Script executed:
# Find and read the schema file
find . -type f -name "providers.ts" | grep -E "(schemas|shared)" | head -20Repository: tale-project/tale
Length of output: 112
🏁 Script executed:
# Also search more broadly
git ls-files | grep -E "providers\.ts" | head -20Repository: tale-project/tale
Length of output: 230
🏁 Script executed:
# Check for the shared schemas directory structure
git ls-files | grep -E "lib/shared/schemas" | head -20Repository: tale-project/tale
Length of output: 941
🏁 Script executed:
# Read the providers.ts schema file
cat -n services/platform/lib/shared/schemas/providers.ts | head -50Repository: tale-project/tale
Length of output: 2118
🏁 Script executed:
# Check if there's package.json to verify Zod version
find services/platform -name "package.json" -type f | xargs grep -l "zod" | head -5Repository: tale-project/tale
Length of output: 91
🏁 Script executed:
# Check Zod version in package.json
cat services/platform/package.json | jq '.dependencies.zod, .devDependencies.zod'Repository: tale-project/tale
Length of output: 73
🏁 Script executed:
# Search for downstream usage of the cost fields from this schema
rg "inputCentsPerMillion|outputCentsPerMillion" services/platform/ -A 3 -B 3Repository: tale-project/tale
Length of output: 15290
🏁 Script executed:
# Also check for any existing validation on pricing fields
rg "cost|pricing" services/platform/lib/shared/schemas/providers.ts -A 5 -B 5Repository: tale-project/tale
Length of output: 432
🏁 Script executed:
# Verify Zod v4 supports .nonnegative() method and check its documentation
python3 << 'EOF'
# Zod v4 is a recent version with TypeScript support
# Let's verify the API is correct
import json
zod_version = "4.3.6"
# .nonnegative() is a standard Zod method that validates numbers >= 0
# It's equivalent to .min(0)
print(f"Zod version: {zod_version}")
print("Zod v4 supports .nonnegative() for numeric validation")
print("This is a standard method in Zod >= 3.0")
EOFRepository: tale-project/tale
Length of output: 173
🏁 Script executed:
# Check cost_estimation.ts to see the actual calculation and impact
cat -n services/platform/convex/governance/cost_estimation.ts | sed -n '1,50p'Repository: tale-project/tale
Length of output: 1759
🏁 Script executed:
# Look at the actual calculation that uses these pricing values
cat -n services/platform/convex/governance/cost_estimation.ts | sed -n '85,110p'Repository: tale-project/tale
Length of output: 43
🏁 Script executed:
# Get the calculation function that uses the pricing values
cat -n services/platform/convex/governance/cost_estimation.ts | grep -A 20 "calculateTokenCost"Repository: tale-project/tale
Length of output: 43
🏁 Script executed:
# Also check how negative values would flow through the system
cat -n services/platform/convex/governance/cost_estimation.ts | tail -50Repository: tale-project/tale
Length of output: 1814
🏁 Script executed:
# Check if there are any tests that validate the schema behavior
find services/platform -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "providers\|modelDefinitionSchema" | head -5Repository: tale-project/tale
Length of output: 43
🏁 Script executed:
# Look at the full cost_estimation.ts to understand all usages
wc -l services/platform/convex/governance/cost_estimation.tsRepository: tale-project/tale
Length of output: 118
🏁 Script executed:
# Verify the UI layer also has no validation preventing negative input
cat -n services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx | grep -B 5 -A 5 "inputCostPerMillion\|inputCentsPerMillion" | head -40Repository: tale-project/tale
Length of output: 156
🏁 Script executed:
# Find the correct UI file path
find services/platform -type f -name "*providerName*" | head -10Repository: tale-project/tale
Length of output: 140
🏁 Script executed:
# Search for the form validation on pricing fields in the UI
rg "inputCostPerMillion|inputCentsPerMillion" services/platform/app/ -B 10 -A 5 | head -80Repository: tale-project/tale
Length of output: 8752
🏁 Script executed:
# Check if there's any input validation on the form fields
cat -n services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx | sed -n '1,100p' | grep -A 20 "input\|Input\|type="Repository: tale-project/tale
Length of output: 156
🏁 Script executed:
# Look at the form input for cost fields specifically
rg "inputCostPerMillion|outputCostPerMillion" services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx -B 3 -A 10 | tail -60Repository: tale-project/tale
Length of output: 164
🏁 Script executed:
# Read the UI file directly using proper path
cat -n 'services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx' | sed -n '1,50p'Repository: tale-project/tale
Length of output: 2371
🏁 Script executed:
# Search for input elements related to cost in the file
rg "inputCostPerMillion" --max-count=20 'services/platform/app/routes/dashboard/$id/settings/providers/$providerName.tsx' -B 5 -A 10Repository: tale-project/tale
Length of output: 2668
🏁 Script executed:
# Verify if there are any other handlers or API endpoints that accept this schema
rg "modelDefinitionSchema|providerJsonSchema" services/platform/ -B 2 -A 2 | head -50Repository: tale-project/tale
Length of output: 3649
🏁 Script executed:
# Final confirmation: check if any tests exist that expect the current behavior
find services/platform -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "inputCentsPerMillion\|cost.*negative" 2>/dev/null | head -5Repository: tale-project/tale
Length of output: 43
Reject negative model pricing at the schema boundary.
The inputCentsPerMillion and outputCentsPerMillion fields accept negative numbers, which propagate through cost estimation calculations and would effectively credit usage against budgets instead of charging. These fields should be constrained to non-negative values.
Proposed fix
cost: z
.object({
- inputCentsPerMillion: z.number(),
- outputCentsPerMillion: z.number(),
+ inputCentsPerMillion: z.number().nonnegative(),
+ outputCentsPerMillion: z.number().nonnegative(),
})
.optional(),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/platform/lib/shared/schemas/providers.ts` around lines 14 - 19, The
cost schema currently allows negative values for inputCentsPerMillion and
outputCentsPerMillion; update the cost object schema (the cost definition) so
both inputCentsPerMillion and outputCentsPerMillion are constrained to
non-negative numbers (e.g., use z.number().nonnegative() or z.number().min(0))
to reject negative pricing at the schema boundary before it reaches cost
estimation.
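To make the risk concrete, here is a small arithmetic sketch written in Python. It assumes the estimate is simply tokens multiplied by the cents-per-million rate and divided by one million; the real `estimateCostCents` may round or fall back differently. The function names and the rate of -250 are illustrative only.

```python
def estimate_cost_cents(
    input_tokens: int,
    output_tokens: int,
    input_cents_per_million: float,
    output_cents_per_million: float,
) -> float:
    """Assumed formula: tokens times cents-per-million, divided by one million."""
    return (
        input_tokens * input_cents_per_million
        + output_tokens * output_cents_per_million
    ) / 1_000_000


def validate_rate(rate: float) -> float:
    """Hypothetical stand-in for a non-negative schema constraint."""
    if rate < 0:
        raise ValueError(f"rate must be non-negative, got {rate}")
    return rate


# A negative input rate turns 2M tokens of usage into a 500-cent credit.
credited = estimate_cost_cents(2_000_000, 0, -250, 0)
```

With the constraint at the schema boundary, a value like -250 is rejected when the provider config is parsed, so it never reaches the ledger.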
| class UsageInfo(BaseModel): | ||
| """Token usage metadata for AI API calls.""" | ||
|
|
||
| input_tokens: int = Field(default=0, description="Input/prompt tokens consumed") | ||
| output_tokens: int = Field(default=0, description="Output/completion tokens consumed") | ||
| total_tokens: int = Field(default=0, description="Total tokens consumed") | ||
| model: str | None = Field(default=None, description="Model used for this operation") | ||
|
|
Split generation usage by operation/model instead of flattening it.
UsageInfo can only describe one model, but GenerateResponse now carries combined embedding + LLM tokens. services/rag/app/services/rag_service.py sums embedding tokens into input_tokens and then stamps the whole payload with the chat model, so any downstream pricing keyed by model will bill the embedding portion at the wrong rate. Return separate usage blocks per operation/model here instead of a single flattened record.
Also applies to: 261-261
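One possible shape for per-operation usage is sketched below. It uses plain dataclasses rather than the service's Pydantic models, and `OperationUsage`, `GenerateUsage`, `cost_cents`, the model names, and the rates are all hypothetical placeholders, not names from the codebase.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperationUsage:
    """Usage for one operation against one model (hypothetical shape)."""

    operation: str  # e.g. "embedding" or "generation"
    model: str
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class GenerateUsage:
    """Carries one usage block per operation instead of a flattened record."""

    operations: list[OperationUsage] = field(default_factory=list)

    def total_tokens(self) -> int:
        return sum(op.input_tokens + op.output_tokens for op in self.operations)


def cost_cents(usage: GenerateUsage, rates: dict[str, tuple[int, int]]) -> float:
    """Price each block at its own model's (input, output) cents-per-million rate."""
    total = 0.0
    for op in usage.operations:
        in_rate, out_rate = rates[op.model]
        total += (op.input_tokens * in_rate + op.output_tokens * out_rate) / 1_000_000
    return total


# Illustrative numbers only; model names and rates are placeholders.
usage = GenerateUsage(
    [
        OperationUsage("embedding", "embed-model", input_tokens=1_000_000),
        OperationUsage(
            "generation", "chat-model", input_tokens=1_000_000, output_tokens=500_000
        ),
    ]
)
rates = {"embed-model": (2, 0), "chat-model": (250, 1000)}
split_cost = cost_cents(usage, rates)
```

Under these placeholder rates the split record prices at 752 cents, whereas flattening the embedding tokens into the chat model's input would price the same usage at 1000 cents.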
| """Search the knowledge base using hybrid BM25 + vector search. | ||
|
|
||
| Embedding token usage available via `self.last_search_usage` after call. | ||
| """ |
Don’t cache request usage on the shared rag_service singleton.
last_search_usage is request-scoped, but rag_service is module-level shared state. Another request can overwrite this value between the await here and the caller reading or combining it, which can return the wrong usage in /search and misbill /generate. Thread the usage back in the return value instead of storing it on self.
Also applies to: 301-302
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/rag/app/services/rag_service.py` around lines 281 - 284, The service
currently stores per-request embedding usage on the singleton via
self.last_search_usage (and similar at lines ~301-302); change the search (and
affected generate) methods to stop writing to self.last_search_usage and instead
return the usage as part of the method return value (e.g., include a usage field
in the search result tuple/dict) so callers receive request-scoped usage
directly; update callers of search (and the generate-related method) to read the
returned usage rather than reading self.last_search_usage.
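A minimal sketch of threading usage through the return value follows. `SearchResult`, the `input_tokens` field, and the fake token counting are assumptions made for illustration; only the idea of returning usage alongside the hits comes from the comment above.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingUsage:
    model: str
    input_tokens: int = 0  # assumed field name


@dataclass(frozen=True)
class SearchResult:
    """Hypothetical return type bundling hits with request-scoped usage."""

    hits: list[str]
    usage: EmbeddingUsage


class SearchService:
    """Stand-in service that keeps no per-request state on self."""

    async def search(self, query: str) -> SearchResult:
        # Fake embedding call: the token count is just the word count.
        tokens = len(query.split())
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        return SearchResult(
            hits=[f"doc for {query}"],
            usage=EmbeddingUsage(model="embed-model", input_tokens=tokens),
        )


async def main() -> list[SearchResult]:
    service = SearchService()  # one shared instance, as with a module-level singleton
    return await asyncio.gather(
        service.search("one"),
        service.search("two words here"),
    )


results = asyncio.run(main())
```

Because each coroutine returns its own `SearchResult`, interleaved requests on the shared instance cannot overwrite each other's usage.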
| self.last_search_usage = EmbeddingUsage(model=self._embedding._model) | ||
| try: | ||
| embedding_task = asyncio.create_task(self._embedding.embed_query(query)) | ||
| embedding_task = asyncio.create_task(self._embedding.embed_query_with_usage(query)) | ||
| fts_task = asyncio.create_task(self._fts_search(query, file_ids, top_k * 3)) | ||
|
|
||
| query_embedding, fts_results = await asyncio.gather(embedding_task, fts_task) | ||
| query_result, fts_results = await asyncio.gather(embedding_task, fts_task) | ||
| query_embedding = query_result.embedding | ||
| self.last_search_usage = query_result.usage |
Use the usage-aware embedding path in the BM25 fallback.
If asyncio.gather(...) fails because _fts_search() raises, the handler later recomputes the embedding via embed_query() at Line 153. last_search_usage then stays at the zero-token placeholder initialized here, so vector-only fallback searches undercount spend even though an embedding request was made.
Suggested fix
- self.last_search_usage = EmbeddingUsage(model=self._embedding._model)
+ self.last_search_usage = None
try:
embedding_task = asyncio.create_task(self._embedding.embed_query_with_usage(query))
fts_task = asyncio.create_task(self._fts_search(query, file_ids, top_k * 3))
@@
- if query_embedding is None:
- query_embedding = await self._embedding.embed_query(query)
+ if query_embedding is None:
+ query_result = await self._embedding.embed_query_with_usage(query)
+ query_embedding = query_result.embedding
+ self.last_search_usage = query_result.usage
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@services/rag/app/services/search_service.py` around lines 61 - 68, The code
currently races _fts_search and embed_query_with_usage with asyncio.gather so if
_fts_search raises the embedding task's usage result can be lost and
last_search_usage remains the zero-token placeholder; change the flow to always
await or retrieve the embedding result (from _embedding.embed_query_with_usage)
independently of _fts_search failures and assign self.last_search_usage =
query_result.usage before handling BM25/vector-only fallback (the code path that
calls embed_query()); specifically ensure the embedding_task started in this
block is awaited (or its result retrieved) even if the fts_task raises, so
last_search_usage reflects the actual usage for later fallback logic that relies
on self.last_search_usage.
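One way to keep the embedding result when the full-text task fails is `asyncio.gather(..., return_exceptions=True)`. Note that this is a different technique from the suggested diff, which re-runs the usage-aware embedding call in the fallback. The functions below are simplified stand-ins, and `QueryResult.input_tokens` is an assumed field.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryResult:
    embedding: list[float]
    input_tokens: int  # assumed field name


async def embed_query_with_usage(query: str) -> QueryResult:
    # Stand-in for the real embedding call; counts words as tokens.
    await asyncio.sleep(0)
    return QueryResult(embedding=[0.1, 0.2], input_tokens=len(query.split()))


async def fts_search(query: str) -> list[str]:
    # Simulate the full-text search failing.
    raise RuntimeError("FTS index unavailable")


async def search(query: str) -> tuple[list[str], int]:
    # return_exceptions=True hands back exceptions as values, so the
    # embedding result is still available when fts_search raises.
    embed_res, fts_res = await asyncio.gather(
        embed_query_with_usage(query),
        fts_search(query),
        return_exceptions=True,
    )
    if isinstance(embed_res, BaseException):
        raise embed_res  # no embedding means no vector search either
    fts_hits = [] if isinstance(fts_res, BaseException) else fts_res
    return fts_hits, embed_res.input_tokens


fts_hits, tokens = asyncio.run(search("budget limits per user"))
```

The embedding call is made once, and its token count is reported even on the vector-only path.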
The embedding service now reads response.usage for token tracking. Add the missing usage field to the test mock SimpleNamespace.
Summary
costEstimateCents in message metadata, with per-model pricing sourced from provider config files (fallback to hardcoded map)
Test plan
Summary by CodeRabbit
New Features
Enhancements