
Add LLM Token and Cost Estimation Pre-check Before Indexing #1917

Open: wants to merge 3 commits into main

Conversation

@khaledalam commented on May 6, 2025

Description

This PR introduces a new feature that estimates the token usage and LLM cost before executing the indexing pipeline. This helps users anticipate costs and make informed decisions before proceeding.

Related Issues

#385 , #487

Proposed Changes

New CLI params:

  • --estimate-cost : boolean flag (default: False)
  • --average-output-tokens-per-chunk : integer (default: 500)
graphrag index \
   --root ./ragtest \
   --estimate-cost \
   --average-output-tokens-per-chunk 500
  • Util file: estimate_cost.py added for token counting and pricing logic.
  • Pricing dynamically fetched from a public JSON source.
  • CLI flag --estimate-cost added for optional pre-check.
  • Console summary with embedding/chat model breakdown and token counts.
  • User confirmation prompt before continuing indexing.
  • Conservative cost estimation based on the --average-output-tokens-per-chunk value.
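The estimation flow described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual estimate_cost.py: the PRICING table, model names, and per-1K-token prices are placeholder assumptions (the PR fetches real pricing from a public JSON source), and the per-chunk token counts are assumed to come from an earlier tokenization step.

```python
# Illustrative per-1K-token prices in USD. Placeholder values only; the
# real implementation fetches current pricing from a public JSON source.
PRICING = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "text-embedding-3-small": {"input": 0.00002, "output": 0.0},
}

def estimate_cost(
    chunk_token_counts: list[int],
    chat_model: str = "gpt-4-turbo",
    embedding_model: str = "text-embedding-3-small",
    average_output_tokens_per_chunk: int = 500,
) -> dict:
    """Return a conservative upper-bound cost estimate before indexing.

    Output tokens cannot be known ahead of time, so each chunk is assumed
    to produce `average_output_tokens_per_chunk` tokens (hence "conservative").
    """
    input_tokens = sum(chunk_token_counts)
    output_tokens = average_output_tokens_per_chunk * len(chunk_token_counts)
    chat = PRICING[chat_model]
    embed = PRICING[embedding_model]
    # Chat model pays for both prompt (input) and completion (output) tokens.
    chat_cost = (input_tokens * chat["input"] + output_tokens * chat["output"]) / 1000
    # Embedding model only consumes input tokens.
    embedding_cost = input_tokens * embed["input"] / 1000
    return {
        "input_tokens": input_tokens,
        "estimated_output_tokens": output_tokens,
        "chat_cost_usd": round(chat_cost, 4),
        "embedding_cost_usd": round(embedding_cost, 6),
        "total_cost_usd": round(chat_cost + embedding_cost, 4),
    }
```

For example, two 1,000-token chunks with the default 500 output tokens per chunk would be billed as 2,000 input tokens plus an assumed 1,000 output tokens across both models.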

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

…and user confirmation prompt

- Implemented accurate token counting per chunk using TokenTextSplitter
- Integrated OpenAI pricing parsing with fallback model logic
- Added CLI option to estimate cost before indexing (`--estimate-cost`)
- Prompt user to confirm whether to proceed with indexing after estimation
- Improved logging and token summary formatting
- Included conservative upper-bound estimation notice
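The confirm-before-indexing step mentioned in the commit message could look something like the sketch below. The function name and prompt wording are hypothetical, not the PR's code; `input_fn` is injected only to make the prompt testable.

```python
def confirm_proceed(total_cost_usd: float, input_fn=input) -> bool:
    """Show the cost estimate and ask the user whether to continue indexing.

    Defaults to "no" so that an accidental Enter keypress does not start
    a potentially expensive indexing run.
    """
    answer = input_fn(
        f"Estimated upper-bound cost: ${total_cost_usd:.4f}. Proceed? [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")
```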

 On branch feat/llm-cost-estimation
 Changes to be committed:
	modified:   graphrag/cli/index.py
	modified:   graphrag/cli/main.py
	new file:   graphrag/index/utils/estimate_cost.py
@khaledalam khaledalam requested review from a team as code owners May 6, 2025 18:28
@khaledalam (Author) commented:

@microsoft-github-policy-service agree

@khaledalam (Author) commented:

Please review @natoverse
