
Add LLM Token and Cost Estimation Pre-check Before Indexing #1917

Open: wants to merge 3 commits into main

Conversation

@khaledalam commented on May 6, 2025

Description

This PR introduces a new feature that estimates the token usage and LLM cost before executing the indexing pipeline. This helps users anticipate costs and make informed decisions before proceeding.

Related Issues

#385 , #487

Proposed Changes

New CLI params:

  • --estimate-cost : boolean flag (default: False)
  • --average-output-tokens-per-chunk : integer (default: 500)
graphrag index \
   --root ./ragtest \
   --estimate-cost \
   --average-output-tokens-per-chunk 500
  • Util file: estimate_cost.py added for token counting and pricing logic.
  • Pricing dynamically fetched from a public JSON source.
  • CLI flag --estimate-cost added for optional pre-check.
  • Console summary with embedding/chat model breakdown and token counts.
  • User confirmation prompt before continuing indexing.
  • Conservative cost estimation based on the --average-output-tokens-per-chunk value.
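The estimation flow described above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual estimate_cost.py: the PRICING table, model names, and per-1K-token prices are placeholder assumptions (the PR fetches real pricing from a public JSON source), and the per-chunk token counts are assumed to come from an earlier tokenization step.

```python
# Illustrative per-1K-token prices in USD. Placeholder values only; the
# real implementation fetches current pricing from a public JSON source.
PRICING = {
    "gpt-4-turbo": {"input": 0.01, "output": 0.03},
    "text-embedding-3-small": {"input": 0.00002, "output": 0.0},
}

def estimate_cost(
    chunk_token_counts: list[int],
    chat_model: str = "gpt-4-turbo",
    embedding_model: str = "text-embedding-3-small",
    average_output_tokens_per_chunk: int = 500,
) -> dict:
    """Return a conservative upper-bound cost estimate before indexing.

    Output tokens cannot be known ahead of time, so each chunk is assumed
    to produce `average_output_tokens_per_chunk` tokens (hence "conservative").
    """
    input_tokens = sum(chunk_token_counts)
    output_tokens = average_output_tokens_per_chunk * len(chunk_token_counts)
    chat = PRICING[chat_model]
    embed = PRICING[embedding_model]
    # Chat model pays for both prompt (input) and completion (output) tokens.
    chat_cost = (input_tokens * chat["input"] + output_tokens * chat["output"]) / 1000
    # Embedding model only consumes input tokens.
    embedding_cost = input_tokens * embed["input"] / 1000
    return {
        "input_tokens": input_tokens,
        "estimated_output_tokens": output_tokens,
        "chat_cost_usd": round(chat_cost, 4),
        "embedding_cost_usd": round(embedding_cost, 6),
        "total_cost_usd": round(chat_cost + embedding_cost, 4),
    }
```

For example, two 1,000-token chunks with the default 500 output tokens per chunk would be billed as 2,000 input tokens plus an assumed 1,000 output tokens across both models.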

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

…and user confirmation prompt

- Implemented accurate token counting per chunk using TokenTextSplitter
- Integrated OpenAI pricing parsing with fallback model logic
- Added CLI option to estimate cost before indexing (`--estimate-cost`)
- Prompt user to confirm whether to proceed with indexing after estimation
- Improved logging and token summary formatting
- Included conservative upper-bound estimation notice
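The confirm-before-indexing step mentioned in the commit message could look something like the sketch below. The function name and prompt wording are hypothetical, not the PR's code; `input_fn` is injected only to make the prompt testable.

```python
def confirm_proceed(total_cost_usd: float, input_fn=input) -> bool:
    """Show the cost estimate and ask the user whether to continue indexing.

    Defaults to "no" so that an accidental Enter keypress does not start
    a potentially expensive indexing run.
    """
    answer = input_fn(
        f"Estimated upper-bound cost: ${total_cost_usd:.4f}. Proceed? [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")
```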

 On branch feat/llm-cost-estimation
 Changes to be committed:
	modified:   graphrag/cli/index.py
	modified:   graphrag/cli/main.py
	new file:   graphrag/index/utils/estimate_cost.py
@khaledalam khaledalam requested review from a team as code owners May 6, 2025 18:28
@khaledalam (Author) commented:

@microsoft-github-policy-service agree

@khaledalam (Author) commented:

Please review @natoverse
