Added raw_chunks parameter to search methods #1886

Open. Wants to merge 7 commits into base branch main.

akarim23131

Description

Adds a new --raw-chunks flag that exposes the raw context data retrieved from the vector store before the LLM processes it. This improves debugging and transparency: users can see exactly which information is used to generate responses for every search method (local, global, and drift). The output is opt-in. Adding --raw-chunks to a query command prints the chunks of information passed to the LLM.

Query Command

graphrag query --method local --query "" --root graph_index --raw-chunks
graphrag query --method global --query "" --root graph_index --raw-chunks
graphrag query --method drift --query "" --root graph_index --raw-chunks
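
As a rough illustration of the opt-in behaviour, the sketch below defines a boolean flag with Python's standard argparse module. It is not the code in main.py: the real CLI may be built on a different framework, and the parser name, the example question, and the defaults here are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mirrors the query options shown above."""
    parser = argparse.ArgumentParser(prog="query-sketch")  # hypothetical program name
    parser.add_argument("--method", choices=["local", "global", "drift"], required=True)
    parser.add_argument("--query", required=True)
    parser.add_argument("--root", default=".")
    # store_true makes the flag opt-in: it stays False unless the user passes it.
    parser.add_argument("--raw-chunks", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--method", "local", "--query", "example question", "--root", "graph_index", "--raw-chunks"]
)
default_args = build_parser().parse_args(["--method", "global", "--query", "example question"])
print(args.raw_chunks, default_args.raw_chunks)
```

Because the flag defaults to False, existing commands that omit it keep their current output.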

Related Issues

  • Improves debugging capabilities for RAG applications
  • Enhances transparency in search results
  • Helps users understand and verify the context selection process
  • Provides user control over raw context visibility through a CLI flag

Proposed Changes

  1. Added user-controlled raw context display:

    • New --raw-chunks CLI flag for optional context viewing
    • Users can toggle between normal and detailed raw context output
    • Works with all search methods (local, global, drift)
  2. Added RawChunksCallback class in query.py to handle the display of raw chunks with structured formatting for:

    • Reports with titles and text
    • Text units with source information
    • Relationships and community data
    • Special handling for DRIFT search's three-step process
  3. Modified search implementation files:

    • factory.py: Added raw_chunks parameter to search engine factory methods
    • main.py: Implemented --raw-chunks CLI flag
    • query.py: Added raw chunks handling for all search types
    • search.py: Updated the local, global, and drift search implementations to support raw chunks display
  4. Enhanced DRIFT search to show context at each step:

    • Primer search results
    • Follow-up question contexts
    • Final synthesized context
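
The callback idea described above can be sketched roughly as follows. This is a minimal, hypothetical stand-in and not the RawChunksCallback class from this PR: the class name RawChunksPrinter, the on_context method, the record keys ("title", "id", "text"), and the step labels are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional


class RawChunksPrinter:
    """Hypothetical stand-in for a raw-chunks callback; not the PR's actual class."""

    def __init__(self, enabled: bool = False) -> None:
        self.enabled = enabled
        self.lines: List[str] = []  # everything printed so far, kept for inspection

    def _emit(self, line: str) -> None:
        self.lines.append(line)
        print(line)

    def on_context(
        self,
        context: Dict[str, List[Dict[str, Any]]],
        step: Optional[str] = None,
    ) -> None:
        """Print one section per context type; do nothing when disabled."""
        if not self.enabled:
            return
        if step is not None:
            # A multi-step search such as DRIFT can label each stage.
            self._emit(f"=== {step} ===")
        for section, records in context.items():
            self._emit(f"--- {section} ({len(records)}) ---")
            for record in records:
                title = record.get("title") or record.get("id") or "untitled"
                self._emit(f"[{title}] {record.get('text', '')}")


printer = RawChunksPrinter(enabled=True)
printer.on_context(
    {"reports": [{"title": "Community 1", "text": "Summary text"}]},
    step="Primer search",
)

silent = RawChunksPrinter(enabled=False)
silent.on_context({"reports": [{"title": "Community 1", "text": "Summary text"}]})
```

A search engine built by a factory with raw_chunks enabled could call such a hook once per step, for example with step labels for the primer search, the follow-up questions, and the final context.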

Checklist

  • I have tested these changes locally.
  • I have reviewed the code changes.
  • I have updated the documentation (if necessary).
  • I have added appropriate unit tests (if applicable).

Additional Notes

  • The feature is opt-in via the --raw-chunks flag, maintaining backward compatibility
  • Users control when to view raw context through a simple CLI flag
  • Raw chunks are displayed in a structured format for better readability
  • Implementation handles different data types (dictionaries, lists, strings) robustly
  • Special consideration given to DRIFT search's multi-step process
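
One way to handle dictionaries, lists, and strings uniformly is a small recursive formatter. The sketch below is an assumption about how this could look, not the implementation in this PR; format_chunk and its indentation scheme are hypothetical.

```python
from typing import Any, List


def format_chunk(chunk: Any, indent: int = 0) -> List[str]:
    """Flatten a dict, list, or scalar into indented display lines."""
    pad = "  " * indent
    if isinstance(chunk, dict):
        lines: List[str] = []
        for key, value in chunk.items():
            if isinstance(value, (dict, list)):
                # Nested containers get their own indented block under the key.
                lines.append(f"{pad}{key}:")
                lines.extend(format_chunk(value, indent + 1))
            else:
                lines.append(f"{pad}{key}: {value}")
        return lines
    if isinstance(chunk, list):
        lines = []
        for item in chunk:
            lines.extend(format_chunk(item, indent))
        return lines
    # Strings and any other scalar fall through to a single line.
    return [f"{pad}{chunk}"]


for line in format_chunk({"title": "A", "sources": ["x", "y"]}):
    print(line)
```

Falling back to a single line for anything that is neither a dict nor a list means unexpected types are still shown rather than raising an error.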

akarim23131 requested review from a team as code owners April 21, 2025 12:27
@akarim23131 (Author)

@microsoft-github-policy-service agree
