Summary #71
Merged
…on_id, paragraph_id, etc. Also removed content_text from the columns to embed because of a token issue (sticking to summary for now).
…ed entries for provided base log event ids only. Used this to add an embed_along mode to the Intranet RAG agent, which embeds at the end of each batch of documents being ingested (rather than running once at the end of the global ingestion flow) so that errors from the embedding call can be scoped to a single batch. Also renamed the Intranet table from content to Content.
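A minimal sketch of the per-batch embedding idea described in this commit message. The `ingest` function, its parameters, and the `flaky_embed` stand-in are hypothetical names for illustration and are not taken from the PR; the real agent's interface may differ.

```python
from typing import Callable, Iterable, List, Sequence


def ingest(
    batches: Iterable[Sequence[str]],
    embed: Callable[[List[str]], None],
    embed_along: bool = True,
) -> List[int]:
    """Ingest batches and return the indices of batches whose embedding failed."""
    failed: List[int] = []
    pending: List[str] = []
    for index, batch in enumerate(batches):
        docs = list(batch)  # stand-in for parsing and storing the batch
        if not embed_along:
            pending.extend(docs)  # defer embedding to the end of the flow
            continue
        try:
            embed(docs)  # embed right after this batch is ingested
        except Exception:
            failed.append(index)  # the failure is scoped to this batch only
    if not embed_along and pending:
        embed(pending)  # single global call; one error affects everything
    return failed


# Illustrative embedding call that rejects one specific document.
embedded: List[str] = []


def flaky_embed(docs: List[str]) -> None:
    if "bad" in docs:
        raise RuntimeError("embedding call failed")
    embedded.extend(docs)


failed_batches = ingest([["a", "b"], ["bad"], ["c"]], flaky_embed)
```

With `embed_along=True`, only the second batch is reported as failed and the other two are still embedded, which is the scoping benefit the commit message refers to.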
- Token-aware summarization with iterative compression to meet embedding limits; uses a fast 0.25 tokens/byte estimator and re-requests with concise directives until within budget (post-generation media context added only if it fits)
- Rich media extraction from PDFs (tables/images) with Docling; auto image descriptions via SmolVLM-Instruct and inclusion in searchable summaries for better recall
- Efficient sentence splitting via a lightweight spaCy sentencizer (plus enum-prefix fix); clean fallbacks to LangChain/regex and clear warnings when optional deps are absent
- Hierarchy-first chunking leveraging Docling headings/refs; stable parent-child mapping so images/tables reliably attach to the correct sections
- Unified short 5-char IDs and strictly hierarchical content_ids/titles (doc>sec>para>sent/img/tbl); removed incremental IDs and any slicing in IDs
- Batch parsing support in the parser; FileManager leverages batch mode for multi-file runs (sync and streaming async) with ordered results
- Pre-parse .doc/.docx→.pdf conversion using OS-appropriate backends (LibreOffice/win32com/docx2pdf), with timeouts, serialized execution where needed, and optional cleanup of temporary PDFs
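The token-aware summarization item above can be sketched roughly as follows. Only the 0.25 tokens/byte ratio comes from the commit message; the function names, the directive wording, the `max_rounds` cap, and the final byte-level truncation are assumptions made for illustration.

```python
import math
from typing import Callable

TOKENS_PER_BYTE = 0.25  # ratio stated in the commit message


def estimate_tokens(text: str) -> int:
    """Cheap token estimate from the UTF-8 byte length (no tokenizer needed)."""
    return math.ceil(len(text.encode("utf-8")) * TOKENS_PER_BYTE)


def summarize_within_budget(
    text: str,
    summarize: Callable[[str, str], str],  # hypothetical (text, directive) -> summary
    max_tokens: int,
    max_rounds: int = 4,
) -> str:
    """Re-request a shorter summary until the estimate fits the budget."""
    summary = summarize(text, "Summarize the text.")
    for _ in range(max_rounds):
        if estimate_tokens(summary) <= max_tokens:
            return summary
        directive = f"Rewrite more concisely, in under {max_tokens} tokens."
        summary = summarize(summary, directive)
    if estimate_tokens(summary) > max_tokens:
        # Assumed last resort: hard truncation on a byte budget.
        budget_bytes = int(max_tokens / TOKENS_PER_BYTE)
        summary = summary.encode("utf-8")[:budget_bytes].decode("utf-8", errors="ignore")
    return summary


def append_media_context(summary: str, media: str, max_tokens: int) -> str:
    """Add image/table context after generation only if the result still fits."""
    candidate = f"{summary}\n{media}"
    return candidate if estimate_tokens(candidate) <= max_tokens else summary


# Stand-in for an LLM call: halves the text on every request.
def halving_summarizer(text: str, directive: str) -> str:
    return text[: max(1, len(text) // 2)]


short_summary = summarize_within_budget("x" * 400, halving_summarizer, max_tokens=10)
```

A byte-based estimate avoids loading a tokenizer on the ingestion path; it is approximate, so a real budget would likely leave some headroom below the embedding model's hard limit.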
…ly for ask method
…valuation script to be consistent with the chosen test set format, which has multi-tiered difficulty levels. Fixed race conditions that occurred when running the Intranet API on multiple parallel workers. Updated the RAGHTTPClient (used for evals) to reuse the HTTP session for connection pooling when running parallel evals.
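A minimal sketch of session reuse for connection pooling, assuming the `requests` library. `PooledRAGClient`, the `/ask` path, the request payload, and the pool size are hypothetical and are not the PR's actual RAGHTTPClient.

```python
import requests
from requests.adapters import HTTPAdapter


class PooledRAGClient:
    """Hypothetical eval client that keeps one HTTP session for its lifetime."""

    def __init__(self, base_url: str, pool_size: int = 16, timeout: float = 60.0):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        # One Session means TCP connections are kept alive and reused
        # instead of being re-opened for every evaluation query.
        self.session = requests.Session()
        adapter = HTTPAdapter(pool_connections=pool_size, pool_maxsize=pool_size)
        self.session.mount("http://", adapter)
        self.session.mount("https://", adapter)

    def ask(self, query: str) -> dict:
        # The endpoint path and JSON shape are assumptions for illustration.
        response = self.session.post(
            f"{self.base_url}/ask", json={"query": query}, timeout=self.timeout
        )
        response.raise_for_status()
        return response.json()

    def close(self) -> None:
        self.session.close()


client = PooledRAGClient("http://localhost:8000/")
```

Sizing `pool_maxsize` to at least the number of concurrent eval workers keeps parallel requests from discarding connections that do not fit in the pool.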
…ch more description for the RAG agent
…only a single table in the loaded schema. Disabled Contacts inclusion in the KM for the Intranet deployment. Adjusted the prompts to include join-related instructions only when they are needed.
- Introduced a parallel-safe usage logging context for intranet usage, created idempotently and recording query, answer, sources, confidence, response_time, success, and error for all RAG calls.
- Standardized a success/error contract across RAG responses so successful calls set success=true and error=null, and failures set answer=null with a descriptive error; all downstream processing now checks success before formatting or post-processing. Extended the RAG API service and the evaluation pipeline to support this.
- Added a direct LLM retrieval path alongside the tool-loop in the RAG agent with in-context document dumping.
- Improved the evaluation pipeline with concurrent streaming queries, rolling saves, validation only for successful non-empty answers, minimal records for failures, and wall-clock time tracking for accurate performance metrics.
- Clarified semantic-validator guidance to not penalize accurate, non-contradictory extra information and added supporting examples.
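The success/error contract described above could look roughly like this. The field names follow the commit message (answer, sources, response_time, success, error); the `RAGResponse` class, `call_rag`, `format_answer`, and the two demo agents are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class RAGResponse:
    success: bool
    answer: Optional[str] = None
    error: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    response_time: float = 0.0


def call_rag(query: str, agent: Callable[[str], Tuple[str, List[str]]]) -> RAGResponse:
    """Wrap an agent call so callers always receive the same response shape."""
    start = time.perf_counter()
    try:
        answer, sources = agent(query)
    except Exception as exc:
        # Failure: answer stays None and error carries a description.
        return RAGResponse(
            success=False,
            error=f"{type(exc).__name__}: {exc}",
            response_time=time.perf_counter() - start,
        )
    # Success: error stays None.
    return RAGResponse(
        success=True,
        answer=answer,
        sources=list(sources),
        response_time=time.perf_counter() - start,
    )


def format_answer(response: RAGResponse) -> str:
    """Downstream code checks success before touching the answer."""
    if not response.success:
        return f"Request failed: {response.error}"
    return f"{response.answer} (sources: {', '.join(response.sources)})"


def good_agent(query: str) -> Tuple[str, List[str]]:
    return "42", ["doc1"]


def bad_agent(query: str) -> Tuple[str, List[str]]:
    raise ValueError("boom")


ok = call_rag("question", good_agent)
bad = call_rag("question", bad_agent)
```

Keeping the failure branch explicit means an evaluation run can store a minimal record for `bad` and skip validation, while `ok` proceeds to formatting.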
No description provided.