Summary by hmahmood24 · Pull Request #71 · unifyai/unity

hmahmood24 · 2025-09-24T11:06:03Z

No description provided.

…ed entries for provided base log event ids only. Used this to add an embed_along mode to the Intranet RAG agent, which embeds at the end of each batch of documents being ingested (rather than running once at the end of the global ingestion flow) so that errors from the embedding call are scoped to a single batch. Also renamed the Intranet table from content to Content
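A minimal sketch of the per-batch embedding idea described above, not the code from this PR. The names `ingest`, `ingest_batch`, and `embed_entries` are hypothetical stand-ins; the only assumption carried over from the description is that embedding is invoked with the log event ids produced by the batch just ingested.

```python
from typing import Callable, Iterable, List, Sequence


def ingest(
    batches: Iterable[Sequence[str]],
    ingest_batch: Callable[[Sequence[str]], List[int]],  # hypothetical: returns log event ids
    embed_entries: Callable[[List[int]], None],          # hypothetical: embeds only these ids
    embed_along: bool = True,
) -> List[int]:
    """Ingest document batches and return the indices of batches whose embedding failed."""
    failed_batches: List[int] = []
    pending_ids: List[int] = []

    for batch_index, docs in enumerate(batches):
        log_ids = ingest_batch(docs)
        if embed_along:
            # Embed right away so a failure is attributable to this batch only.
            try:
                embed_entries(log_ids)
            except Exception:
                failed_batches.append(batch_index)
        else:
            pending_ids.extend(log_ids)

    if not embed_along and pending_ids:
        # Single global call: any failure here is not attributable to one batch.
        embed_entries(pending_ids)

    return failed_batches
```

With `embed_along=True`, a failing embedding call marks one batch as failed and ingestion continues; with `embed_along=False`, one exception from the final call covers every ingested document.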

- Token-aware summarization with iterative compression to meet embedding limits; uses a fast 0.25 tokens/byte estimator and re-requests with concise directives until within budget (post-generation media context added only if it fits)
- Rich media extraction from PDFs (tables/images) with Docling; auto image descriptions via SmolVLM-Instruct and inclusion in searchable summaries for better recall
- Efficient sentence splitting via a lightweight spaCy sentencizer (plus enum-prefix fix); clean fallbacks to LangChain/regex and clear warnings when optional deps are absent
- Hierarchy-first chunking leveraging Docling headings/refs; stable parent-child mapping so images/tables reliably attach to the correct sections
- Unified short 5-char IDs and strictly hierarchical content_ids/titles (doc>sec>para>sent/img/tbl); removed incremental IDs and any slicing in IDs
- Batch parsing support in the parser; FileManager leverages batch mode for multi-file runs (sync and streaming async) with ordered results
- Pre-parse .doc/.docx→.pdf conversion using OS-appropriate backends (LibreOffice/win32com/docx2pdf), with timeouts, serialized execution where needed, and optional cleanup of temporary PDFs
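The token-aware summarization bullet can be illustrated roughly as follows. This is a sketch under stated assumptions, not the PR's implementation: `summarize` is a hypothetical callable wrapping the LLM call, the directive strings are invented for illustration, and the final byte-level truncation is an assumed last resort that the description does not mention. Only the 0.25 tokens/byte estimate and the "re-request until within budget" loop come from the text above.

```python
from typing import Callable

TOKENS_PER_BYTE = 0.25  # estimator ratio quoted in the PR description


def estimate_tokens(text: str) -> int:
    """Cheap token estimate from the UTF-8 byte length (no tokenizer call)."""
    return int(len(text.encode("utf-8")) * TOKENS_PER_BYTE)


def summarize_within_budget(
    text: str,
    summarize: Callable[[str, str], str],  # hypothetical: (text, directive) -> summary
    max_tokens: int,
    max_rounds: int = 4,
) -> str:
    """Request a summary, then re-request more concise versions until it fits."""
    summary = summarize(text, "Summarize the text.")
    for _ in range(max_rounds):
        if estimate_tokens(summary) <= max_tokens:
            return summary
        directive = f"Rewrite more concisely, in under {max_tokens} tokens."
        summary = summarize(summary, directive)

    # Assumed fallback (not from the PR): hard-truncate by bytes if still too long.
    if estimate_tokens(summary) > max_tokens:
        max_bytes = int(max_tokens / TOKENS_PER_BYTE)
        summary = summary.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")
    return summary


def append_media_context(summary: str, media_context: str, max_tokens: int) -> str:
    """Add image/table context after generation, but only if the result still fits."""
    combined = f"{summary}\n{media_context}"
    return combined if estimate_tokens(combined) <= max_tokens else summary
```

A byte-based estimate avoids loading a tokenizer on the hot path; it trades accuracy for speed, so a real budget would likely leave some headroom below the embedding model's hard limit.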

…valuation script to be consistent with the chosen test set format with multi-tiered difficulty levels. Fixed race conditions when running the Intranet API on multiple parallel workers. Updated the RAGHTTPClient (used for evals) to reuse the HTTP session for connection pooling when running parallel evals

- Introduced a parallel-safe usage logging context for intranet usage, created idempotently and recording query, answer, sources, confidence, response_time, success, and error for all RAG calls.
- Standardized a success/error contract across RAG responses so successful calls set success=true and error=null, and failures set answer=null with a descriptive error; all downstream processing now checks success before formatting or post‑processing. Extended the RAG API service and the evaluation pipeline to support this.
- Added a direct LLM retrieval path alongside the tool‑loop in the RAG agent with in-context document dumping.
- Improved the evaluation pipeline with concurrent streaming queries, rolling saves, validation only for successful non‑empty answers, minimal records for failures, and wall‑clock time tracking for accurate performance metrics.
- Clarified semantic‑validator guidance to not penalize accurate, non‑contradictory extra information and added supporting examples.
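The success/error contract from the second bullet might look something like the sketch below. The class `RAGResponse` and the helpers `run_query` and `format_answer` are hypothetical names chosen for illustration; only the field names and the rule (success=true with error=null, or answer=null with a descriptive error, checked before any formatting) are taken from the description.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RAGResponse:
    """Hypothetical response shape following the contract described above."""
    success: bool
    answer: Optional[str] = None
    error: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    response_time: float = 0.0


def run_query(query: str, ask: Callable[[str], str]) -> RAGResponse:
    """Call the agent and always return a response that satisfies the contract."""
    start = time.perf_counter()
    try:
        answer = ask(query)
    except Exception as exc:
        # Failure: answer stays None and the error is descriptive.
        return RAGResponse(
            success=False,
            answer=None,
            error=f"{type(exc).__name__}: {exc}",
            response_time=time.perf_counter() - start,
        )
    # Success: error stays None.
    return RAGResponse(
        success=True,
        answer=answer,
        error=None,
        response_time=time.perf_counter() - start,
    )


def format_answer(resp: RAGResponse) -> str:
    """Downstream code checks `success` before touching `answer`."""
    if not resp.success:
        return f"[error] {resp.error}"
    return (resp.answer or "").strip()
```

Because every caller branches on `success` first, a failed call never reaches post-processing with a `None` answer, and an evaluation loop can store a minimal record for failures without special-casing exceptions.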

hmahmood24 had a problem deploying to unity-testing September 24, 2025 11:14 — with GitHub Actions Failure

hmahmood24 had a problem deploying to unity-testing September 24, 2025 11:14 — with GitHub Actions Error

hmahmood24 temporarily deployed to unity-testing September 24, 2025 11:14 — with GitHub Actions Inactive

hmahmood24 had a problem deploying to unity-testing September 24, 2025 11:20 — with GitHub Actions Failure

hmahmood24 had a problem deploying to unity-testing September 24, 2025 11:20 — with GitHub Actions Error

hmahmood24 temporarily deployed to unity-testing September 24, 2025 11:20 — with GitHub Actions Inactive

hmahmood24 added 2 commits September 24, 2025 16:39

Empty commit to trigger CI

700dcc0

Moved policy documents back to intranet/policies

26bcb9c

hmahmood24 added 11 commits September 24, 2025 16:39

Updated the Intranet schema to remove columns like document_id, section_id, paragraph_id etc. Also removed content_text from columns to embed due to token issue (sticking to summary for now)

5fa40c9

Fix KnowledgeManager so that the tool loop policy is enforced correctly for ask method

3f37530

Made the instructions to pick between semantic search vs lexical search more descriptive for the RAG agent

5ba7e82

Disable multi-table join tools in the KnowledgeManager when there is only a single table in the loaded schema. Disabled Contacts inclusion in the KM for the Intranet deployment. Adjusted the prompts accordingly to include join-related instructions only when needed

0336c8e

Update tests workflow to download spaCy model before running the tests

e2df073

Ran black formatting

a2496d9

Remove separate docling step from the README [skip ci]

12ef2d4

hmahmood24 force-pushed the summary branch from af847e3 to 12ef2d4 September 24, 2025 11:40

Fix file manager tests

6d93bd9

hmahmood24 had a problem deploying to unity-testing September 24, 2025 12:04 — with GitHub Actions Failure

hmahmood24 temporarily deployed to unity-testing September 24, 2025 12:04 — with GitHub Actions Inactive

hmahmood24 had a problem deploying to unity-testing September 24, 2025 12:04 — with GitHub Actions Error

hmahmood24 merged commit a8e7b1a into main Sep 24, 2025
5 of 16 checks passed

hmahmood24 deleted the summary branch December 2, 2025 19:24

djl11 mentioned this pull request May 26, 2026

Release: open-source-readiness pass + CVE clear + captcha primitive #283

Open

hmahmood24 merged 14 commits into main from summary
