Contributor

Copilot AI commented Oct 20, 2025

Task: Allow cache similarity to be set at the category level ✅

Status: COMPLETE - All requirements met, tested, documented, and code deduplicated

Summary

Successfully implemented category-level semantic cache configuration, allowing enabled and similarity_threshold to be set per category while keeping global cache infrastructure unchanged.

✅ Completed Tasks:

  • Explore repository structure and understand semantic cache implementation
  • Update config structures to add cache settings to Category
  • Update cache interface to support category-specific thresholds
  • Modify cache lookup flow to classify first, then use category-specific cache settings
  • Implement FindSimilarWithThreshold for InMemoryCache and MilvusCache
  • Update example config files with category-specific cache settings
  • Add comprehensive tests (4 config tests, 2 integration tests)
  • All builds succeed
  • All config tests pass (120 tests)
  • Update all config YAML files in config directory
  • Update documentation in website directory
  • Remove duplicate code in cache implementations
  • Reverted unsigned commit changes

📝 Latest Changes:

Reverted unsigned commit (commit: latest):

  • Restored the classificationText != "" check before calling findCategoryForClassification()
  • This ensures we don't attempt classification when there's no text to classify
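The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in request_handler.go: `classify` is a toy stand-in for `findCategoryForClassification()`, and `cacheSettings` and `resolveCacheSettings` are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheSettings is a hypothetical holder for the resolved cache options.
type cacheSettings struct {
	Enabled   bool
	Threshold float32
}

// classify is a toy stand-in for findCategoryForClassification();
// the real classifier is model-based, this one only matches a keyword.
func classify(text string) string {
	if strings.Contains(strings.ToLower(text), "symptom") {
		return "health"
	}
	return "general"
}

// resolveCacheSettings classifies first, then picks category-specific
// settings. With no text to classify it skips classification entirely
// and returns the global settings.
func resolveCacheSettings(classificationText string, global cacheSettings,
	perCategory map[string]cacheSettings) (string, cacheSettings) {
	if classificationText == "" {
		return "", global // nothing to classify
	}
	category := classify(classificationText)
	if s, ok := perCategory[category]; ok {
		return category, s
	}
	return category, global
}

func main() {
	global := cacheSettings{Enabled: true, Threshold: 0.8}
	perCategory := map[string]cacheSettings{
		"health": {Enabled: true, Threshold: 0.95},
	}
	for _, text := range []string{"", "new symptom after medication", "my computer won't boot"} {
		category, s := resolveCacheSettings(text, global, perCategory)
		fmt.Printf("text=%q category=%q enabled=%v threshold=%v\n", text, category, s.Enabled, s.Threshold)
	}
}
```

Returning the global settings for empty text is one possible choice; the actual handler may instead skip the cache lookup in that case.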

🎯 Key Features:

Flexibility: Each category can have its own cache settings

categories:
  - name: health
    semantic_cache_enabled: true
    semantic_cache_similarity_threshold: 0.95  # Very strict

Fallback: Categories without settings use global defaults

Granular Control:

  • Sensitive categories (health, psychology): 0.92-0.95 thresholds
  • General categories (chat, FAQs): 0.75-0.82 thresholds
  • Can disable cache per category for privacy

Clean Code: No code duplication - FindSimilar delegates to FindSimilarWithThreshold

Files Changed: 20 files total

  • 7 core implementation files
  • 8 config YAML files
  • 3 documentation files
  • 2 test files

Total: 760 insertions, 241 deletions across 20 files (includes removing 226 lines of duplicated code)

✅ All Tests Pass:

  • Config tests: 120/120 ✓
  • Build: Success ✓
  • No compilation errors ✓

Ready for merge! 🚀

Original prompt

This section details the original issue you should resolve

<issue_title>Allow cache similarity to be set at the category level</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
Currently, cache reuse is based on a single global setting for the embedding similarity that counts as a cache hit.

Given the possible consequences and specifics of different categories, this should really be set at the category level. For example, in health or psychology, a one- or two-word difference in symptoms can have a huge impact, so I would want a very high embedding similarity before going to cache. On the other hand, a general chat, such as the beginning of troubleshooting why a computer won't boot, is not as sensitive to changes of a few words.

Please allow setting cache embedding similarity tolerance at the category level. </issue_description>

<agent_instructions>Please review config/config.yaml and see how the semantic cache is configured. I want the semantic cache's "enabled" and "similarity_threshold" moved to each category, while the other options stay at the top level, so that I can control which categories use the cache and at what threshold.

semantic_cache:
  enabled: true
  backend_type: "memory" # Options: "memory" or "milvus"
  similarity_threshold: 0.8
  max_entries: 1000 # Only applies to memory backend
  ttl_seconds: 3600
  eviction_policy: "fifo"</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #492




netlify bot commented Oct 20, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 06a32e3
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68f6dc1539510200085f6c31
😎 Deploy Preview https://deploy-preview-493--vllm-semantic-router.netlify.app

Copilot AI and others added 2 commits October 20, 2025 23:22
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

github-actions bot commented Oct 20, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs
Files changed:

  • config/config.development.yaml
  • config/config.e2e.yaml
  • config/config.production.yaml
  • config/config.recipe-accuracy.yaml
  • config/config.recipe-latency.yaml
  • config/config.recipe-token-efficiency.yaml
  • config/config.testing.yaml
  • config/config.yaml

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/cache/cache_interface.go
  • src/semantic-router/pkg/cache/inmemory_cache.go
  • src/semantic-router/pkg/cache/milvus_cache.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/config_test.go
  • src/semantic-router/pkg/extproc/caching_test.go
  • src/semantic-router/pkg/extproc/request_handler.go

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/installation/configuration.md
  • website/docs/tutorials/semantic-cache/in-memory-cache.md
  • website/docs/tutorials/semantic-cache/overview.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

Collaborator

rootfs commented Oct 20, 2025

@copilot please update other config yamls under config directory to adjust these new semantic cache configuration. Then update the doc under website directory about this change

Copilot AI changed the title from "[WIP] Allow cache similarity setting at category level" to "Allow semantic cache similarity threshold to be set at the category level" Oct 20, 2025
Copilot AI requested a review from rootfs October 20, 2025 23:32
Copilot finished work on behalf of rootfs October 20, 2025 23:32
- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 20, 2025

@copilot please update other config yamls under config directory to adjust these new semantic cache configuration. Then update the doc under website directory about this change

Updated all config YAML files and documentation:

Config files updated (7 files):

  • config.development.yaml, config.production.yaml, config.testing.yaml
  • config.e2e.yaml - Added examples for health (0.95), psychology (0.92), other (0.75)
  • config.recipe-accuracy.yaml, config.recipe-latency.yaml, config.recipe-token-efficiency.yaml

All configs now include commented examples showing category-level cache settings.

Documentation updated (3 files):

  • website/docs/tutorials/semantic-cache/overview.md - Added category-level control section
  • website/docs/tutorials/semantic-cache/in-memory-cache.md - Added category configuration examples and options table
  • website/docs/installation/configuration.md - Added comprehensive "Category-Level Cache Configuration" section with:
    • Multiple configuration examples
    • Best practices for threshold selection
    • Privacy and compliance guidelines
    • Performance tuning recommendations

Commit: Updated config files and documentation for category-level cache settings (latest commit)

Copilot finished work on behalf of rootfs October 20, 2025 23:47
@rootfs rootfs requested a review from Copilot October 20, 2025 23:55
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements category-level semantic cache configuration, allowing fine-grained control over caching behavior per category. Previously, cache settings were global only, creating a trade-off between high thresholds for sensitive categories and low thresholds for better cache hit rates in general categories. This change enables each category to specify its own cache enabled/disabled setting and similarity threshold.

Key Changes:

  • Added semantic_cache_enabled and semantic_cache_similarity_threshold fields to Category struct with hierarchical fallback to global settings
  • Implemented FindSimilarWithThreshold() method in cache backends to support category-specific thresholds
  • Modified request handling to classify category before cache lookup and use category-specific settings
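The hierarchical fallback in the first bullet might look roughly like the sketch below. The method names `GetCacheSimilarityThresholdForCategory` and `GetCacheSimilarityThreshold` are mentioned later in this review thread; everything else (the struct layouts, Go field names, `IsCacheEnabledForCategory`, the lookup by name, and the `ptr` helper) is assumed for illustration and may differ from config.go.

```go
package main

import "fmt"

// Category models the per-category YAML keys. Go field names are assumed.
// Pointer fields distinguish "not set" (nil) from an explicit value.
type Category struct {
	Name                             string
	SemanticCacheEnabled             *bool
	SemanticCacheSimilarityThreshold *float32
}

// SemanticCache models the global semantic_cache section (assumed layout).
type SemanticCache struct {
	Enabled             bool
	SimilarityThreshold *float32
}

// RouterConfig is a simplified, hypothetical view of the router config.
type RouterConfig struct {
	SemanticCache SemanticCache
	BertThreshold float32 // stands in for bert_model.threshold
	Categories    []Category
}

// GetCacheSimilarityThreshold returns the global cache threshold,
// falling back to the BERT threshold when none is set.
func (c *RouterConfig) GetCacheSimilarityThreshold() float32 {
	if c.SemanticCache.SimilarityThreshold != nil {
		return *c.SemanticCache.SimilarityThreshold
	}
	return c.BertThreshold
}

// GetCacheSimilarityThresholdForCategory prefers the category's own
// threshold and otherwise defers to the global lookup.
func (c *RouterConfig) GetCacheSimilarityThresholdForCategory(name string) float32 {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.SemanticCacheSimilarityThreshold != nil {
			return *cat.SemanticCacheSimilarityThreshold
		}
	}
	return c.GetCacheSimilarityThreshold()
}

// IsCacheEnabledForCategory (hypothetical name) applies the same
// category-then-global fallback to the enabled flag.
func (c *RouterConfig) IsCacheEnabledForCategory(name string) bool {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.SemanticCacheEnabled != nil {
			return *cat.SemanticCacheEnabled
		}
	}
	return c.SemanticCache.Enabled
}

// ptr returns a pointer to v; a small helper for optional fields.
func ptr[T any](v T) *T { return &v }

func main() {
	cfg := &RouterConfig{
		SemanticCache: SemanticCache{Enabled: true, SimilarityThreshold: ptr(float32(0.8))},
		BertThreshold: 0.6,
		Categories: []Category{
			{Name: "health", SemanticCacheSimilarityThreshold: ptr(float32(0.95))},
			{Name: "private", SemanticCacheEnabled: ptr(false)},
			{Name: "chat"},
		},
	}
	for _, name := range []string{"health", "private", "chat"} {
		fmt.Println(name, cfg.IsCacheEnabledForCategory(name), cfg.GetCacheSimilarityThresholdForCategory(name))
	}
}
```

Using pointer fields is one way to tell an omitted setting apart from an explicit `false` or `0`; whether the actual struct does this is not stated in the thread.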

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/semantic-router/pkg/config/config.go | Added category-level cache fields and helper methods with fallback logic |
| src/semantic-router/pkg/cache/cache_interface.go | Added FindSimilarWithThreshold method to CacheBackend interface |
| src/semantic-router/pkg/cache/inmemory_cache.go | Implemented FindSimilarWithThreshold for in-memory cache backend |
| src/semantic-router/pkg/cache/milvus_cache.go | Implemented FindSimilarWithThreshold for Milvus cache backend |
| src/semantic-router/pkg/extproc/request_handler.go | Modified to classify category early and use category-specific cache settings |
| src/semantic-router/pkg/config/config_test.go | Added tests for category-level cache configuration |
| src/semantic-router/pkg/extproc/caching_test.go | Added integration tests for category-specific caching |
| config/*.yaml | Updated all config files with commented examples of category-level cache settings |
| website/docs/*.md | Enhanced documentation with category-level configuration examples and best practices |


Comment on lines +408 to +413
var classificationText string
if len(userContent) > 0 {
	classificationText = userContent
} else if len(nonUserMessages) > 0 {
	classificationText = strings.Join(nonUserMessages, " ")
}
Copilot AI Oct 20, 2025

The classification text construction logic appears duplicated from elsewhere in the codebase. Consider extracting this pattern into a helper function to avoid code duplication and ensure consistent classification text handling across different parts of the request processing flow.
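One way to act on this suggestion is a small helper that wraps the snippet under review. The function name `classificationTextFor` is hypothetical and was not part of the PR; the body follows the logic of the quoted lines.

```go
package main

import (
	"fmt"
	"strings"
)

// classificationTextFor (hypothetical name) picks the text to classify:
// the user content when present, otherwise the non-user messages joined
// with spaces, otherwise the empty string.
func classificationTextFor(userContent string, nonUserMessages []string) string {
	if len(userContent) > 0 {
		return userContent
	}
	if len(nonUserMessages) > 0 {
		return strings.Join(nonUserMessages, " ")
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", classificationTextFor("why won't my computer boot?", nil))
	fmt.Printf("%q\n", classificationTextFor("", []string{"You are a helpful assistant.", "Be concise."}))
	fmt.Printf("%q\n", classificationTextFor("", nil))
}
```

Callers would still need to check for an empty result before classifying, since the helper returns "" when there is nothing to classify.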

Collaborator

@copilot please address this

Comment on lines +337 to +338
// FindSimilarWithThreshold searches for semantically similar cached requests using a specific threshold
func (c *InMemoryCache) FindSimilarWithThreshold(model string, query string, threshold float32) ([]byte, bool, error) {
Copilot AI Oct 20, 2025

The FindSimilarWithThreshold method duplicates significant logic from the existing FindSimilar method. Consider refactoring FindSimilar to call FindSimilarWithThreshold with the default threshold to eliminate code duplication and ensure consistent behavior between both methods.

Collaborator

@copilot please address this and other duplicate functions

Comment on lines +594 to +595
// FindSimilarWithThreshold searches for semantically similar cached requests using a specific threshold
func (c *MilvusCache) FindSimilarWithThreshold(model string, query string, threshold float32) ([]byte, bool, error) {
Copilot AI Oct 20, 2025

The FindSimilarWithThreshold method duplicates significant logic from the existing FindSimilar method. Consider refactoring FindSimilar to call FindSimilarWithThreshold with the default threshold to eliminate code duplication and ensure consistent behavior between both methods.

Collaborator

@copilot please address this and other duplicate functions

Collaborator

rootfs commented Oct 20, 2025

@copilot please don't create duplicate functions.

Copilot finished work on behalf of rootfs October 21, 2025 00:01
Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 21, 2025

@copilot please don't create duplicate functions.

Fixed. Refactored FindSimilar() to delegate to FindSimilarWithThreshold() instead of duplicating the implementation. Removed 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go. All tests pass (120/120).
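The delegation pattern described here can be sketched with a toy cache. Only the `FindSimilarWithThreshold` signature comes from the review snippets above; the `FindSimilar` signature, the `Add` method, the struct fields, and the letter-frequency `embed` function are assumptions made so the example runs on its own (the real backends use model embeddings).

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// entry is a toy cache record; the real cache stores model embeddings.
type entry struct {
	model     string
	embedding [26]float64
	response  []byte
}

// InMemoryCache is a simplified stand-in for the in-memory backend.
type InMemoryCache struct {
	defaultThreshold float32
	entries          []entry
}

// embed is a toy embedding (letter counts), assumed only for this sketch.
func embed(text string) [26]float64 {
	var v [26]float64
	for _, r := range strings.ToLower(text) {
		if r >= 'a' && r <= 'z' {
			v[r-'a']++
		}
	}
	return v
}

// cosine returns the cosine similarity of two vectors (0 if either is zero).
func cosine(a, b [26]float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Add stores a response for a query (hypothetical method for the sketch).
func (c *InMemoryCache) Add(model, query string, response []byte) {
	c.entries = append(c.entries, entry{model: model, embedding: embed(query), response: response})
}

// FindSimilar delegates to FindSimilarWithThreshold with the default
// threshold, so the search logic lives in exactly one place.
func (c *InMemoryCache) FindSimilar(model string, query string) ([]byte, bool, error) {
	return c.FindSimilarWithThreshold(model, query, c.defaultThreshold)
}

// FindSimilarWithThreshold returns the best match for the model if its
// similarity reaches the given threshold.
func (c *InMemoryCache) FindSimilarWithThreshold(model string, query string, threshold float32) ([]byte, bool, error) {
	q := embed(query)
	found := false
	best := 0.0
	var resp []byte
	for _, e := range c.entries {
		if e.model != model {
			continue
		}
		if s := cosine(q, e.embedding); !found || s > best {
			found, best, resp = true, s, e.response
		}
	}
	if found && best >= float64(threshold) {
		return resp, true, nil
	}
	return nil, false, nil
}

func main() {
	c := &InMemoryCache{defaultThreshold: 0.8}
	c.Add("model-a", "my computer will not boot", []byte("check the power cable"))
	_, hitDefault, _ := c.FindSimilar("model-a", "my computer will not boot today")
	_, hitStrict, _ := c.FindSimilarWithThreshold("model-a", "my computer will not boot today", 0.999)
	fmt.Println("default threshold hit:", hitDefault, "strict threshold hit:", hitStrict)
}
```

With this shape, a category-specific lookup only needs to pass its own threshold; the default path and the category path cannot drift apart.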

Copilot finished work on behalf of rootfs October 21, 2025 00:13
@rootfs rootfs requested a review from Copilot October 21, 2025 00:19
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.



Comment on lines +716 to +717
2. Global `semantic_cache.similarity_threshold` (if set)
3. `bert_model.threshold` (final fallback)
Copilot AI Oct 21, 2025

The documented fallback hierarchy doesn't accurately reflect the code implementation. According to GetCacheSimilarityThresholdForCategory() in config.go (lines 808-817), the fallback is category-specific → GetCacheSimilarityThreshold(). The method GetCacheSimilarityThreshold() handles the global semantic_cache vs bert_model fallback internally, but that detail isn't directly part of the category lookup hierarchy.

Suggested change
- 2. Global `semantic_cache.similarity_threshold` (if set)
- 3. `bert_model.threshold` (final fallback)
+ 2. Otherwise, falls back to the global `semantic_cache.similarity_threshold` (if set)
+ 3. If neither is set, the global threshold logic internally falls back to `bert_model.threshold`

Collaborator

@copilot Is there a global threshold? please check config/config.yaml first

Collaborator

rootfs commented Oct 21, 2025

@copilot please check the latest review comments

Contributor Author

Copilot AI commented Oct 21, 2025

@copilot please check the latest review comments

I don't have access to view GitHub review comments directly. Could you please share the specific review feedback you'd like me to address? You can either:

  1. Copy the review comments here
  2. Tell me which files/lines have comments that need addressing
  3. Describe what changes are needed

I'm ready to make any necessary updates once I know what needs to be changed.

Copilot finished work on behalf of rootfs October 21, 2025 00:49
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Collaborator

rootfs commented Oct 21, 2025

@copilot sign the last commit

Contributor Author

Copilot AI commented Oct 21, 2025

@copilot sign the last commit

I cannot directly sign commits with GPG signatures using the available tools. The commit ae39fe2 was made by @rootfs through GitHub's web interface and already has a GPG signature.

If you need a new commit from me that includes the Co-authored-by attribution, I would need to make an actual code change and use the report_progress tool, which automatically adds co-authorship. Alternatively, you can amend the commit signature through git directly if needed.

Is there a specific change you'd like me to make to warrant a new commit?

Copilot finished work on behalf of rootfs October 21, 2025 00:55
Collaborator

rootfs commented Oct 21, 2025

@copilot revert the unsigned commit

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Contributor Author

Copilot AI commented Oct 21, 2025

@copilot revert the unsigned commit

Reverted the unsigned commit. Restored the classificationText != "" check in request_handler.go. Build succeeds.

Copilot finished work on behalf of rootfs October 21, 2025 01:04
Member

Xunzhuo commented Oct 22, 2025

@copilot good to go

Copilot AI requested a review from Xunzhuo October 22, 2025 06:04
Copilot finished work on behalf of Xunzhuo October 22, 2025 06:04
@Xunzhuo Xunzhuo marked this pull request as ready for review October 22, 2025 06:05
@Xunzhuo Xunzhuo requested a review from wangchen615 as a code owner October 22, 2025 06:05
@Xunzhuo Xunzhuo merged commit 7b78326 into main Oct 22, 2025
28 checks passed
rootfs added a commit to rootfs/semantic-router.bak that referenced this pull request Oct 23, 2025
…evel (vllm-project#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.
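
The delegation described above might look roughly like the following Go sketch. It is a simplified illustration, not the project's code: the `entry` type, the `cosine` helper, the `defaultThreshold` field, and the choice to pass embeddings directly (rather than query text) are all assumptions made to keep the example self-contained. Only the names `FindSimilar`, `FindSimilarWithThreshold`, and `InMemoryCache` come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// entry is a hypothetical cached item: an embedding plus the stored response.
type entry struct {
	embedding []float32
	response  string
}

// InMemoryCache is a simplified stand-in for the real cache type.
type InMemoryCache struct {
	entries          []entry
	defaultThreshold float32 // assumed name for the global threshold
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// FindSimilarWithThreshold holds the single copy of the search logic:
// it returns the best-matching response if its similarity meets threshold.
func (c *InMemoryCache) FindSimilarWithThreshold(query []float32, threshold float32) (string, bool) {
	found := false
	var best float32
	var resp string
	for _, e := range c.entries {
		if len(e.embedding) != len(query) {
			continue // skip entries with a mismatched dimension
		}
		if s := cosine(query, e.embedding); !found || s > best {
			found, best, resp = true, s, e.response
		}
	}
	if found && best >= threshold {
		return resp, true
	}
	return "", false
}

// FindSimilar delegates to FindSimilarWithThreshold with the default threshold,
// so the search logic is not duplicated.
func (c *InMemoryCache) FindSimilar(query []float32) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.defaultThreshold)
}

func main() {
	c := &InMemoryCache{
		defaultThreshold: 0.75,
		entries:          []entry{{embedding: []float32{1, 0}, response: "cached"}},
	}
	q := []float32{0.8, 0.6}
	fmt.Println(c.FindSimilar(q))                    // lookup with the default threshold
	fmt.Println(c.FindSimilarWithThreshold(q, 0.95)) // lookup with a stricter, category-style threshold
}
```

With this shape, a category-specific lookup only needs to pass a different threshold to the same function.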

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types
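
As a rough illustration of the category-level fallback listed above, the Go sketch below resolves a per-category threshold and falls back to a global value when the category does not set one. The struct layouts, the pointer-based "unset" convention, the `GlobalJailbreakThreshold` field, and the numeric values are assumptions for illustration; only `JailbreakThreshold` and `GetJailbreakThresholdForCategory` are named in the commit message, and their real signatures may differ.

```go
package main

import "fmt"

// Category is a simplified stand-in for the real config type.
type Category struct {
	Name               string
	JailbreakThreshold *float32 // nil is assumed to mean "not set, use the global value"
}

// Config is a hypothetical container for the global default and the categories.
type Config struct {
	GlobalJailbreakThreshold float32
	Categories               []Category
}

// GetJailbreakThresholdForCategory returns the category's own threshold when
// it is set, and the global threshold otherwise (including unknown categories).
func (c *Config) GetJailbreakThresholdForCategory(name string) float32 {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.JailbreakThreshold != nil {
			return *cat.JailbreakThreshold
		}
	}
	return c.GlobalJailbreakThreshold
}

func main() {
	strict := float32(0.9) // hypothetical value for a sensitive category
	cfg := &Config{
		GlobalJailbreakThreshold: 0.7, // hypothetical global default
		Categories: []Category{
			{Name: "health", JailbreakThreshold: &strict},
			{Name: "chat"}, // no override, so the global value applies
		},
	}
	for _, name := range []string{"health", "chat", "unknown"} {
		fmt.Println(name, cfg.GetJailbreakThresholdForCategory(name))
	}
}
```

Using a pointer lets the sketch distinguish "explicitly set to zero" from "not configured", which a plain float field could not do.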

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)

* feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

* merge main to feat branch

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shelleck precommit hook (#488)

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shelleck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
rootfs added a commit that referenced this pull request Oct 23, 2025
* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.
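
The delegation pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `inmemory_cache.go` or `milvus_cache.go`: the `SketchCache` and `entry` types, the embedding-based signatures, and the linear cosine-similarity scan are all hypothetical stand-ins. Only the method names `FindSimilar` and `FindSimilarWithThreshold` come from the commit message.

```go
package main

import (
	"fmt"
	"math"
)

// entry is a hypothetical cached item: a query embedding plus its stored response.
type entry struct {
	embedding []float32
	response  string
}

// SketchCache is an illustrative stand-in for a semantic cache backend.
type SketchCache struct {
	defaultThreshold float32
	entries          []entry
}

// FindSimilar keeps the original entry point and delegates with the default
// threshold, so the search logic lives in exactly one place.
func (c *SketchCache) FindSimilar(query []float32) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.defaultThreshold)
}

// FindSimilarWithThreshold returns the most similar cached response whose
// cosine similarity to the query is at least threshold.
func (c *SketchCache) FindSimilarWithThreshold(query []float32, threshold float32) (string, bool) {
	best, bestSim, found := "", float32(0), false
	for _, e := range c.entries {
		sim := cosine(query, e.embedding)
		if sim >= threshold && (!found || sim > bestSim) {
			best, bestSim, found = e.response, sim, true
		}
	}
	return best, found
}

// cosine computes cosine similarity; mismatched or zero vectors score 0.
func cosine(a, b []float32) float32 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

func main() {
	c := &SketchCache{
		defaultThreshold: 0.8,
		entries:          []entry{{embedding: []float32{1, 0}, response: "cached answer"}},
	}
	q := []float32{0.9, 0.436} // cosine similarity to {1, 0} is roughly 0.90

	_, hitDefault := c.FindSimilar(q)                    // uses the 0.8 default
	_, hitStrict := c.FindSimilarWithThreshold(q, 0.95)  // stricter, category-style threshold
	fmt.Println(hitDefault, hitStrict)
}
```

With this shape, a category-specific threshold only changes the argument passed in; the default path and the per-category path cannot drift apart.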

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

* merge main to feat branch

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
rootfs added a commit that referenced this pull request Oct 24, 2025
* refactor: Implement modular candle-binding architecture (#254)


- Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
- Add unified error handling and configuration loading systems
- Implement dual-path architecture for traditional and LoRA models
- Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

Signed-off-by: OneZero-Y <aukovyps@163.com>

* feat:unit tests for candle refactoring (#296)

Signed-off-by: OneZero-Y <aukovyps@163.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

* feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (#453)

Signed-off-by: OneZero-Y <aukovyps@163.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix:Implement Comprehensive Rayon Parallelization for LoRA Classifiers (#464)

Signed-off-by: OneZero-Y <aukovyps@163.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix:Improve rust unit test and optimize concurrent tests with rayon (#471)

- Add 6 new unit test files
- Replace std::thread::spawn with rayon::par_iter

Signed-off-by: OneZero-Y <aukovyps@163.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix: resolve syntax errors after rebase

Signed-off-by: Huamin Chen <hchen@redhat.com>

* add additional update

Signed-off-by: Huamin Chen <hchen@redhat.com>

* Change label count params to c_int (#494)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* update embedding setting in config (#489)

Signed-off-by: Huamin Chen <hchen@redhat.com>

* make CUDA and Flash Attention 2 optional features (#511)

Signed-off-by: OneZero-Y <aukovyps@163.com>

* fix: Fix duplicate UNIFIED_CLASSIFIER definition and optimize lock contention (#516)

- Remove duplicate UNIFIED_CLASSIFIER global state
- Optimize PARALLEL_LORA_ENGINE lock contention by using Arc clone

Signed-off-by: OneZero-Y <aukovyps@163.com>

* Merge main to candle refactoring (#523)

* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Candle refactoring to main (#524)

* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Merge candle refactoring 3 (#525)

* Update test description from Math to General (#483)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: add HuggingChat support (#477)

* add chat ui to dashboard and docker compose & refactor dashboard/backend/

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* try fix network error

Signed-off-by: JaredforReal <w13431838023@gmail.com>

* more

---------

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>

* project: 2025 Q4 roadmap (#487)

* project: q4 roadmap

* project: q4 roadmap

* project: q4 roadmap

* more

* more

* more

* more

* feat: add shellcheck precommit hook (#488)

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: add shellcheck precommit hook

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

---------

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* infra: update tools (#501)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)

- Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
- Add new "Classification Examples" option calling curl-examples.sh
- Update reasoning examples to avoid cache hits from previous tests
- Remove benign examples from PII and Jailbreak tests (show only attacks)
- Enhance live-semantic-router-logs.sh with better color visibility:
  - Fix duplicate "WITH SCORE" text in classification output
  - Fix CACHE HIT background color extending over timestamp
  - Distinguish reasoning enabled vs disabled messages
  - Remove redundant "(standard routing)" text
  - Add background colors for Model-A/Model-B routing display

These improvements make the live demo clearer and more impactful for
presentations and demonstrations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>

* fix: fix precommit Argument list too long error (#502)

Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>

* feat: enforce milvus dial timeout if set (#503)

Signed-off-by: cryo <zdtna412@gmail.com>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)

* Initial plan

* Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Allow semantic cache similarity threshold to be set at the category level (#493)

* Initial plan

* Add category-level cache settings: enabled and similarity_threshold

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Add comprehensive tests for category-level cache settings

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Update config files and documentation for category-level cache settings

- Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
- Added comprehensive documentation section explaining category-level cache configuration
- Updated semantic cache overview and in-memory cache docs with category-level examples
- Added best practices for threshold selection and privacy considerations

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>

* Remove duplicate code in FindSimilar functions

Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
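
The delegation described in this commit can be sketched roughly as follows. This is a simplified illustration, not the code in `inmemory_cache.go` or `milvus_cache.go`: the `toyCache` type, the `cacheEntry` type, and the method signatures (which take a precomputed embedding and return a string) are assumptions, and embeddings are assumed to be unit length so the dot product acts as cosine similarity. Only the method names `FindSimilar` and `FindSimilarWithThreshold` come from the PR.

```go
package main

import "fmt"

type cacheEntry struct {
	embedding []float32
	response  string
}

// toyCache is a hypothetical, heavily simplified cache.
type toyCache struct {
	defaultThreshold float32
	entries          []cacheEntry
}

func dot(a, b []float32) float32 {
	var sum float32
	for i := 0; i < len(a) && i < len(b); i++ {
		sum += a[i] * b[i]
	}
	return sum
}

// FindSimilarWithThreshold holds the single copy of the search logic.
func (c *toyCache) FindSimilarWithThreshold(query []float32, threshold float32) (string, bool) {
	best, bestScore, found := "", float32(0), false
	for _, e := range c.entries {
		score := dot(query, e.embedding)
		if score >= threshold && (!found || score > bestScore) {
			best, bestScore, found = e.response, score, true
		}
	}
	return best, found
}

// FindSimilar delegates with the cache-wide default instead of duplicating the loop.
func (c *toyCache) FindSimilar(query []float32) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.defaultThreshold)
}

func main() {
	c := &toyCache{
		defaultThreshold: 0.9,
		entries:          []cacheEntry{{embedding: []float32{1, 0}, response: "cached answer"}},
	}
	query := []float32{0.8, 0.6} // similarity to the stored entry is 0.8
	_, hit := c.FindSimilar(query)
	fmt.Println("default threshold hit:", hit)
	resp, hit := c.FindSimilarWithThreshold(query, 0.75)
	fmt.Println("category threshold hit:", hit, resp)
}
```

Keeping one implementation means a category-specific threshold and the global default can never drift apart in behavior.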

* Update src/semantic-router/pkg/extproc/request_handler.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Revert changes from unsigned commit ae39fe2

Restored the classificationText empty check that was removed in the previous commit.

Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
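
The restored guard amounts to skipping classification when there is no text to classify. A minimal sketch, assuming a hypothetical `resolveCategory` helper and a `classifier` function type that stand in for the request handler's real code:

```go
package main

import "fmt"

// classifier is a hypothetical stand-in for the router's category classifier.
type classifier func(text string) string

// resolveCategory returns "" without invoking the classifier when the
// classification text is empty, so callers fall back to global settings.
func resolveCategory(classificationText string, classify classifier) string {
	if classificationText == "" {
		return ""
	}
	return classify(classificationText)
}

func main() {
	calls := 0
	classify := func(text string) string {
		calls++
		return "health"
	}
	fmt.Printf("%q\n", resolveCategory("", classify))
	fmt.Printf("%q\n", resolveCategory("is ibuprofen safe?", classify))
	fmt.Println("classifier calls:", calls)
}
```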

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Allow jailbreak detection and threshold to be configured at the category level (#508)

* Initial plan

* Add category-level jailbreak detection configuration

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add documentation for category-level jailbreak settings

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation for category-level jailbreak detection

- Add category-level jailbreak configuration to jailbreak-protection.md
- Update category configuration docs with jailbreak_enabled parameter
- Add security-focused configuration example
- Update global configuration docs with category override notes
- Update README to mention fine-grained security control

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Add category-level jailbreak threshold configuration

- Add JailbreakThreshold field to Category struct
- Add GetJailbreakThresholdForCategory helper method
- Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
- Update performSecurityChecks to use category-specific threshold
- Add 5 comprehensive tests for threshold configuration
- Update example configs with threshold tuning examples
- Update documentation with threshold configuration and tuning guidelines
- Add threshold tuning guide with recommendations for different category types

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>
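
One way the per-category threshold lookup could work is an optional field on the category that falls back to the global value when unset. The sketch below is an assumption about the shape, not the repository's code: `JailbreakThreshold` and `GetJailbreakThresholdForCategory` are named in the commit, while the `Config` type, the `GlobalJailbreakThreshold` field, and the use of a pointer for "unset" are hypothetical.

```go
package main

import "fmt"

// Category carries an optional override; nil means "inherit the global value".
type Category struct {
	Name               string
	JailbreakThreshold *float32
}

// Config is a hypothetical container for the global default and categories.
type Config struct {
	GlobalJailbreakThreshold float32
	Categories               []Category
}

// GetJailbreakThresholdForCategory returns the category override if one is
// set, and the global threshold otherwise (including for unknown categories).
func (c *Config) GetJailbreakThresholdForCategory(name string) float32 {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.JailbreakThreshold != nil {
			return *cat.JailbreakThreshold
		}
	}
	return c.GlobalJailbreakThreshold
}

func main() {
	custom := float32(0.9)
	cfg := &Config{
		GlobalJailbreakThreshold: 0.7,
		Categories: []Category{
			{Name: "security", JailbreakThreshold: &custom},
			{Name: "chat"}, // no override
		},
	}
	for _, name := range []string{"security", "chat", "unknown"} {
		fmt.Println(name, cfg.GetJailbreakThresholdForCategory(name))
	}
}
```

A pointer distinguishes "not configured" from an explicit value of zero, which a plain `float32` field could not do.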

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Allow PII detection threshold to be set at the category level (#510)

* Initial plan

* Add category-level PII threshold support

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Update documentation with API integration notes

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix markdown linting issues

Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)

Signed-off-by: carlory <baofa.fan@daocloud.io>

* feat: Implement hybrid cache that uses in-memory index and Milvus-based doc store (#504)

* feat: add HNSW index to in-memory semantic cache and implement hybrid cache that uses in-memory index and Milvus-based doc store

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: run go mod tidy to clean up module dependencies

Signed-off-by: Huamin Chen <hchen@redhat.com>

* conditionally build candle cuda support

Signed-off-by: Huamin Chen <hchen@redhat.com>

* rebuild index upon restart

Signed-off-by: Huamin Chen <hchen@redhat.com>

* precommit fix

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix precommit

Signed-off-by: Huamin Chen <hchen@redhat.com>

* disable cuda build on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

* review feedback

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

* merge main to feat branch

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

* chore: fix unit test (#527)

* chore: fix unit test

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix go vet

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* fix ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* split test-binding to two stages on ci

Signed-off-by: Huamin Chen <hchen@redhat.com>

* ignore test failure due to embeddinggemma restriction

Signed-off-by: Huamin Chen <hchen@redhat.com>

* reorder ci test sequences to avoid missing models

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

* refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review (#528)

* refactor: Replace lazy_static with OnceLock for zero-cost concurrent reads based on review #266 (comment)

Signed-off-by: Huamin Chen <hchen@redhat.com>

* update tests

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: fix lint error (#530)

Signed-off-by: Huamin Chen <hchen@redhat.com>

* Fix lint error2 (#531)

* chore: fix lint error

Signed-off-by: Huamin Chen <hchen@redhat.com>

* chore: fix lint error

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>

---------

Signed-off-by: OneZero-Y <aukovyps@163.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: carlory <baofa.fan@daocloud.io>
Signed-off-by: JaredforReal <w13431838023@gmail.com>
Signed-off-by: yuluo-yx <yuluo08290126@gmail.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Signed-off-by: cryo <zdtna412@gmail.com>
Co-authored-by: OneZero-Y <aukovyps@163.com>
Co-authored-by: 杨朱 · Kiki <baofa.fan@daocloud.io>
Co-authored-by: Jared <w13431838023@gmail.com>
Co-authored-by: bitliu <bitliu@tencent.com>
Co-authored-by: shown <yuluo08290126@gmail.com>
Co-authored-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: cryo <zdtna412@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: rootfs <7062400+rootfs@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Xunzhuo <48784001+Xunzhuo@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Allow cache similarity to be set at the category level
