feat: add TextAnalyzerConfig for ASCII folding in text properties #2006

Merged
dirkkul merged 29 commits into dev/1.37 from feat/ascii-fold
Apr 14, 2026

Conversation

@amourao commented Apr 9, 2026

  • Added TextAnalyzerConfig with ASCII folding
  • Tests for config serialization
  • TODO: add stopword presets to TextAnalyzerConfig

Copilot AI review requested due to automatic review settings April 9, 2026 13:36

@orca-security-eu (bot) left a comment

Orca Security Scan Summary

Status   Check                    Issues by priority (high / medium / low / info)
Passed   Infrastructure as Code   0 / 0 / 0 / 0
Passed   SAST                     0 / 0 / 0 / 0
Passed   Secrets                  0 / 0 / 0 / 0
Passed   Vulnerabilities          0 / 0 / 0 / 0

Copilot AI (Contributor) left a comment

Pull request overview

Adds a TextAnalyzerConfig API to configure ASCII folding for text / text[] properties (including nested properties), and wires it through schema (de)serialization so it can be created and parsed from server configs.
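
For readers unfamiliar with the term, ASCII folding generally means mapping accented characters to their closest plain ASCII equivalents before indexing, so that a search for "cafe" can match "café". The following is a minimal client-side approximation using Unicode NFKD decomposition from the Python standard library. It is an illustration of the concept only: the function name `ascii_fold` is hypothetical, and the server-side folding configured by this PR may use different rules.

```python
import unicodedata


def ascii_fold(text: str) -> str:
    """Approximate ASCII folding by removing combining marks.

    NFKD splits a character such as 'é' into the base letter 'e' plus a
    combining accent; dropping the combining marks leaves the base letter.
    Letters with no decomposition (for example 'ø' or 'ß') pass through
    unchanged, so this is not a complete folding table.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_fold("Café crème"))  # Cafe creme
```

A production analyzer would typically use an explicit mapping table to cover the characters that NFKD leaves untouched.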

Changes:

  • Introduced TextAnalyzerConfig (Pydantic) and internal _TextAnalyzerConfig (dataclass) for ASCII folding configuration.
  • Added parsing/serialization support for textAnalyzer in property and nested property configs.
  • Added unit + integration tests for serialization and config parsing/round-tripping.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

File                                             Description
weaviate/collections/classes/config.py           Adds TextAnalyzerConfig and text_analyzer support to property models/serialization.
weaviate/collections/classes/config_methods.py   Parses textAnalyzer from server schema into internal config dataclasses.
weaviate/classes/config.py                       Re-exports TextAnalyzerConfig from the collections config API.
test/collection/test_config.py                   Adds unit tests for TextAnalyzerConfig serialization and validation.
test/collection/test_config_methods.py           Adds tests ensuring schema parsing populates text_analyzer correctly.
integration/test_collection_config.py            Adds integration coverage for creating collections with text_analyzer (including nested properties).
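
The round trip described above (a client-side `text_analyzer` setting serialized under a `textAnalyzer` key, then parsed back from the server schema) can be sketched roughly as follows. This is not the code from the PR: `TextAnalyzerSketch`, `property_to_schema`, `parse_text_analyzer`, the `ascii_fold` field, and the `asciiFold` wire key are all hypothetical names chosen for illustration. Only the `textAnalyzer` key and the `text_analyzer` attribute are taken from the review summary.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TextAnalyzerSketch:
    """Hypothetical stand-in for the internal analyzer config dataclass."""

    ascii_fold: bool = False  # assumed field name

    def to_dict(self) -> Dict[str, Any]:
        # "asciiFold" is an assumed wire key, not confirmed by the PR text.
        return {"asciiFold": self.ascii_fold}


def property_to_schema(
    name: str, text_analyzer: Optional[TextAnalyzerSketch] = None
) -> Dict[str, Any]:
    """Serialize a text property, adding textAnalyzer only when it is set."""
    schema: Dict[str, Any] = {"name": name, "dataType": ["text"]}
    if text_analyzer is not None:
        schema["textAnalyzer"] = text_analyzer.to_dict()
    return schema


def parse_text_analyzer(schema: Dict[str, Any]) -> Optional[TextAnalyzerSketch]:
    """Parse textAnalyzer from a property schema; None when the key is absent."""
    raw = schema.get("textAnalyzer")
    if raw is None:
        return None
    return TextAnalyzerSketch(ascii_fold=bool(raw.get("asciiFold", False)))


original = TextAnalyzerSketch(ascii_fold=True)
schema = property_to_schema("title", original)
restored = parse_text_analyzer(schema)
```

Omitting the key when no analyzer is configured keeps schemas for existing properties unchanged, which is one plausible reason to make the field optional.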

@amourao changed the base branch from main to dev/1.37 April 9, 2026 20:45

@orca-security-eu (bot) left a comment

Orca Security Scan Summary

Status   Check     Issues by priority (high / medium / low / info)
Passed   Secrets   0 / 0 / 0 / 0

codecov-commenter commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 71.95122% with 92 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (dev/1.37@9bea05a). Learn more about missing BASE report.

Files with missing lines                  Patch %   Lines
integration/test_collection_config.py     42.20%    89 missing
weaviate/collections/config/executor.py   80.00%    3 missing
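
The report's figures can be cross-checked with a little arithmetic. The sketch below infers the total number of patch lines from the reported patch coverage and missing-line count, and checks that the project-level hits and misses add up to the reported line total. The helper name `infer_total_lines` is hypothetical, and the inferred totals are derived values, not numbers stated in the report.

```python
def infer_total_lines(missing: int, coverage_pct: float) -> int:
    """Infer the total line count from missing lines and percent covered.

    missing = total * (1 - coverage), so total = missing / (1 - coverage).
    """
    return round(missing / (1 - coverage_pct / 100))


# Patch-level figures from the report: 71.95122% covered, 92 lines missing.
patch_total = infer_total_lines(92, 71.95122)
patch_covered = patch_total - 92

# Project-level figures from the coverage diff.
hits, misses, lines = 19323, 2723, 22046
project_pct = 100 * hits / lines

print(patch_total, patch_covered)
print(hits + misses == lines, project_pct)
```

Under these figures the two listed files account for all 92 missing lines (89 + 3), which suggests the remaining changed files in the patch are fully covered.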
Additional details and impacted files
@@             Coverage Diff             @@
##             dev/1.37    #2006   +/-   ##
===========================================
  Coverage            ?   87.64%           
===========================================
  Files               ?      280           
  Lines               ?    22046           
  Branches            ?        0           
===========================================
  Hits                ?    19323           
  Misses              ?     2723           
  Partials            ?        0           

MultiVectors = _MultiVectors
ObjectTTL = _ObjectTTL
Replication = _Replication
TextAnalyzer = staticmethod(_text_analyzer)
Collaborator

I would just add it as a static method, same as inverted_index right below

Author

@amourao commented Apr 14, 2026

I've done it but my only gripe there is the snake casing vs camel casing on the top methods 😅
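
To make the two options in this thread concrete, here is a small sketch contrasting a CamelCase class attribute that wraps a module-level function with `staticmethod(...)` against a snake_case method declared with the `@staticmethod` decorator. The class names `ConfigureWithAlias` and `ConfigureWithMethod`, the `ascii_fold` parameter, and the returned dictionary are hypothetical and only illustrate the naming difference the author mentions; they are not the library's actual definitions.

```python
from typing import Any, Dict


def _text_analyzer(ascii_fold: bool = False) -> Dict[str, Any]:
    """Hypothetical factory; the real function's signature may differ."""
    return {"asciiFold": ascii_fold}


class ConfigureWithAlias:
    # Option 1: expose the module-level function as a CamelCase attribute,
    # next to class aliases such as MultiVectors or Replication.
    TextAnalyzer = staticmethod(_text_analyzer)


class ConfigureWithMethod:
    # Option 2: declare a snake_case static method, the style the reviewer
    # points to with inverted_index.
    @staticmethod
    def text_analyzer(ascii_fold: bool = False) -> Dict[str, Any]:
        return _text_analyzer(ascii_fold=ascii_fold)


via_alias = ConfigureWithAlias.TextAnalyzer(ascii_fold=True)
via_method = ConfigureWithMethod.text_analyzer(ascii_fold=True)
```

Both forms are called on the class without an instance and behave the same at runtime. The difference is naming and how the signature and docstring show up in editors, which is the casing mismatch noted in the reply above.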

@amourao requested a review from a team as a code owner April 14, 2026 11:29
@dirkkul merged commit 31737e9 into dev/1.37 Apr 14, 2026
116 of 119 checks passed
@dirkkul deleted the feat/ascii-fold branch April 14, 2026 11:56

4 participants