
Refactor TextSearch classes for improved clarity and semantic accuracy #2416


Open
wants to merge 1 commit into
base: master


annuaicoder

Summary

This PR refactors the TextSearch and IndexedTextSearch classes to improve maintainability, accuracy, and readability. Key improvements include:

  • ✅ Introduced a shared BaseTextSearch class to remove duplicate logic
  • ✅ Replaced LevenshteinDistance with SpacySimilarity (if available) for better semantic comparisons
  • ✅ Added support for returning top-N best matches instead of yielding a single result
  • ✅ Improved logging and added detailed inline comments for clarity
  • ✅ Preserved full backward compatibility with ChatterBot’s storage filtering and APIs
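
To make the shared-base and top-N ideas above concrete, here is a minimal, self-contained sketch. It is not the code in this PR and does not import ChatterBot: `ratio_similarity`, `candidates`, `top_matches`, and `TokenFilteredSearch` are hypothetical names used only for illustration, and `difflib.SequenceMatcher` stands in for whichever comparison function the search is configured with.

```python
import heapq
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


def ratio_similarity(a: str, b: str) -> float:
    """Stand-in comparator (assumption, not ChatterBot's): ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class BaseTextSearch:
    """Hypothetical shared base that owns the scoring and ranking logic."""

    def __init__(self, compare: Callable[[str, str], float] = ratio_similarity):
        self.compare = compare

    def candidates(self, text: str, statements: Iterable[str]) -> Iterable[str]:
        # The base class considers every statement; subclasses may narrow this.
        return statements

    def top_matches(
        self, text: str, statements: Iterable[str], n: int = 3
    ) -> List[Tuple[float, str]]:
        # Score each candidate, then keep the n highest-scoring pairs.
        scored = (
            (self.compare(text, s), s) for s in self.candidates(text, statements)
        )
        return heapq.nlargest(n, scored)


class TokenFilteredSearch(BaseTextSearch):
    """Hypothetical subclass: only overrides how candidates are selected."""

    def candidates(self, text: str, statements: Iterable[str]) -> Iterable[str]:
        # Keep statements sharing at least one lowercase token with the input.
        words = set(text.lower().split())
        return [s for s in statements if words & set(s.lower().split())]
```

Because ranking lives in one place, a subclass only has to say which statements are worth scoring, and callers receive a best-first list of `(score, statement)` pairs rather than a single result.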

Why

The previous implementation duplicated code across the two classes and relied solely on Levenshtein distance, which measures character-level edits and does not capture meaning. This update uses ChatterBot's built-in SpacySimilarity comparison to improve results while making the code more maintainable.

Notes

  • Falls back gracefully if in_response_to is missing
  • Still works without external dependencies like sentence-transformers

Let me know if you’d like this PR to fall back to LevenshteinDistance when SpacySimilarity isn’t available.
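
For discussion, one possible shape for that fallback is sketched below. It is an assumption about how the choice could be made, not code from this PR: `choose_comparator_name` and `levenshtein_similarity` are hypothetical helpers, the selection only checks whether the `spacy` package can be imported (it does not verify that a language model is installed), and the edit-distance function is a pure-Python stand-in rather than ChatterBot's own comparator.

```python
import importlib.util


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalized edit-distance similarity in [0, 1] (pure-Python stand-in)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    # Classic dynamic programming over two rows of the distance table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


def choose_comparator_name(module_name: str = "spacy") -> str:
    """Pick a comparator name based on whether the given package is importable."""
    if importlib.util.find_spec(module_name) is not None:
        return "SpacySimilarity"
    return "LevenshteinDistance"
```

Checking importability up front keeps the decision in one place, so the search classes would not need their own `try`/`except ImportError` blocks.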
