Hybrid - Minimum viable results - acceptance threshold - skip very bad results #4315

Open
1 task done
sebawita opened this issue Feb 26, 2024 · 7 comments

Comments

@sebawita
Contributor

Describe your feature request

Currently, hybrid search doesn't let me filter out really poor results: those where both the keyword search and the vector search produce very poor matches.

It would be great if we could provide an acceptance threshold as part of a hybrid query, where at least one of the following conditions must be satisfied for an object to be returned:

  • max vector distance – accept results where distance is not more than x
  • min keyword score – accept results where the score is at least y

This way we could eliminate results that are poor on both fronts (keyword and vector).
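The rule above can be sketched in plain Python. This is only an illustration of the requested behaviour, not Weaviate code: the `Candidate` class, the function names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One object with its per-search metrics (hypothetical structure)."""
    name: str
    vector_distance: Optional[float] = None  # None: not returned by vector search
    keyword_score: Optional[float] = None    # None: no BM25 match


def is_acceptable(c: Candidate, max_vector_distance: float, min_keyword_score: float) -> bool:
    """Accept an object when at least one of the two thresholds is satisfied."""
    vector_ok = c.vector_distance is not None and c.vector_distance <= max_vector_distance
    keyword_ok = c.keyword_score is not None and c.keyword_score >= min_keyword_score
    return vector_ok or keyword_ok


def apply_acceptance_threshold(
    candidates: List[Candidate], max_vector_distance: float, min_keyword_score: float
) -> List[Candidate]:
    """Drop every candidate that is poor on both fronts."""
    return [c for c in candidates if is_acceptable(c, max_vector_distance, min_keyword_score)]


# Illustrative numbers only, not measured with a real vectorizer.
candidates = [
    Candidate("close vector match", vector_distance=0.15),
    Candidate("strong keyword match", vector_distance=0.60, keyword_score=1.2),
    Candidate("poor on both fronts", vector_distance=0.55, keyword_score=0.02),
]
kept = apply_acceptance_threshold(candidates, max_vector_distance=0.2, min_keyword_score=0.1)
print([c.name for c in kept])
```

Only the third candidate is dropped, because it fails both the distance check and the score check.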

Example:

I have two objects:

  • "a very big army helicopter"
  • "a small light civilian plane"

If I search for "sandwich", both searches should produce poor matches, and I would expect no results back.
A search for "aeroplane", on the other hand, should produce a vector match (and no BM25 match), and I should get both objects back.

import weaviate, os
import weaviate.classes.config as wc

client = weaviate.connect_to_local(
    headers={
        "X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]
    }
)

test = client.collections.create(
    "Test",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai()
)

test.data.insert_many([
    {"name": "a very big army helicopter"},
    {"name": "a small light civilian plane"},
])

Proposed solution

import weaviate.classes.query as wq

response = test.query.hybrid(
    "sandwich",

    # Proposed new parameter (not part of the current API): removes results that
    # are poor on both fronts. At least one threshold must be satisfied for an
    # object to be included in the result set.
    acceptance_threshold={
        "min_hybrid_score": 0.1,
        "max_vector_distance": 0.2,
    },

    return_metadata=wq.MetadataQuery(score=True),
)

Note

BM25 alone returns no objects when there is no keyword match ;)

If I run:

import weaviate.classes.query as wq 
response = test.query.bm25(
    "sandwich",
    return_metadata=wq.MetadataQuery(score=True),
)

The query returns 0 objects.
So an empty BM25 result is already a good indicator that the hybrid results rely solely on the vector search. ;)


@sebawita changed the title from "Hybrid - Minimum viable results - filter very bad results - acceptance threshold" to "Hybrid - Minimum viable results - acceptance threshold - skip very bad results" on Feb 26, 2024
@sebawita
Contributor Author

Note: the idea here is not to filter on the final hybrid score, but to filter out results based on the separate keyword scores and vector distances before the final hybrid score is calculated.
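To make the ordering concrete, here is a minimal sketch of "filter first, fuse second". It assumes a simple reciprocal-rank style fusion as a stand-in and is not meant to mirror Weaviate's actual fusion code. The function name, the `alpha` and `k` parameters, and the sample distances are hypothetical.

```python
from typing import Dict, List, Tuple


def filter_then_fuse(
    vector_hits: List[Tuple[str, float]],   # (object id, distance), lower is better
    keyword_hits: List[Tuple[str, float]],  # (object id, BM25 score), higher is better
    max_vector_distance: float,
    min_keyword_score: float,
    alpha: float = 0.5,  # assumed weight of the vector side
    k: int = 60,         # assumed rank-smoothing constant
) -> List[Tuple[str, float]]:
    # Step 1: decide acceptance from the raw per-search metrics only.
    accepted = {oid for oid, dist in vector_hits if dist <= max_vector_distance}
    accepted |= {oid for oid, score in keyword_hits if score >= min_keyword_score}

    # Step 2: rank the surviving objects within each search.
    vector_ranked = sorted((h for h in vector_hits if h[0] in accepted), key=lambda h: h[1])
    keyword_ranked = sorted((h for h in keyword_hits if h[0] in accepted), key=lambda h: -h[1])

    # Step 3: compute the fused score only for accepted objects.
    fused: Dict[str, float] = {}
    for rank, (oid, _) in enumerate(vector_ranked):
        fused[oid] = fused.get(oid, 0.0) + alpha / (k + rank)
    for rank, (oid, _) in enumerate(keyword_ranked):
        fused[oid] = fused.get(oid, 0.0) + (1 - alpha) / (k + rank)

    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Illustrative distances only: a query with close vector matches and no BM25 match ...
print(filter_then_fuse([("helicopter", 0.18), ("plane", 0.12)], [], 0.2, 0.1))
# ... and a query that is poor on both fronts, which yields an empty result.
print(filter_then_fuse([("helicopter", 0.60), ("plane", 0.65)], [], 0.2, 0.1))
```

Because acceptance is decided in step 1, an object rejected by both thresholds never receives a fused score, so rank-based fusion cannot promote it.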

@iamleonie

/bounty $200


algora-pbc bot commented May 13, 2024

💎 $200 bounty • Weaviate

Steps to solve:

  1. Start working: Comment /attempt #4315 with your implementation plan
  2. Submit work: Create a pull request including /claim #4315 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Additional opportunities:

  • 🔴 Livestream on Algora TV while solving this bounty & earn $200 upon merge! Make sure to have your camera and microphone on. Comment /livestream once live

Thank you for contributing to weaviate/weaviate!


Attempt              Started (GMT+0)              Solution
🟢 @dresslife-shbh   May 13, 2024, 12:29:58 PM    #4980
🔴 @Bhavyajain21     May 14, 2024, 10:15:31 AM    WIP
🟢 @hsm207           May 14, 2024, 10:18:57 AM    WIP

@dresslife-shbh

dresslife-shbh commented May 13, 2024

/attempt #4315

@Bhavyajain21

Bhavyajain21 commented May 14, 2024

/attempt


@hsm207
Contributor

hsm207 commented May 14, 2024

/attempt

dresslife-shbh added a commit to dresslife-shbh/weaviate that referenced this issue May 20, 2024
Filtered vector and keyword search results according to threshold params in hybrid search

Fixes weaviate#4315

algora-pbc bot commented May 20, 2024

💡 @dresslife-shbh submitted a pull request that claims the bounty. You can visit your bounty board to reward.
