Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exclusions to aggregation #26627

Open
eostis opened this issue Mar 29, 2023 · 23 comments
Open

Add exclusions to aggregation #26627

eostis opened this issue Mar 29, 2023 · 23 comments
Assignees
Milestone

Comments

@eostis
Copy link

eostis commented Mar 29, 2023

E-Commerce facets often require to display "excluded" facet items from current selection.

For instance, let's say we selected the "Shoe" category. Each shoe has exactly one colour: blue or red.

If we select the blue filter, the red filter disappears, no way to also display red shoes:
Color
 X Blue

But if we want to show the red excluded filter also, with a "OR" on the colour filter:

  1. Show only blue shoes, but display the red excluded filter
    Color
     X Blue
      Red

  2. Show blue or red shoes
    Color
     X Blue
     X Red

(See https://solr.apache.org/guide/solr/latest/query-guide/faceting.html#tagging-and-excluding-filters)

@bratseth
Copy link
Member

The grouping result is generated from the matches.
If you want to generate a grouping result covering the complete universe of possible choices you can do that by sending another query just for grouping (hits=0), with the query "true".

Maybe it could be worth while to have a built-in Searcher that can do this.

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Exclusion is applied to the current color filter, but other filters must be applied. Which means we would build a second query with the same filters but the color filter to get the excluded color facet values.

And we must do that for each excluded facet (color and size for instance). There can be tens of excluded facets, so performance is an issue, and all excluded queries must be called in parallel (same as #26579)

I will do just that for my WooCommerce plugin. But it could be nice to build it in Vespa's yql syntax, as this is a very common e-commerce requirement.

Exclusion seen on a world-famous e-commerce site:
image
image
image

@bratseth
Copy link
Member

Why can't you do one query with no exclusions to get all these alternatives?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Because exclusion is only among a facet items, not between facets.

Here is the original query with 2 non-excluded facets color and size, returning one facet color and one facet size:
select * from my_type where (post_type matches "product") and (color matches "blue") and (size matches "xxl") all( all(grouping(size) each(output(count())) all(grouping(color) each(output(count())) )

Here is the query for excluded size, returning all sizes for the blue color:
select * from my_type where (post_type matches "product") and (color matches "blue") all(grouping(size) each(output(count()))

Here is the query for excluded color, returning all colors for the xxl size:
select * from my_type where (post_type matches "product") and (size matches "xxl") all(grouping(color) each(output(count()))

@bratseth
Copy link
Member

In your example picture above is a gui which (I believe) shows all the options of each facets. That can be produced by one query, recalling all documents (or all documents matching the part of the query not set by facet selection).

It also shows some facet values being selected, which leads to some choices of other facets being deactivated. The information needed for this is also one query, adding all selected facets as filters and deactivating all facet values not present in the resulting grouping.

What am I missing?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Exactly: it would require one query (no exclusion) + one query per excluded facet with a selection

In my previous example:

  • Visitor visits the catalog: 1 query
  • Visitor clicks on a non-excluded facets: 1 query
  • Visitor clicks on color facet: 2 queries (non excluded, color excluded)
  • Visitor clicks on size facet: 3 queries (non excluded, color excluded, size excluded)

And so on if more excluded facets are selected.

@bratseth
Copy link
Member

Yes, it will be one query to respond to each new action taken by the user. That is normal and as expected right?

And if we assume no state in the frontend, we'll need also need to redo the query without selection, so we get 2 queries per new user action. But we never need more than 2 queries at most per user action - agreed?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

No, because you must count excluded facets selected previously.

  • No selection: 1 query
  • Add First excluded facet selection: 2 queries
  • Add 2nd excluded facet selection: 3 queries
  • ...
  • Add Nth excluded facet selection: N+1 queries

Total: for the whole sequence (1+ 2 + 3 + .. + 1 + N) = 1+ N*(N+1)/2 queries (instead of 1+N with non-excluded facets)

The examples above with color and size required 3 queries, because both color and size where selected.

@bratseth
Copy link
Member

Yes, N user actions give N+1 queries in total (not counting the first user action of opening the page/app). Agree?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

No, the last action gives N+1 queries.

Total actions give O(N*N) queries

  • No selection: 1 query
  • Select color: adds 2 queries
  • Also select size: adds 3 queries
  • Also select textile: adds 4 queries

Total: 1 +2 + 3 + 4 = 10 queries

@bratseth
Copy link
Member

Okay, back to the original question:

Why does selecting 3 facets take 4 queries (assuming no state in the client)?

  • You get all choices for all facets by issuing one query selecting everything.
  • You get the results for the current selection (of one or more choices of 3 facets), plus information about which choices to deactivate (those that are not present in the facets results of this query), from one query.

What additional information do you need at this point which is not returned by these two queries?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

  • You get all choices for all facets by issuing one query selecting everything.

More precisely, you get all choices for all non-excluded facets

  • You get the results for the current selection (of one or more choices of 3 facets), plus information about which choices to deactivate (those that are not present in the facets results of this query), from one query.

We need to get the 3 aggregations, but each of them requires different where filters (same filters for each, without the current selection). Which gives 3 queries.

Notice the filters below: the non-excluded filter remains, while each filter on the aggregation is removed:
select * from my_type where (post_type matches "product") and (color matches "blue") all(grouping(size) each(output(count()))
select * from my_type where (post_type matches "product") and (size matches "xxl") all(grouping(color) each(output(count()))

@bratseth
Copy link
Member

We need to get the 3 aggregations

Why, i.e what information do you gain from this that is missing otherwise?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Because the 3 aggregations cannot be calculated with the same where filters. Each one is calculated with the facet selection excluded (removed) from the where filters.

Original query: (facet1 matches "value1") and (facet2 matches "value2") and (facet3 matches "value3")

Gives:

  • (facet2 matches "value2") and (facet3 matches "value3") to get all facet1 values
  • (facet1 matches "value1") and (facet3 matches "value3") to get all facet2 values
  • (facet1 matches "value1") and (facet2 matches "value2") to get all facet3 values

@bratseth
Copy link
Member

The first query (matching all) will give you all values of facet1, facet2 and facet3.

@eostis
Copy link
Author

eostis commented Mar 30, 2023

The (matching all) query gives all values for facet1, facet2, facet3.

But not all values for:

  • facet1, when facet2 and facet3 are selected
  • facet2, when facet1 and facet3 are selected
  • facet3, when facet1 and facet2 are selected

@bratseth
Copy link
Member

Yes, but those are subsets. You need the subset matching the current selection (of facet1,2,3), but what do you need the other subsets for?

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Let's go back to more precise examples.

id: 1

  • title: "sneaker 1"
  • post_type: "product"
  • status: "published"
  • color: "blue"
  • size: "10"
  • material: textile

id: 2

  • title: "sneaker 2"
  • post_type: "product"
  • status: "published"
  • color: "blue"
  • size: "11"
  • material: leather

id: 3

  • title: "sneaker 3"
  • post_type: "product"
  • status: "published"
  • color: "red"
  • size: "10"
  • material: textile

id: 4

  • title: "sneaker 4"
  • post_type: "product"
  • status: "published"
  • color: "red"
  • size: "11"
  • material: leather
  1. Filter color("blue") and size("10"), no exclusions:
    select * from my_type where (post_type matches "product") and (status matches "published") and (color matches "blue") and (size matches "10") all( all(grouping(color) each(output(count()))) all(grouping(size) each(output(count()))) all(grouping(material) each(output(count()))) )
Color
- X blue (1)
Size
- X 10 (1)
Material
-   Textile (1)
  1. Filter color("blue") and size("10") with exclusion on color:
    select * from my_type where (post_type matches "product") and (status matches "published") and (size matches "10") all( all(grouping(color) each(output(count()))) )
Color
- X blue (1)
-   red (1)
  1. Filter color("blue") and size("10") with exclusion on size:
    select * from my_type where (post_type matches "product") and (status matches "published") and (color matches "blue") all( all(grouping(size) each(output(count()))) )
Size
- X 10 (1)
-   11 (1)

Let's combine aggregations from (1) (2) (3), to see expected excluded color(red) and size(11):

Color
- X blue (1)
-   red (1)
Size
- X 10 (1)
-   11 (1)
Material
- Textile (1)

Results and non-excluded facet aggregations are coming from query (1). Query (2) and (3) are there only to calculate aggregations on excluded facets.

@bratseth
Copy link
Member

Results and non-excluded facet aggregations are coming from query (1)

Yes.

Query (2) and (3) are there only to calculate aggregations on excluded facets.

Can't you replace 2) and 3) by

select * from my_type where (post_type matches "product") and (status matches "published") all( all(grouping(color) each(output(count()))) all(grouping(size) each(output(count()))) all(grouping(material) each(output(count()))) )

This returns

Color
-  blue (2)
-  red (2)
Size
-  10 (2)
-  11 (2)
Material
- Textile (2)
- Leather (2)

You also have

Color
- X blue (1)
Size
- X 10 (1)
Material
-   Textile (1)

From 1).

Now you can combine them to create a view like in the image above:

Color
-  X blue
-    red
Size
-  X 10
-    11
Material
- Textile
- (Leather)

That said, I think there is something in that image (but not in your example) that you won't get by 2 queries:
You won't get the information necessary to differentiate between other alternatives of facets that are selected that are selectable given the choices in other facets, that is the difference between selectable and non-selectable shoe sizes in that image.

@eostis
Copy link
Author

eostis commented Mar 30, 2023

Now you can combine them to create a view like in the image above

The result should give - red (1), but the (match all )query gives - red (2) while the original query does not return red at all. How to produce - red (1) from both?

But my original feature request is to have Vespa manage the whole thing server-side, from a YQL syntax addition. My examples are just here to explain what happens, and how one could do it client-side in the meantime.

From my previous implementations, here are some hints:

  • (syntax) For Apache Solr, local queries with a specific exclusion syntax for facets. Requires a single call. Would be my preferred choice.

  • (syntax) For Google Retail search, excluded_filter_keys syntax. Requires a single call.

  • (trick) For Elasticsearch/OpenSearch, post filters. Requires a single call.

  • (trick) For Algolia, filterOnly facets. Requires a single call.

  • (trick) For Weaviate, send excluded aggregations (same discussed above) and the original query. Requires a single call containing a batch of queries. I'll probably do the same client-side, within async http2 multiplexing calls similar to Batch mode for document APIs #26579

@bratseth
Copy link
Member

Yes, 2 Red instead of 1 is an example of the same missing info (however, in that case I don't think it's necessarily better to show 1 than 2 if you show numbers at all since it may not be obvious to users what that number means.)

Anyway, this is clear to me now, thanks for the patience!

@eostis
Copy link
Author

eostis commented Sep 12, 2023

Closing inactive issue.

@eostis eostis closed this as completed Sep 12, 2023
@bratseth
Copy link
Member

On our backlog.

@bratseth bratseth reopened this Sep 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants