Is your feature request related to a problem? Please describe.
Consider a classification scheme in which each document can belong to multiple categories, and the categories form a large hierarchy. As an example, let's say the classification schema contains path-like category values such as styles/fantasy.
Documents may be classified in multiple categories, possibly at the same level.
When browsing a certain category (styles/fantasy), we are interested in grouping ("faceting") search results, but only showing categories that are under the current path. It is also important that the returned set of groups is complete.
Describe the solution you'd like
The preferred option would be to add a grouping function that filters out values that do not start with a given prefix. For example:
all( group(filter_prefix(category, "styles/fantasy/")) each(output(count())) )
This would discard all groups whose value does not start with "styles/fantasy/". As with other expressions, the computation would occur on each node, so network bandwidth would be greatly reduced.
filter_prefix might omit the group entirely, replace the value with an empty string (either would solve the problem), or replace it with a string chosen by the user. For example:
all( group(if_starts(category, "styles/fantasy", category, "alternative")) each(output(count())) )
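To make the intended semantics concrete, here is a minimal Python sketch of what the proposed function could do on each node before results are merged. Everything in it is hypothetical: filter_prefix does not exist in Vespa today, the helper names (group_counts_on_node, merge) and the sample category values are made up for illustration, and the sketch assumes a document contributes one count per category value it carries.

```python
from collections import Counter


def filter_prefix(value, prefix):
    """Hypothetical: keep the value if it starts with prefix, else drop the group (None)."""
    return value if value.startswith(prefix) else None


def group_counts_on_node(docs, prefix):
    """Count groups on one node; docs is a list of per-document category lists."""
    counts = Counter()
    for categories in docs:
        for value in categories:
            key = filter_prefix(value, prefix)
            if key is not None:  # groups outside the prefix never leave the node
                counts[key] += 1
    return counts


def merge(node_results):
    """Combine per-node counts, as the container would after collecting them."""
    total = Counter()
    for result in node_results:
        total.update(result)
    return total


# Illustrative data only: two nodes, each holding two documents.
node_a = [["styles/fantasy/epic", "styles/scifi/space"], ["styles/fantasy/urban"]]
node_b = [["styles/fantasy/epic"], ["formats/ebook"]]

merged = merge(group_counts_on_node(docs, "styles/fantasy/") for docs in (node_a, node_b))
print(dict(merged))
```

Because the filter runs before the merge, each node only ships the groups under the current path, which is where the bandwidth saving would come from.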
Describe alternatives you've considered
A first approach is to group by all values (all( group(category) each(output(count())) )) and then filter out the ones that don't belong to the current context. But this may require a very large maxHits to ensure that the values of interest are actually included in the results, and it will be inefficient. On large taxonomies, this makes it hard to guarantee that the result set is complete.
Creating one field for each level ("category1", "category2", "category3") mitigates but does not solve the problem, since documents can be in multiple categories at different points in the hierarchy; so we are still at risk of not providing the complete result set.
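The completeness risk of grouping by all values and filtering afterwards can be illustrated with a small Python sketch. The counts below are invented purely for illustration, and top_groups is a hypothetical stand-in for whatever limit caps the number of groups returned; it is not a Vespa API.

```python
from collections import Counter


def top_groups(counts, max_groups):
    """Hypothetical stand-in for a cap on the number of groups returned."""
    return dict(counts.most_common(max_groups))


# Illustrative counts only: the categories of interest are rare.
all_counts = Counter({
    "formats/ebook": 900,
    "styles/scifi/space": 500,
    "styles/fantasy/epic": 40,
    "styles/fantasy/urban": 3,
})
prefix = "styles/fantasy/"

# Alternative: cap first, then filter on the client.
truncated = top_groups(all_counts, 3)
client_side = {k: v for k, v in truncated.items() if k.startswith(prefix)}

# Proposal: filter where the groups are computed, then cap.
server_side = {k: v for k, v in all_counts.items() if k.startswith(prefix)}

print(client_side)
print(server_side)
```

With the cap applied first, the rare group under the current path is cut before the client ever sees it, so the only way to be sure of completeness is to raise the cap until every group in the taxonomy fits.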
A more general, but possibly less efficient, approach would be to allow regex filtering:
all( group(if_regex_matches(category, "styles/fantasy", category, "alternative")) each(output(count())) )
A new expression syntax, rather than a function, might be more natural, but it would probably require more invasive changes. For example:
all( group(category) if_prefix("styles/fantasy") each(output(count())) )
Additional context
See originating discussion on: https://vespatalk.slack.com/archives/C01QNBPPNT1/p1654876998447789