-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EDS and EPF: Use orFacets setting for API #3822
Conversation
// all copied for draft purposes from Base/Params | ||
protected $orFacets = []; | ||
|
||
// protected function initFacetList($facetList, $facetSettings, $cfgFile = null) | ||
public function initFacetList($facetList, $facetSettings, $config) | ||
{ | ||
// $config = $this->configLoader | ||
// ->get($cfgFile ?? $this->getOptions()->getFacetsIni()); | ||
if (!isset($config->$facetList)) { | ||
return false; | ||
} | ||
if (isset($config->$facetSettings->orFacets)) { | ||
$orFields | ||
= array_map('trim', explode(',', $config->$facetSettings->orFacets)); | ||
} else { | ||
$orFields = []; | ||
} | ||
foreach ($config->$facetList as $key => $value) { | ||
$useOr = (isset($orFields[0]) && $orFields[0] == '*') | ||
|| in_array($key, $orFields); | ||
$this->addFacet($key, $value, $useOr); | ||
} | ||
|
||
return true; | ||
} | ||
|
||
public function addFacet($newField, $newAlias = null, $ored = false) | ||
{ | ||
// if ($newAlias == null) { | ||
// $newAlias = $newField; | ||
// } | ||
// $this->facetConfig[$newField] = $newAlias; | ||
if ($ored) { | ||
$this->orFacets[] = $newField; | ||
} | ||
} | ||
|
||
public function getFacetOperator($field) | ||
{ | ||
return in_array($field, $this->orFacets) ? 'OR' : 'AND'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So obviously I don't really want to duplicate all of this logic from Base/Params. But I could not find a clean way to make the Params object available to SearchRequestModel. Hoping for some advice here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been quite a while since I've looked at the EDS code, so maybe there's an obvious answer to this question... but why do you want to have this logic in the SearchRequestModel at all? I think the way this is supposed to work is that the Params class constructs a ParamBag which gets sent to the backend, and then the backend turns the ParamBag into a SearchRequestModel for the EDS code to work with. Is there a reason we can't do all the processing directly in Params and then just pass the processed data all the way through? Or are there code paths that go straight to the backend without being processed by Params first?
(I can take a deeper look at the code if you need me to -- but that's my first impression based on memory and no code reading :-) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, all this (duplicate) processing is just so that I can call getFacetOperator
here in SearchRequestModel, which is necessary to build the (completely different) structure for AND vs OR. Which, I don't like. With your point about keeping the logic in EDS/Params, I'm thinking I should change AbstractEDSParams->createBackendFilterParams() to generate filters that look like field:operator:value
instead of field:value
. And then parse the operator in SearchRequestModel->addfilter(). I would still have to store that AND/OR state internally somewhere in SearchRequestModel until I actually format the EBSCO calls, but that's an improvement. Let me know if that sounds way off, if not I will build it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's exactly the sort of thing I had in mind -- this code is only ever used for EDS, and its purpose is to map the VuFind standard structures to something useful for EDS, so I think it makes perfect sense to pass along any and all needed data at this stage in order to simplify later steps.
If it's not overly difficult, you should probably aim to build things in a backward-compatible way -- e.g. if you're adding field:operator:value
format, the subsequent code should still handle field:value
format by inserting a default operator value -- just in case somebody has written custom code that talks directly to the EDS backend. I don't think it's very likely, but if we want to include this in 10.1 instead of 11.0, we should make a greater effort to avoid breaking things. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I did this refactoring (including backwards compatibility) and I think it's a lot better now.
* | ||
* @var array | ||
*/ | ||
protected $FORCED_OR_FIELDS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is conversation about this in the VuFind slack from May 10th, with @cwolfebsco and @berndoberknapp. I'm duplicating some of it here since that slack thread will expire in about a month.
Claus Wolf
2 months ago
SourceTypes should be mutually exclusive, i.e. a book cannot be an article, so would we truly expect an AND to be possible here? The only SourceType I can think that are no mutually exclusive are books and ebooks
Bernd Oberknapp
2 months ago
No, but this is also the case for other facets like ContentProvider and Journal. I would expect all facets to behave in the same way, and if the client requests AND the result should be empty.
Claus Wolf
2 months ago
I'll be happy to pass this feedback to our development teams. ContentProvider & SourceType have always been somewhat special, even in the native UI. I'll write it up and provide you with an Enhancement Request Number.
Maccabee Levine
2 months ago
Thank you Claus for the help. I agree with Bernd that the behavior is unexpected (and worse, I think it's undocumented)
Claus Wolf
2 months ago
This has been recorded as Enhancement Request ER 3366702. I did copy the Product Manager for the API for his awareness
I'm intentionally not resolving the style issues (missing function comments) since the hope is to get rid of those duplicate functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @maccabeelevine -- see below for some comments and answers.
* | ||
* @var array | ||
*/ | ||
protected $FORCED_OR_FIELDS = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a compelling reason to name this property in ALL_CAPS_SNAKE_CASE? That's usually reserved for constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I meant it as a "constant", in the sense of one that will be defined more specifically in the EDS-specific subclass, and potentially later in the EPF one as well, but should exist for either based on the mysterious logic behind the scenes of the APIs. See my comment below about EDS/Params. I defined it in the abstract class so it can be used by the common parseOperatorAndFieldName
. If it's cleaner, it could be an abstract function getForcedOrFields
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong feeling about whether it should be a property or a method -- just that if it's a property, I think it should be $forcedOrFields
instead of $FORCED_OR_FIELDS
to reflect that it's not really a constant. (I think you could also define it as const FORCED_OR_FIELDS
but then you would of course be unable to change it. :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, I changed the capitalization since it's not technically a constant. The user has no real control over it since it's up to the API, but there might be reasons to change it anyway. Especially if we are misunderstanding the API in some way and find that out later, which in the case of EPF at least is quite likely.
// all copied for draft purposes from Base/Params | ||
protected $orFacets = []; | ||
|
||
// protected function initFacetList($facetList, $facetSettings, $cfgFile = null) | ||
public function initFacetList($facetList, $facetSettings, $config) | ||
{ | ||
// $config = $this->configLoader | ||
// ->get($cfgFile ?? $this->getOptions()->getFacetsIni()); | ||
if (!isset($config->$facetList)) { | ||
return false; | ||
} | ||
if (isset($config->$facetSettings->orFacets)) { | ||
$orFields | ||
= array_map('trim', explode(',', $config->$facetSettings->orFacets)); | ||
} else { | ||
$orFields = []; | ||
} | ||
foreach ($config->$facetList as $key => $value) { | ||
$useOr = (isset($orFields[0]) && $orFields[0] == '*') | ||
|| in_array($key, $orFields); | ||
$this->addFacet($key, $value, $useOr); | ||
} | ||
|
||
return true; | ||
} | ||
|
||
public function addFacet($newField, $newAlias = null, $ored = false) | ||
{ | ||
// if ($newAlias == null) { | ||
// $newAlias = $newField; | ||
// } | ||
// $this->facetConfig[$newField] = $newAlias; | ||
if ($ored) { | ||
$this->orFacets[] = $newField; | ||
} | ||
} | ||
|
||
public function getFacetOperator($field) | ||
{ | ||
return in_array($field, $this->orFacets) ? 'OR' : 'AND'; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's been quite a while since I've looked at the EDS code, so maybe there's an obvious answer to this question... but why do you want to have this logic in the SearchRequestModel at all? I think the way this is supposed to work is that the Params class constructs a ParamBag which gets sent to the backend, and then the backend turns the ParamBag into a SearchRequestModel for the EDS code to work with. Is there a reason we can't do all the processing directly in Params and then just pass the processed data all the way through? Or are there code paths that go straight to the backend without being processed by Params first?
(I can take a deeper look at the code if you need me to -- but that's my first impression based on memory and no code reading :-) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @maccabeelevine, this seems to be coming together well. I'll try to do some hands-on testing and deeper review later this week. In the meantime, a couple of questions about the .ini file changes.
config/vufind/EDS.ini
Outdated
orFacets = * | ||
; Note that ContentProvider and SourceType are forced to be OR facets, ignoring | ||
; this setting, due to special handling by the EDS API. | ||
; orFacets = * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you changed the default advanced search behavior, or that you only put the "special handling" note in the advanced section but not in the regular section? I think it might be worth repeating, and I don't think we want to change the advanced facet behavior (unless there's a conscious reason why you did).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was mostly a mistake! That said I haven't checked the advanced search behavior yet, will add a to-do. I'm not sure why the prior default was different for basic vs. advanced, and I'm not sure if backwards compatibility matters when the point of the PR is to make the orFacets logic actually work for the first time -- BUT maybe it actually did work for advanced, who knows. TBD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Solr backend, advanced search uses OR facets so the user can multi-select values from the advanced form, which is a bit more flexible and useful than picking only one value in that particular interface. It's likely that when EDS was set up, those Solr defaults just got copied over without much thought, and nobody noticed that they didn't actually work as expected, because advanced search faceting is not a heavily used feature. If we can make things work correctly, it probably makes sense to continue to have parallel defaults between Solr and EDS.
@maccabeelevine, it also looks like some tests may need adjustment now... and is your TODO item still an open question? Let me know if you need any help hunting down an answer there. |
Yup some things still in progress...will let you know when it's ready for a re-review. Thanks! |
And LIMIT also, not sure that it's needed but can't hurt.
module/VuFind/tests/unit-tests/src/VuFindTest/Search/Blender/ParamsTest.php
Show resolved
Hide resolved
Not ready for a re-review, BUT there is an issue worth input from @demiankatz @cwolfebsco or others (@crhallberg ?). The values displayed next to each facet, which come from the EDS API, are the number of results you would get if you add that facet value with AND. In this screenshot that facet is configured for OR, so there should be a way to get EDS to tell me how many matches there would be if I add a particular value as an OR. I asked EDS support and they responded:
So, not sure what to do. Should we simply hide the values for a facet that we've configured for OR? Or some other option I'm not considering? |
@maccabeelevine The Enhancement Request that your support case was attached to, sadly doesn't actually express the request you had, but was the ER I had originally created for better documentation. Anyhow, either I am totally getting the wrong end of the question, or there is a misunderstanding. If I get it totally wrong, plesae forgive me: Adding facet values will always narrow the existing search, however the facet group behavior determins the how. Simplistically speaking: original_search AND (facet_value_a AND facet_value_b) To demonstrate this, please see the below example for the search for otzberg (a "mountain" about 1 mile from here) basic search::
total hits: 330 SubjectEDS Now let's use the AND group
total hits: 1 Only one result of the original search, had BOTH geology & odenwald as SubjectEDS vs.
Of the original results: 21 results had either geology OR odenwald, but 1 had both If you consider this - the values that are shown are only true when you add 1 facet to the search. They are incorrect either way if you add several facetValue to the search. If you OR terms together you are likely seeing at least as many result as one of the values, but if you AND you will at most get as many results as one of the values, but likely fewer and possibly 0. Since Facets will always narrow the search, the facet counts should be seen more like a statistics: The currently shown search result contain 11 records with the term "odenwald" and 11 records with the term "geology". If you click one of them you are going to see these 11 records, if you add multiple... well then it gets real difficult, real hard. |
@cwolfebsco, the way Solr handles this is that if a facet is requested in "OR" mode, the facet counts for that field always reflect the counts within the result set if that field is unfiltered, so the numbers consistently show how many records will get toggled on or off when that facet is selected. I think this makes sense, and is probably the only sensible way to handle counts in this mode. So, for example, if you have these records with these formats: Record 1: Book, Electronic And you apply the "Book" format facet, in AND mode, your counts will be: Book: 3 But in OR mode, no matter what combination of filters you apply, your counts will ALWAYS be: Book: 3 Note that these unfiltered counts are only on a field-by-field basis -- so if I were to apply, say, an author facet, that would limit down the counts in my format field. But applying a single format won't change the counts of the other formats, because we need to be able to see the whole list and know what happens if we OR in more. If the EDS API can't natively provide this functionality, we could simulate it by making multiple API calls, but that would get complicated and expensive very quickly -- for each field containing an applied filter, we would have to run a separate query without the filter applied, and then somehow mash together all the facet counts. Not something I would recommend trying, even though I'm pretty sure it would work (if very slowly). Barring a better technical solution, I think the best we can do is follow @maccabeelevine's suggestion of hiding the counts entirely, but that's not really satisfying either. |
Thanks both. I am kind of tempted by @cwolfebsco 's reasoning that
and not necessarily reflective of its boolean operator. But as @demiankatz says, that's not how it works for the Solr backend, and given the context of Solr's primary role in VuFind I don't think it makes sense to have a different (hidden) rule for how facets work with other backends. That said, this then also raises the question of whether there are already any other backends that have different behavior with these counts. If so, we already have an inconsistency. And if we take my suggestion (which it looks like @demiankatz sadly agrees with) to hide the facet counts for EDS), then I think we also have to consider how that bubbles up to Blender, but I guess it would have to hide those counts too. If that's the best approach, I'll pursue it. But I'm starting to wonder if the whole counts issue needs a separate investigation. |
@berndoberknapp, that's good (if frustrating) to know. So maybe we're back to needing an "approximate" marker, or maybe we just need a footnote, like put an asterisk next to every number and link that to an anchor that explains that facet counts may not exactly match result set sizes. Or maybe we just don't worry about it because most people care less about these details than we do. I'm not sure what's best! |
I think we're talking about two scenarios here, although they both lead to wrong numbers. I've noticed what @berndoberknapp points out about the facet numbers being inexact. I didn't notice whether they were higher or lower but I'm sure he is correct that they are higher than the real numbers. But that's the case for AND facets, where the number is at least an approximation. For OR facets, the number given is completely wrong, and much lower than the real numbers, because EDS has no way to know that I want the OR number (the union of two facet values), and it's giving me the AND number (the intersection) instead. But I agree that my temporary "+" solution is even worse given @berndoberknapp 's point that often the real value will be lower rather than higher. So I think I am back to ripping out all of the facet value code and agreeing that
And then dealing with the facet values in a separate PR, once we have guidance from @cwolfebsco 's contact. |
…API does not support AND
As discussed above, I've removed all of this PR's attempts at facet count logic. We can address that in a later PR once we hear EBSCO's recommendations. If OR facets work but give wrong counts, that's still a net improvement then them not respecting the orFacets configuration at all. I've also resolved the question of advanced search, by removing all references to a separate advanced screen orFacets configuration from EDS.ini. The EDS advanced search screen has no facets per se; it has several "limiters". And unlike filters, the EDS API does not appear to support AND mode for the facet values of any limiters -- for example, multiple languages selected will always be OR'ed. So there is no need for an orFacets configuration in advanced search. Almost ready for a re-review, I'm just considering unit tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed on Slack this morning, the language OR facets are not working for me -- it still appears that only the first selected value gets applied, rather than boolean logic being used.
I did manage to fix the problem we noticed during troubleshooting with "edit advanced search" failing to repopulate the selected facet values correctly. My solution isn't beautiful, but I think it should work well enough for our purposes. I've pushed up a commit!
Please let me know if you need anything else from me; otherwise, I'll wait to hear back from you before reviewing/testing further.
Ok, my bad, I had this working earlier when I had rewritten SearchRequestModel's limiters support to account for both the OR and the AND syntax. Then when it turned out there was no support for AND, I ripped out that code and figured the existing code would handle OR properly. Which it doesn't. So I put back the new OR support, and it seems to work fine now. Please retest. Your fix looks good as well. |
@maccabeelevine, the advanced facets seem to be working as expected for me now; thanks for that! However, while testing the regular search results orFacets setting, I think we still have a problem. The issue is not just that we get the wrong counts for the facets -- we get entirely the wrong LIST of facets. So imagine that I see three facet values that I want -- as soon as I click one of them, unless there are records with overlapping values, the other two values are removed from the set and can no longer be selected. Even if there are overlapping values, the counts will change and everything will move around so that I may no longer be able to easily find the next value that I want. Obviously, the best solution here would be for the API to behave the way we want it to behave. Failing that, I can think of a few possible options, all of which are either incomplete solutions or bad ideas: 1.) There is current work in progress in #3761 to make it possible to select multiple OR facet values before submitting a request, which potentially provides a smoother user experience, and which would also reduce (but not entirely eliminate) the issues caused by the behavior of EDS. That PR still needs a lot of work based on the most recent review, but I think it's a good idea that's worth refining. If it would help solve the problems here, maybe that's a sign that we should invest some effort into helping get it completed and polished sooner rather than later. 2.) I could imagine a system where we cache OR facet lists in the session -- so when a user first performs a search, we store all the facet values and counts, and then when they perform an OR facet search, we merge the lists so that the values and counts from the original search are provided, but the selected values from the current response are applied. This is obviously not something I'm eager to do -- it could potentially use a lot of session storage, it would make for complicated logic, and getting the cache invalidation right would be tricky. Indeed, the more I think about it, the more I realize that the idea becomes infeasible as soon as you have to manage more than one OR facet field at the same time due to the way facets interact with one another. It might help in some of the most common scenarios, but it's probably not worth the complication. 3.) As we've discussed before, you could solve this problem by making multiple API calls, but there are multiple reasons why that is a bad idea (complexity of merging behavior, but more importantly, poor performance which gets worse and worse the more facet fields you use). What a pain! |
@maccabeelevine, a couple other thoughts: First of all, the special "forced OR fields" (ContentProvider and SourceType) do actually work the way they are supposed to. That may be useful for communicating with EBSCO, since we can demonstrate the differences in behavior entirely within the scope of the existing API. Until we can find a way to get more robust and consistent behavior out of the other fields, I'm wondering if we should disable the EDS orFacets behavior, but merge the other changes here (the forced OR facets mechanism, which ensures that facets with proper OR behavior are always displayed correctly, and the advanced search fixes, which are also important). Alternatively, we can keep things as-is but put some comments on orFacets explaining the problems and limitations, so people can make their own decisions about whether the trade-offs are worth it. Let me know what you think! |
@demiankatz I've added some additional code to handle OR coming in the I hear what you are saying about maybe removing that functionality for now, and I can't push back hard on that, but I'd rather leave it in. It's effectively disabled by default since orFacets is commented out, and the comments in the ini make it an 'add at your own risk' feature. I also like what you referenced in #3761, which will make EDS orFacets less problematic, so I agree we should prioritize that for 10.1 to go along with this PR. (I agree that the caching solution and the multiple API calls idea have too much downside.) At the same time, leaving the functionality there may help increase the visibility of the problems with the API and help folks to demonstrate / test them and ultimately verify any fixes that are forthcoming. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @maccabeelevine, I think you're right that the comment covers us for now given the existing defaults... we'll just have to remember to revise it if the API improves in the future or if we discover a different way to solve the problem.
I just spotted one likely typo. I'll try to find time to give this one more thorough test and review in the next couple of days, and then it can hopefully be merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, @maccabeelevine! I think this is ready to go now.
There is no way to configure the behaviour of the facets in EBSCOadmin, therefore I assume that the EBSCO support won't be able to change the behaviour either. Therefore #3761 seems to be the only feasible approach for implementing orFacets with the current EDS API. |
@berndoberknapp Agreed, I'm not expecting EBSCO support to "fix" this, but @cwolfebsco is helping us get them to understand the issue and then presumably get an enhancement request going. |
@cwolfebsco noted today that an EBSCO enhancement request has been created for the OR facet behavior: ER# 3538330 |
- Also includes fixes to several facet-related bugs.
Formerly the orFacets configuration in EDS.ini (and EPF.ini) was used only to control the display. This PR makes use of that parameter to actually format the API request appropriately to treat the given facet as OR or AND.
TODO