Issue/vivo 3606 : add language-specific sorting and label fields to search index #321

Merged
merged 13 commits into from Aug 23, 2022

@litvinovg litvinovg commented Aug 5, 2022

Issue VIVO-3606:
This PR is the successor to an earlier PR.

VIVO-Solr PR (companion PR; this PR still works without it)

What does this pull request do?

  • Populates a sort field and label field in the search index for each language tag found in the list of labels for an individual.
  • If RDFService.languageFilter = true in runtime.properties, the sort field that corresponds to the current locale is used to sort the individual lists in the VClassGroup-based browse pages (People, Organizations, Events, etc.). The original nameLowercaseSingleValued field is used as a secondary sort in case the locale-specific field is not populated.
  • Alpha (A*) browse lookups match either documents whose locale-specific sort field starts with the selected letter, or documents that have no locale-specific sort field at all but whose default nameLowercaseSingleValued field starts with the selected letter. This should prevent content from disappearing entirely if the locale-specific sort field is not available, though it may mean that individuals appear under the wrong letter (unchanged from current behavior).
  • When displaying the results of an autocomplete request, the label corresponding to the current locale is displayed if available. If not, the original field nameRaw is used instead.
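
The sort and alpha-browse behavior described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the class and method names are hypothetical, the query is written in plain Lucene/Solr query syntax, and the spelling of the default field name is an assumption. Only the "_label_sort" suffix is taken from the description.

```java
import java.util.Locale;

// Hypothetical helper; not part of Vitro.
class AlphaBrowseQuerySketch {
    // Suffix named in this PR.
    static final String SORT_SUFFIX = "_label_sort";
    // Assumed spelling of the default lowercased name field.
    static final String DEFAULT_SORT_FIELD = "nameLowercaseSingleValued";

    // Field name is the language tag plus the suffix, e.g. "de-DE_label_sort".
    static String sortField(Locale locale) {
        return locale.toLanguageTag() + SORT_SUFFIX;
    }

    // Match documents whose locale-specific sort field starts with the letter,
    // OR documents that lack that field but whose default field starts with it.
    static String alphaQuery(Locale locale, char letter) {
        String prefix = String.valueOf(letter).toLowerCase(Locale.ROOT) + "*";
        String field = sortField(locale);
        return "(" + field + ":" + prefix + ")"
                + " OR (-" + field + ":[* TO *] AND " + DEFAULT_SORT_FIELD + ":" + prefix + ")";
    }

    // Locale-specific field first, default field as the secondary sort.
    static String sortClause(Locale locale) {
        return sortField(locale) + " asc, " + DEFAULT_SORT_FIELD + " asc";
    }

    public static void main(String[] args) {
        Locale german = Locale.forLanguageTag("de-DE");
        System.out.println(alphaQuery(german, 'A'));
        System.out.println(sortClause(german));
    }
}
```

The second clause of the query is what keeps an individual browsable when no label exists for the current locale.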

Note that this approach is intended only to enable the minimum sorting / autocomplete functionality needed by production i18nized sites. It has the following key limitations:

  • There is only one level of fallback (e.g. from de-DE_label_sort to nameLowercaseSingleValued). There is no attempt to try de or de-AT if de-DE is not available, which means that improperly sorted values may appear if the appropriate label is not available. Similarly, content may continue to appear under the wrong alpha heading.
  • Because all sort fields use the dynamic string type, no language-specific collation is enabled. All languages sort by Unicode values (lowercased by the Solr schema configuration) and not by more complex rules. It would be nice to add collation in the future, but that will require the ability to modify the Solr schema according to the languages in use.
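
To make the collation limitation concrete, the sketch below compares plain string ordering with locale-aware ordering in Java. It is only an approximation of what Solr does with a string field: String.compareTo orders by UTF-16 code units, and the class name and sample labels are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; not part of Vitro.
class CollationSketch {
    // Orders by raw character values, roughly what a plain string sort field gives.
    static List<String> byCharacterValue(List<String> labels) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(String::compareTo);
        return sorted;
    }

    // Orders by the collation rules of the given locale.
    static List<String> byCollator(List<String> labels, Locale locale) {
        List<String> sorted = new ArrayList<>(labels);
        sorted.sort(Collator.getInstance(locale));
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e4" is a-umlaut; the labels are already lowercased.
        List<String> labels = List.of("zebra", "\u00e4pfel", "apfel");
        System.out.println(byCharacterValue(labels)); // a-umlaut sorts after "z"
        System.out.println(byCollator(labels, Locale.GERMAN)); // a-umlaut sorts next to "a"
    }
}
```

With plain string ordering, a label starting with an umlaut lands after "z", which is the kind of result the second limitation refers to.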

What's new?

  • SelectQueryDocumentModifierDynamicTargetField extends SelectQueryDocumentModifier. It runs its queries once for each locale found in runtime.properties and returns search index fields whose names are composed of locale + targetSuffix.
  • Two new document modifiers are added to home/rdf/display/everytime to populate the i18nized sort and label fields.
  • AutocompleteController/IndividualListController/SearchQueryUtils are modified to take advantage of the new fields.
  • If the Solr schema doesn't contain fields with the "_label_sort" and "_label_display" suffixes, dynamic fields of the standard string type are created at SolrSearchEngine startup. This fixes label display and label sort on instances that still use the old Solr schema (updating the schema is still advised).
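
As a rough sketch of the "locale + targetSuffix" naming, the example below builds the sort and display fields for one individual. It does not reproduce SelectQueryDocumentModifierDynamicTargetField: the class, the method, and the in-memory label map are hypothetical stand-ins for the SPARQL queries the real modifier runs. The two suffixes, the lowercasing of the sort value, and the use of a LinkedHashMap to keep field order follow the description and commit messages of this PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not the document modifier from this PR.
class LocaleFieldsSketch {
    static final String SORT_SUFFIX = "_label_sort";
    static final String DISPLAY_SUFFIX = "_label_display";

    // labelsByTag maps a language tag such as "en-US" to the label in that language.
    static Map<String, String> fieldsFor(List<Locale> locales, Map<String, String> labelsByTag) {
        // LinkedHashMap keeps the fields in the order of the configured locales.
        Map<String, String> fields = new LinkedHashMap<>();
        for (Locale locale : locales) {
            String tag = locale.toLanguageTag();
            String label = labelsByTag.get(tag);
            if (label == null) {
                continue; // no label in this language: the fields stay unpopulated
            }
            // Lowercase the sort value in case the schema has no lowercase filter.
            fields.put(tag + SORT_SUFFIX, label.toLowerCase(Locale.ROOT));
            fields.put(tag + DISPLAY_SUFFIX, label);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Locale> locales = List.of(Locale.forLanguageTag("en-US"), Locale.forLanguageTag("de-DE"));
        System.out.println(fieldsFor(locales, Map.of("en-US", "Yanny")));
    }
}
```

An individual with no label for a configured locale simply gets no field for that locale, which is the case the fallback to the default fields covers.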

How should this be tested?

  • In runtime.properties, enable RDFService.languageFilter = true and add en_US, es, fr_CA, and de_DE as the selectable locales.
  • Load sorttest.n3.txt and sorttest_2.n3.txt (attached).
  • Switch between the four locales and observe that the items in the People tab sort and alpha-filter properly, displaying the language-appropriate label in parentheses except in the case of 'Yanny' when de_DE is selected. In the latter case, Yanny (en-US) will be displayed instead.
  • Observe that Yanny is still browsable on the People tab when the locale is set to de_DE, even though there is no de-DE label for the individual.
  • Add a publication to the DB. Add an author. Autocomplete on the author names 'Alpha', 'Bravo', 'Charlie' or 'Delta'. Note that all 4 individuals are returned when you type one of these names. This is because the autocomplete is not language-specific, and is out of scope for this PR. The improvement with this PR is that the labels in the autocomplete dropdown will change according to your currently-selected locale.
  • Verify that individuals without labels for the specified languages are still sorted correctly.
  • Load VIVO with the old and with the new Solr schema and rebuild the search index each time. With the old schema, the default dynamic fields should be created. With the updated schema, the dynamic fields provided by the schema should be used.
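
The label behavior to look for in the autocomplete step can be summarized with a small sketch. It assumes a search result is available as a simple field-to-value map; the class and method names are hypothetical, and only the "_label_display" suffix and the nameRaw fallback come from the description above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration; not the AutocompleteController code.
class AutocompleteLabelSketch {
    static final String DISPLAY_SUFFIX = "_label_display";
    static final String DEFAULT_LABEL_FIELD = "nameRaw";

    // Prefer the label for the current locale; otherwise fall back to nameRaw.
    static String displayLabel(Map<String, String> doc, Locale current) {
        String localized = doc.get(current.toLanguageTag() + DISPLAY_SUFFIX);
        if (localized != null && !localized.isEmpty()) {
            return localized;
        }
        return doc.get(DEFAULT_LABEL_FIELD);
    }

    public static void main(String[] args) {
        // Made-up document with an es label but no de-DE label.
        Map<String, String> doc = Map.of(
                "nameRaw", "Default label",
                "es_label_display", "Etiqueta");
        System.out.println(displayLabel(doc, Locale.forLanguageTag("es")));
        System.out.println(displayLabel(doc, Locale.forLanguageTag("de-DE")));
    }
}
```

As with sorting, there is a single level of fallback: a de-DE request does not try a de label before using nameRaw.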

Interested parties

@VIVO-project/vivo-committers

sorttest.n3.txt
sorttest_2.n3.txt

@litvinovg litvinovg requested review from chenejac, a user, hudajkhan and gneissone August 5, 2022 09:05
@litvinovg litvinovg marked this pull request as ready for review August 5, 2022 09:07
chenejac previously approved these changes Aug 17, 2022

@chenejac chenejac left a comment

@litvinovg, can you please read my comments before we merge your PR? Great contribution!!!

@chenejac chenejac left a comment

Thanks for the contribution.

@matthiasluehr matthiasluehr left a comment

I tested these changes and they worked in my development environment. Individuals appeared on the expected index pages depending on the selected language. The autocompleter was also aware of different languages and multiple labels. Tested with the provided sample data as well as some examples of my own.

@ghost ghost left a comment

Looks good to me.

@chenejac chenejac merged commit df3c4a8 into vivo-project:main Aug 23, 2022
ghost pushed a commit that referenced this pull request Feb 23, 2023
…earch index (#321)

* Add select query document modifier with dynamic target field; use locale-specific sort fields when available.

* Add i18nized labels to index for autocomplete

* Remove lowercasing from label query

* Improved document modifier for multilingual field with defined suffix name

* Improved document modifier for multilingual field with defined suffix name

* refact: reverted access modifier changes

* Lowercase label in documentModifierI18nSort in case old solr schema is used which doesn't have lowercase filter

* fix: fixed queries and locale names

* fix: renamed new document modifier

* fix: use linkedHashMap to retain map sort fields order

* refact: extracted buildAndExecuteVClassQuery(List<String> classUris, int page, int pageSize, String alpha, VitroRequest vreq)

* fix: removed unused import

* fix: constant name aligned with other suffix

Co-authored-by: Brian Lowe <brian@ontocale.com>