first pass at browse offset and match highlighting #255

hackartisan · 2014-12-05T20:52:21Z

Note there are TODOs (most notably the config file piece).

Seeking input on:

how to deal with the method signature of alphabeticBrowse in VuFindSearch/Backend/Solr/Backend
css styling (currently a blue box but if nothing else the color should change)
is there a better way to figure out whether the result solr returns is an exact match? I'm actually fine with the rough comparison (uppercase, strip spaces) but if there's any easy and more precise way to tell I'd use that.

demiankatz · 2014-12-08T18:39:15Z

Just a few comments for now:

1.) I'll second Tod's comment (made in an email off the PR) that the $offset variable/parameter in the PHP should probably be renamed to avoid confusion with the offset parameter of the browse handler. $rows_before works for me if you have no better name in mind.

2.) It probably makes sense to hold off on merging this until we can include the highlight matching inside the browse handler. I'm definitely not the best person to tackle that, but if nobody else has time, I might be able to look at it eventually -- would be a good excuse to learn the code a little better.

3.) I notice that you're implementing this in the old bootstrap theme. I'm about to delete that from master (either today or tomorrow). I'll most likely hold off on doing that until I hear more from you, but it would be helpful if you could revise the PR to use bootstrap3 or blueprint instead.

hackartisan · 2014-12-08T18:50:19Z

re 1) no problem
re 2) I'm hoping Mark can make quick work of it but if needed I may be able to take a look.
re 3) Yeah we still have a way to go before we're off bootstrap and on bootstrap3. I can revise the PR, though.

demiankatz · 2014-12-08T19:09:42Z

re 1) great! Please leave a comment when a revised version is ready for review.
re 2) Sounds good; we'll see what Mark has to say.
re 3) The change would be helpful to ensure that the PR remains mergeable; let us know if we can do anything more to help ease your transition to bootstrap3 -- I don't want to pull the rug out from under you, but I do hope to complete the deprecation of that theme soon, since the lack of bootstrap templates is holding up a bunch of other PR's, and they can all move forward if we take the bootstrap theme off the table.

hackartisan · 2014-12-08T19:18:45Z

Demian, don't let us hold you up on the theme deprecation. We'll catch up before too long and I'll probably deploy this stuff locally before it's accepted to master, anyway. You've given ample warning and we'll definitely reach out if we need to.

demiankatz · 2014-12-08T19:20:56Z

Excellent, thanks!

marktriggs · 2014-12-08T19:56:08Z

On 2), I'll give the highlighting a shot and put up a PR for that. So in the results I'll try to indicate:

The row in the results that was the match
Whether it was an exact match or, otherwise, how many characters of the search string were matched

hackartisan · 2014-12-08T20:07:30Z

Thanks, Mark! I think it would be easier to use a simple bool for exact head-of-string match. Because the number of characters might differ between the normalized and non-normalized form of the search.

marktriggs · 2014-12-08T20:22:50Z

Ah yep, that makes sense. And thinking out loud on how that would work, does this sort of approach make sense?

 user_input = <what the user typed>
 matched_heading = <first browse heading matched>

 user_input_key = generate_sort_key(user_input)

 if (generate_sort_key(matched_heading) == user_input_key) {
   # exact match on user input
 else {
   # was there a head of string match?
   for (i = 1; i < matched_heading.length; i++) {
     head_of_string = substring(matched_heading, 0, i)
     if (generate_sort_key(head_of_string) == user_input_key) {
        # Got a head of string match
        return true
     }
   }
 }

So I guess my thought is to generate successively larger substrings of the matched header, and checking whether we get a match against the user's input (using the normalized sort key for the comparison). Does that seem like it would work?

demiankatz · 2014-12-08T20:33:13Z

That approach makes sense to me. One other question for the crowd: obviously, there will be scenarios where the user input matches the head of multiple strings. How do we want to handle that?

hackartisan · 2014-12-08T20:33:50Z

Oh! I think our interpretations of 'exact head-of-string match' are slightly different. I was thinking of something like the following example:

user input: PS153
matched heading: PS153.A73H37 2011
exact head-of-string match: true

So more like this

user_input_key = generate_sort_key(<what the user typed>)
matched_heading_key = generate_sort_key(<first browse heading matched>)

if (matched_heading_key == user_input_key) {
   # exact match on user input
else if user_input_key is a left-anchored substring of matched_heading_key {
   # head of string match
 }

hackartisan · 2014-12-08T20:34:50Z

Demian, I think right now we're only concerned with the status of the match on the 'first match' row (that is, the first row returned once the offset is compensated for.

hackartisan · 2014-12-08T20:36:39Z

To further my example, I'm not interested at all in whether some substring of the user input is a head-of-string match for the heading

user input: PS151
matched heading: PS153.A73H37 2011
exact head-of-string match: false (i.e. the fact that the first 4 characters match is useless to me)

demiankatz · 2014-12-08T20:39:39Z

I think Mark's example is actually matching your interpretation of what should be happening. The reason it's going on in a loop is because we have to test every possible substring of the matched heading against the user input heading, since we don't know if there's a 1:1 relationship between input string length and key string length.

hackartisan · 2014-12-08T20:42:01Z

Oh okay! cool! I didn't realize there wasn't a substring test in Java.

demiankatz · 2014-12-08T20:47:23Z

It's not that there isn't a substring test in Java; it's just that the relationship between sort keys and headings is flexible enough that you can't make any assumptions about what keys look like. Suppose, for (contrived) example, that a key is generated as an integer value calculated from the text of the string. You can only test for substring matches by generating keys for all possible substrings and comparing them. You can't do substring checking against the keys themselves, because they don't work the same way as "regular strings." (Not that any of this is particularly important... just trying to shed light on the subject).

hackartisan · 2014-12-08T20:49:40Z

I see, yes, because the normalizers return byte arrays. Thanks for expanding on the explanation. Exactly the sort of thing that makes it much faster for Mark to implement than for me to do it!

marktriggs · 2014-12-08T21:06:09Z

Cool! Thanks all. I'll see how I go :)

marktriggs · 2014-12-11T03:08:39Z

Hi all,

I've made a start on this. While testing I noticed an interesting case that I thought I'd get the group's feedback on.

I've been testing on title browse, which builds sort keys based on title_sort and displays headings from the title_fullStr field. One of the documents in my test data looks like this:

title_fullStr: The nature of love
title_sort: nature of love

So, the biblio indexing process has dropped the word "the" to produce title_sort. When the indexer runs, it turns title_sort into a normalized byte array (for the sort key), and stores "The nature of love" in the SQLite index file.

This makes the behaviour at search time a little confusing. If I browse for "the nature of love", I actually don't get the above match, because my query contains the word "the" while the sort key does not. I have to browse for "nature of love" to actually get this match. I guess this has always been the case, and maybe it isn't a big problem.

But, when I browse for "nature of love", my new code tries to work out whether this is an exact or start-of-string match against the display heading, not the sort heading. So it looks for a common prefix between "the nature of love" (the display heading) and "nature of love" (my query), and finds none.

So I'm a bit unsure what to do for these cases. I could modify the browse indexes to store the original sort key text and use that for my prefix checking, but if the sort key is quite different to the display heading, knowing that there was a prefix match on the sort key might not help you much in highlighting the match on the display heading. That still seems better than nothing, but what do others think?

todolson · 2014-12-11T04:32:03Z

Compare to the sort key, not the display. The sort key is what controls the ordering and where the search term lands in the index, and includes all normalization. Normalize the search term the way you would at indexing time, no need to second-guess.

If a site wants to allow matches titles both with and without an initial article, we can arrange it by adding both versions to the browse index. That’s how the UChicago side is configured, so it can be done, and with little fuss. So definitely take the term as-is, normalize as usual, and work from that.

-Tod

On Dec 10, 2014, at 9:08 PM, Mark Triggs <notifications@github.com mailto:notifications@github.com> wrote:

Hi all,

I've made a start on this. While testing I noticed an interesting case that I thought I'd get the group's feedback on.

I've been testing on title browse, which builds sort keys based on title_sort and displays headings from the title_fullStr field. One of the documents in my test data looks like this:

title_fullStr: The nature of love
title_sort: nature of love

So, the biblio indexing process has dropped the word "the" to produce title_sort. When the indexer runs, it turns title_sort into a normalized byte array (for the sort key), and stores "The nature of love" in the SQLite index file.

This makes the behaviour at search time a little confusing. If I browse for "the nature of love", I actually don't get the above match, because my query contains the word "the" while the sort key does not. I have to browse for "nature of love" to actually get this match. I guess this has always been the case, and maybe it isn't a big problem.

But, when I browse for "nature of love", my new code tries to work out whether this is an exact or start-of-string match against the display heading, not the sort heading. So it looks for a common prefix between "the nature of love" (the display heading) and "nature of love" (my query), and finds none.

So I'm a bit unsure what to do for these cases. I could modify the browse indexes to store the original sort key text and use that for my prefix checking, but if the sort key is quite different to the display heading, knowing that there was a prefix match on the sort key might not help you much in highlighting the match on the display heading. That still seems better than nothing, but what do others think?

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/255#issuecomment-66562950.

marktriggs · 2014-12-11T05:31:43Z

Thanks Tod, that matches where I was heading too. I've put up an initial version showing the changes required to make this work:

vufind-org/vufind-browse-handler@203f125

The changes affect the indexing process too, so people would need to rebuild their browse indexes for this new version.

demiankatz · 2014-12-11T17:28:13Z

Looks good to me (at first glance, anyway -- I don't claim to have performed a comprehensive code review). I can certainly live with the need to rebuild the indexes -- just something to note in the changelog.

todolson · 2014-12-12T21:23:13Z

I’ve only managed a superficial look, which seems fine so far. I’ve not tried to merge it with my other changes.

On Dec 11, 2014, at 11:28 AM, Demian Katz <notifications@github.com mailto:notifications@github.com> wrote:

Looks good to me (at first glance, anyway -- I don't claim to have performed a comprehensive code review). I can certainly live with the need to rebuild the indexes -- just something to note in the changelog.

—
Reply to this email directly or view it on GitHubhttps://github.com//pull/255#issuecomment-66655617.

hackartisan · 2014-12-17T14:44:26Z

Mark, just wanted to say Thank You for this - I read it and I'm working on integrating it but I'm not sure whether I'll have any feedback before the holiday (next week and the week after). Will keep you posted.

hackartisan · 2014-12-17T21:10:44Z

Hey Demian, I migrated the theme changes to bootstrap3. The highlight box isn't working when it tries to go around the very first entry (blank search); not sure why. But I'm going to move on to testing Mark's browse handler changes and de-hack-ifying my highlight logic. Leaving bootstrap files as-is; I assume you'll just skip them.

marktriggs · 2014-12-17T21:36:37Z

Thanks Anna! No rush from me. I'll be pretty busy over the next few weeks too, so maybe we can pick up again once you've had a chance to try it out.

hackartisan · 2015-01-23T14:46:53Z

Hey Demian! Mark and I are done going back and forth over the highlighting support and we're ready for that branch (https://github.com/vufind-org/vufind-browse-handler/tree/highlighting-support) to be merged into master and pulled into vufind. Once that's done I will pull the changes and I can then push my current code (which uses his updated API) to this PR.

demiankatz · 2015-01-23T15:11:30Z

Thanks for the update. I believe I've merged everything in now -- let me know if you run into any problems.

hackartisan · 2015-01-23T15:55:53Z

The merge caused conflicts and I thought it would be easier to squash it all into one commit on top of current master. I've created a new branch and I'll submit a PR from there. Hope that works for you; I'm not really sure there is an ideal way to deal with this situation (when time has elapsed and a lot has happened on master).

demiankatz · 2015-01-23T16:09:26Z

That's totally fine -- I'll just mention #289 here in case anyone wants to follow the thread to the replacement PR.

first pass at browse offset and match highlighting

4467b3d

migrated theme changes to bootstrap3

ad5b95e

fix top highlight for row 0 in browse

6cf3005

hackartisan closed this Jan 23, 2015

hackartisan mentioned this pull request Jan 23, 2015

Add configurable alphabrowse row highlighting. #289

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

first pass at browse offset and match highlighting #255

first pass at browse offset and match highlighting #255

hackartisan commented Dec 5, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

marktriggs commented Dec 8, 2014

hackartisan commented Dec 8, 2014

marktriggs commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

hackartisan commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

marktriggs commented Dec 8, 2014

marktriggs commented Dec 11, 2014

todolson commented Dec 11, 2014

marktriggs commented Dec 11, 2014

demiankatz commented Dec 11, 2014

todolson commented Dec 12, 2014

hackartisan commented Dec 17, 2014

hackartisan commented Dec 17, 2014

marktriggs commented Dec 17, 2014

hackartisan commented Jan 23, 2015

demiankatz commented Jan 23, 2015

hackartisan commented Jan 23, 2015

demiankatz commented Jan 23, 2015

first pass at browse offset and match highlighting #255

first pass at browse offset and match highlighting #255

Conversation

hackartisan commented Dec 5, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

marktriggs commented Dec 8, 2014

hackartisan commented Dec 8, 2014

marktriggs commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

hackartisan commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

demiankatz commented Dec 8, 2014

hackartisan commented Dec 8, 2014

marktriggs commented Dec 8, 2014

marktriggs commented Dec 11, 2014

todolson commented Dec 11, 2014

marktriggs commented Dec 11, 2014

demiankatz commented Dec 11, 2014

todolson commented Dec 12, 2014

hackartisan commented Dec 17, 2014

hackartisan commented Dec 17, 2014

marktriggs commented Dec 17, 2014

hackartisan commented Jan 23, 2015

demiankatz commented Jan 23, 2015

hackartisan commented Jan 23, 2015

demiankatz commented Jan 23, 2015