Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using highlighter and serializer together #438

Open
pburrows opened this issue Dec 8, 2017 · 11 comments
Open

Using highlighter and serializer together #438

pburrows opened this issue Dec 8, 2017 · 11 comments

Comments

@pburrows
Copy link

pburrows commented Dec 8, 2017

I am creating an app to let users annotate a document. I'd like to save those annotations and let a user reload them at a later date.

I am using Highlighter Module to let users make their annotations.
I am using Serializer Module to save the selections and re-apply them at a future date.

Highlighter module mutates the original document (adds <span> tags) which causes deserializing serialized selections to fail.

Have other people solved this issue? Is there a technique to fix this? Maybe somehow updating a serialized selection based on the newly mutated document?

@colinricardo
Copy link

@pburrows Did you have any luck with this?

@srsudar
Copy link

srsudar commented Nov 24, 2019

I'm also running into this problem. Did anybody come up with a solution?

The highlighter module has a serialize() function that supposedly supports multiple highlights. This makes me think there might be some native support for this in rangy somehow, but I haven't played with it enough to be confident that it actually does what I think it does.

@srsudar
Copy link

srsudar commented Nov 25, 2019

I may have found a workaround that suits my purposes. In my initial tests it seems to be working, but I haven't tried it extensively yet.

Every time that I save a highlight I create a new highlighter object and call highlightSelection('highlight'). I immediately call serialize() on that highlighter, which returns something like: "type:textContent|1463$1475$1$highlight$". When I persist that highlight to my back end, I strip type:textContent|, leaving only 1463$1475$1$highlight$".

When I retrieve a batch of highlights, I sort by the leading number, which I think is what serialize() does. I then prepend type:textContent and join with | to recover something like this, matching rangy's own serialize() method: "type:textContent|1463$1475$1$highlight$|1529$1537$2$highlight$".

I can then pass this to the deserialize() method of highlighter to restore highlights. Definitely a bit jankier than I would like, but I think this is basically tricking rangy into treating highlights individually. I haven't tried on complicated DOMs yet, however, so I'm not sure this is flawless.

@pburrows
Copy link
Author

I wound up using findText on the range object for the selected text, and storing the findIndex along with the serialization (so, if the selected text is "hello" and there are 15 instances of "hello" in the document and the user highlights the 13th, the code will store that it is the 13th instance of the text. It worked pretty reliably.

Here is my serialization / deserialization of selection code. I don't think there is anything in this code that is specific to my app (a brief glance doesn't look like it). It is in typescript cause this is an angular app:

    export function deserializeSelection(selection: string, requestBody: ViewContainerRef): void {
        const baseSelection: any = rangy.getSelection(requestBody.element.nativeElement);
        const selItem: SelectionItem = JSON.parse(selection);
        baseSelection.removeAllRanges();
        const parentRange: any = rangy.createRange();
        parentRange.selectNodeContents(requestBody.element.nativeElement);

        const findRange: any = rangy.createRange();
        const findOptions: any = {
            withinRange: parentRange
        };
        let findCount: number = 0;
        while (findRange.findText(selItem.Text, findOptions)) {
            if (findCount === selItem.FindIndex) {
                //todo -- do something with the range;
                baseSelection.setSingleRange(findRange);
                return findRange;
            }
            findRange.collapse(false);
            findCount++;
        }
        return null;
    }

    export function serializeCurrentSelection(requestBody: ViewContainerRef): string {
        const sel: any = rangy.getSelection();
        let selectedText: string = sel.toString();
        const parentRange: any = rangy.createRange();

        selectedText = selectedText.replace(/\xA0/g, ' ');
        parentRange.selectNodeContents(requestBody.element.nativeElement);
        const selToSerialzie: SelectionItem = {
            Text: selectedText,
            FindIndex: -1
        };

        const findRange: any = rangy.createRange();
        const findOptions: any = {
            withinRange: parentRange
        };
        let findCount: number = 0;
        while (findRange.findText(selectedText, findOptions)) {
            const intersects: any = findRange.intersection(sel._ranges[0]);
            if (intersects && intersects !== null) {
                selToSerialzie.FindIndex = findCount;
                break;
            }
            findRange.collapse(false);
            findCount++;
        }
        return JSON.stringify(selToSerialzie);
    }

@srsudar
Copy link

srsudar commented Dec 2, 2019

Fantastic, this so far has worked great for me! Almost immediately after my last post I realized that my approach didn't work in many cases anyways, so this is even better. The one thing I changed from your solution was the ViewContainerRef type. Instead I just did parentRange.selectNodeContents(document.documentElement);.

Thank you very much for sharing. I hadn't seen that part of the rangy API, and it would have taken me quite a while to get such a clean solution.

@srsudar
Copy link

srsudar commented Apr 10, 2020

This ended up being a good starting point, but I couldn't get it working entirely reliably. The problem was that rangy.getSelection().toString() winds up with a different number of newlines (and also possibly spaces?) than are found by findText().

For example, imagine selecting from FOO to BAR in the DOM below:

<p>lorem FOO</p>
<p>BAR ispum</p>

In this case, (with FOO BAR selected), rangy.getSelection().toString() gave me FOO\n\nBAR. However, when using findText(), for some reason (maybe due to COLLAPSIBLE_SPACE?), the text that is searched is FOO\nBAR, which doesn't match.

This made it so that highlights could neither be serialized nor restored for me. And for some reason, searching at any point with findIndex == -1 seemed to break future searches, and they would hang. I tried various wordOptions and characterOptions settings, but I couldn't make any work.

Maybe there's a better solution for this, but I have a workaround in place that seems to be OK for the (admittedly few) cases I've tried thus far. The idea is to:

  1. Get the selection
    • 'FOO\n\nBAR and (paren)'
  2. Escape this selection so it can be used in regex (using require('escape-string-regexp'))
    • 'FOO\n\nBAR and \\(paren\\)'
  3. Replace all whitespace with a single whitespace regex matcher
    • const regExpStr = escaped.replace(/\s+/g, '\\s+') (note we're escaping the replacement)
  4. Use this new string to create a RegExp
    • re = new RegExp(regExpStr)
  5. Call findText() with this RegExp rather than the string

So far this seems to be working to cross paragraph boundaries. I'd be curious to know if you've run into this problem and if so how you solved it.

@pburrows
Copy link
Author

Since this library is pretty dead, I wound up cloning the project here:
https://github.com/pburrows/rangy-updated
and publishing it in npm as rangy-updated.

This clone accepts PRs. I might have solved the problem here:
pburrows/rangy-updated@854b8ad#diff-11d4f3bc2b39c646049f75845b5a8ae4

But looking at it, maybe not? I think my change was for   specifically. I don't think I've had your specific problem (and looking through the unit tests, I see that there is coverage for your specific scenario).

@timdown
Copy link
Owner

timdown commented Apr 11, 2020 via email

@srsudar
Copy link

srsudar commented Apr 11, 2020

Awesome! I'll be curious to see what you find. I don't understand the vagaries of whitespace in the browser nearly well enough to understand that part of the code. Or to make sense of pburrows's non-breaking space changes.

I also need to work only in Chrome, unlike rangy.

FWIW, here is my solution, modified from pburrows.

  /**
   * This is based on pburrows's solution, given here:
   *
   * https://github.com/timdown/rangy/issues/438
   */
  createHighlightedTextFromSelection(rangy:any): models.HighlightedText {
    rangy.init();
    const sel: any = rangy.getSelection();

    // Use the browser's selection rather than rangy's to capture the text. The
    // browser's selection looks closer to what the user sees. See this link for
    // an explanation of how rangy's selection isn't as realistic:
    //
    // https://github.com/timdown/rangy/wiki/Rangy-Selection#tostring
    let selectedText: string = document.getSelection().toString();

    const parentRange: any = rangy.createRange();

    const re = this.convertTextToRegExp(selectedText);

    // Not sure what he does here--something about nativeElement instead.
    parentRange.selectNodeContents(document.documentElement);

    let findIndex: number = -1;

    const findRange: any = rangy.createRange();
    const findOptions: any = {
      withinRange: parentRange,
    };
    let findCount: number = 0;
    while (findRange.findText(re, findOptions)) {
      const intersects: any = findRange.intersection(sel._ranges[0]);
      if (intersects && intersects !== null) {
        findIndex = findCount;
        break;
      }
      findRange.collapse(false);
      findCount++;
    }

    const result = new models.HighlightedText(selectedText);
    result.findIndex = findIndex;

    return result;
  }

  /**
   * This is based on pburrows's solution, given here:
   *
   * https://github.com/timdown/rangy/issues/438
   */
  restoreHighlight(rangy: any, hl: models.HighlightedText): boolean {
    rangy.init();
    const baseSelection: any = rangy.getSelection(document.documentElement);
    baseSelection.removeAllRanges();
    const parentRange: any = rangy.createRange();
    parentRange.selectNodeContents(document.documentElement);

    const findRange: any = rangy.createRange();
    const findOptions: any = {
      withinRange: parentRange,
    };

    const re: RegExp = this.convertTextToRegExp(hl.text);

    // Only perform our restorations if we actually find a match. Otherwise
    // things break and hang.
    let foundMatch: boolean = false;
    let findCount: number = 0;
    while (findRange.findText(re, findOptions)) {
      if (findCount === hl.findIndex) {
        //todo -- do something with the range;
        baseSelection.setSingleRange(findRange);
        foundMatch = true;
        break;
      }
      findRange.collapse(false);
      findCount++;
    }

    if (!foundMatch) {
      return false;
    }

    const restored = this.highlightCurrentSelection(
      this.externalHighlighter, null);
    // Now that we've highlighted, clear the selection.
    baseSelection.removeAllRanges();
    return restored && restored.text === hl.text;
  }

  convertTextToRegExp(text: string): RegExp {
    // We have various issues with how Rangy handles whitespace. We call pure
    // selection.toString(), and that doesn't perfectly match the character
    // iterator Rangy uses to look for matches. This leads to an issue
    // where `toString()` returns `foo\n\nbar`, but the rangy `findText()`
    // function is comparing to `foo\nbar`, which doesn't match.
    //
    // Instead of straight string matching using `indexOf()`, we instead are
    // going to have rangy look using a `RegExp`, where we basically ignore
    // whitespace. See a bit more discussion of my approach here:
    //
    // https://github.com/timdown/rangy/issues/438#issuecomment-612123194
    //
    // We want rangy to match essentially treating all whitespace as similar. To
    // do this, we are going to convert all whitespace in the string to a
    // whitespace+ character (note that we have to esacpe it), and then turn
    // that into a regexp. This seems potentially dangerous. But let's try it.

    // First safely escape things like parens.
    let regexpStr: string = escapeStringRegex(text);
    // And now replace all whitespace with a single whitespace.
    regexpStr = regexpStr.replace(/\s+/g, '\\s+');
    const re = new RegExp(regexpStr);
    return re;
  }

@linhttbk
Copy link

@srsudar can you show me the code of highlightCurrentSelection function. What happens when the selected text was duplicated inside documentElement. Thank you!

@srsudar
Copy link

srsudar commented Apr 27, 2020

Not sure I follow the duplicated text question. I put that method in a gist, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants