Replacing no-break spaces when converting HTML to plain text upon clipboard export #173

Open
hsivonen opened this issue Apr 1, 2022 · 4 comments

Comments

@hsivonen
Member

hsivonen commented Apr 1, 2022

Gecko bug for context.

It's unclear to me if the operation of generating a plain-text representation of HTML copied to the clipboard is within the scope of this (or any) spec, but in case it is:

It appears that:

  • Gecko, WebKit, and Blink retain no-break space characters when exporting HTML as HTML onto the clipboard.
  • Gecko, WebKit, and Blink replace no-break space characters with regular spaces when exporting HTML converted to plain text onto the clipboard.
  • When exporting plain text as plain text onto the clipboard, WebKit and Blink don't replace no-break spaces with regular spaces, but Gecko does.

The shortest path to have all three do the same thing would be for Gecko to change only not to replace no-break spaces when exporting plain text as plain text.

However, replacing no-break spaces with regular spaces in HTML to plain text conversion is bad in cases where the no-break spaces are used for a legitimate purpose (e.g. in combination with French quotation marks). At least in Gecko, the replacement is motivated by undoing a contentEditable behavior: every other space bar press inserts a no-break space to counteract CSS's space collapsing, so that every space bar press in contentEditable produces a visible space.
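
To make the trade-off concrete, here is a minimal TypeScript sketch of the behavior described above. It is a simplified model, not any engine's actual algorithm: the function names (`encodeTypedSpaces`, `collapseAsciiSpaces`, `blanketNbspToSpace`) are hypothetical, and the exact alternation pattern real editors produce may differ.

```typescript
const NBSP = "\u00A0";

// Simplified model of contentEditable: within a run of typed spaces,
// every other space becomes a no-break space.
function encodeTypedSpaces(typed: string): string {
  return typed.replace(/ {2,}/g, (run: string): string => {
    let out = "";
    for (let i = 0; i < run.length; i++) {
      out += i % 2 === 0 ? " " : NBSP;
    }
    return out;
  });
}

// Simplified model of CSS white-space collapsing: runs of ASCII spaces
// render as one space; no-break spaces are not collapsed.
function collapseAsciiSpaces(text: string): string {
  return text.replace(/ +/g, " ");
}

// Blanket reversal on export to plain text: every no-break space
// becomes a regular space, whatever its purpose.
function blanketNbspToSpace(text: string): string {
  return text.replace(/\u00A0/g, " ");
}
```

In this model, `encodeTypedSpaces("a   b")` survives `collapseAsciiSpaces` with all three spaces visible, and `blanketNbspToSpace` turns it back into three regular spaces. The same blanket step also strips the no-break spaces from `"«\u00A0oui\u00A0»"`, which is the legitimate use the issue is concerned about.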

Questions:

  • Would it be bad in practice (as opposed to principle of deviating from the current interop state) to replace no-break space with regular spaces in HTML to plain text conversion only when the contentEditable-like pattern of alternating spaces and no-break spaces is detected?
  • Is this issue in the scope of this spec or another spec?
@mbrodesser

At least sanitizing text in clipboard.readText() was identified as an issue, see "Issue 5" in step 1.3 of https://w3c.github.io/clipboard-apis/#dom-clipboard-readtext, so that's in scope of this spec.

@annevk annevk added the Agenda+ label May 5, 2022
@css-meeting-bot
Member

The Web Editing Working Group just discussed Henri's issue.

The full IRC log of that discussion:
<Travis> Topic: Henri's issue
<Travis> github: https://github.com//issues/173
<Travis> henri: w/contenteditable, user expects that space bar produces a visible space.
<Travis> .. originally, there was no CSS for white-space: pre?
<Travis> .. (because of whitespace collapsing)
<Travis> .. clipboard adds alternating spacing + non-breaking spaces
<Travis> .. when browser maps all nbsp to regular space.
<Travis> .. (number)nbsp(unit) these can sometimes be replaced.
<Travis> .. conclusion: impossible to copy from the web that retains nbsp's.
<Travis> .. contemplating change in Gecko to...
<Travis> .. when nbsp isn't adjacent to a regular space (both ends touch something other than a space--since those aren't created by editor as a hack), then LEAVE THEM BE.
<Travis> .. This is an area that is not really part of web interop concerns...
<Travis> .. compat/interop concern is from copy-then-paste all within the web platform.
<Travis> .. Q to other vendors: any concerns with this plan?
<Travis> .. can you see interop problems with this?
<Travis> .. (except a divergence between the three engines doing the same thing in this case)
<Travis> BoCupp: Not sure I follow what all the browsers are doing...
<Travis> .. Do all browsers just put the copy of the spaces when copying...
<Travis> henri: when copying from plaintext (no HTML involved); blink preserves nbsp,
<Travis> .. Gecko does not.
<Travis> .. when pasting into plaintext, all engines currently replace the nbsp. Gecko wants to diverge from this.
<Travis> BoCupp: which scenario are we optimizing for?
<Travis> henri: when HTML contains a nbsp for legitimate typographical reasons (keep units together with number, french quotes, etc.)
<Travis> .. anything except faking the collapsing of space by the editor.
<Travis> .. hypothesis: all other cases are legitimate uses of nbsp and should be preserved when exporting to clipboard.
<Travis> .. so we don't want to mess with those.
<Travis> johanneswilm: do you think you can detect all the cases when the editor does the fixup?
<Travis> henri: if a sequence of nbsp has an ASCII space either before or after it, then we would consider that editor-generated.
<Travis> .. everything else would be considered legitimate.
<Travis> .. I haven't done the research to see if editors expect that behavior... my experience is that existing web logic expects the current editing behavior.
<Travis> whsieh: q: idea is to preserve nbsp in dataTransfer.data or paste to plaintext and readback?
<Travis> henri: idea is to preserve nbsp when exporting to native clipboard flavor; the rest of the behavior would flow from that.
<Travis> .. if an app pastes to plaintext, then it would be affected.
<Travis> .. a little handwavy to understand the other subtle places where this might impact.
<Travis> whsieh: I wonder if there would be compat with apps (external to browser).
<Travis> henri: the case with a textarea in webkit, with no spaces, then the copy inserts the nbsp places... if that breaks apps then it would be an existing concern.
<Travis> whsieh: it would be broadening the concern if it was there.
<Travis> Travis: sounds like the consensus of the group is to "give it a try" and report back?
<Travis> .. didn't hear any objections (just questions)?
<Travis> johanneswilm: is this something we want to put in the spec? Or are we just OK with the interop divergence.
<Travis> henri: I'm not asking for inclusion in a spec (this is borderline not part of standards).
<Travis> BoCupp: I like the suggestion (it makes sense to me). When you do it and have success, I think it would be great if we could write it down.
<Travis> .. contenteditable spec has a section we could put this into...
<whsieh> q+
<Travis> johanneswilm: execcommand?
<Travis> BoCupp: ..looking for a link.
<Travis> johanneswilm: For now, henri should try it out and report back on the issue.
<Travis> .. like seeing the algorithm for determining the space handling that henri is going to try.
<Travis> .. other browsers may want to then try it out.
<Travis> BoCupp: can you comment on which issue (mentioned in the github issue) ...
<Travis> henri: I think the scenario is copying from the web, then pasting into plaintext textarea--demonstrating how it's relevant to web interop.
<whsieh> q-
<Travis> BoCupp: These would be changes to the serialization to the clipboard?
<Travis> henri: not sure. I was thinking of the action when a range in the DOM is exported to clipboard (HTML) on copy.
<Travis> .. to the extent there are ways to trigger the export (other than users pressing Ctrl+C), would assume they would go through the same code path. If not, then that's an additional complication.
<Travis> BoCupp: Suspect that they don't go through the same codepath.
<Travis> .. some cases you walk the DOM, in other cases, you're just given some text to insert.
<Travis> henri: Okay.
<Travis> BoCupp: Like the idea of you experimenting with it!
<Travis> johanneswilm: and if it DOESN'T work, we'd appreciate knowing!
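
The rule henri describes in the log (a run of no-break spaces counts as editor-generated only if an ASCII space touches it on either side; everything else is left alone) could be sketched roughly as follows. This is an illustration of the idea as minuted, not the algorithm Gecko experimented with; `replaceEditorNbsp` is a hypothetical name.

```typescript
// Replace a run of no-break spaces with regular spaces only when the run
// is adjacent to an ASCII space; otherwise keep it untouched.
function replaceEditorNbsp(text: string): string {
  return text.replace(
    /\u00A0+/g,
    (run: string, offset: number, whole: string): string => {
      // charAt returns "" for out-of-range indexes, so string edges are safe.
      const before = whole.charAt(offset - 1);
      const after = whole.charAt(offset + run.length);
      const touchesAsciiSpace = before === " " || after === " ";
      return touchesAsciiSpace ? run.replace(/\u00A0/g, " ") : run;
    }
  );
}
```

Under this sketch, the editor-style sequence `"a \u00A0 b"` comes out with three regular spaces, while `"10\u00A0km"` and `"«\u00A0oui\u00A0»"` keep their no-break spaces because both ends of each run touch something other than a space.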

@mbrodesser

mbrodesser commented May 13, 2022

For the record: Chrome (at least on Ubuntu 20.04) forbids copying when a contenteditable element is selected: data:text/html,A<div contenteditable>X</div>.

CC @masayuki-nakano

@hsivonen
Member Author

hsivonen commented Feb 7, 2023

> The shortest path to have all three do the same thing would be for Gecko to change only not to replace no-break spaces when exporting plain text as plain text.

This was OK.

> Would it be bad in practice (as opposed to principle of deviating from the current interop state) to replace no-break space with regular spaces in HTML to plain text conversion only when the contentEditable-like pattern of alternating spaces and no-break spaces is detected?

This turned out not to be Web-compatible due to there being sites that, instead of using the pre element or a relevant CSS property, replaced spaces in code examples with no-break spaces and relied on the browser reversing the replacement upon copy to clipboard.
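
A small TypeScript sketch of why such pages defeat selective replacement, assuming the detection looks for a no-break space next to an ASCII space (the rule discussed in the working group log). The helper names and the sample snippet are hypothetical.

```typescript
const NBSP = "\u00A0";

// Hypothetical stand-in for what such a site does to its code examples:
// every ASCII space is replaced with a no-break space.
function encodeCodeSample(code: string): string {
  return code.replace(/ /g, NBSP);
}

// True if any no-break space sits directly next to an ASCII space,
// i.e. the contentEditable-like signal a selective conversion looks for.
function hasEditorLikeNbsp(text: string): boolean {
  return /\u00A0\u0020|\u0020\u00A0/.test(text);
}

const original = "if (x) {\n    return 1;\n}";
const sample = encodeCodeSample(original);

// No ASCII spaces are left in the sample, so the signal never fires and a
// selective conversion would copy the no-break spaces through unchanged.
const selectiveWouldConvert = hasEditorLikeNbsp(sample);

// The blanket replacement is what restores the code the author intended.
const blanketResult = sample.replace(/\u00A0/g, " ");
```

Here `selectiveWouldConvert` is `false` and `blanketResult` equals `original`, so only the blanket replacement hands the user code that pastes cleanly into an editor or terminal.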

5 participants