Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text: add checkbox to control server-side markdown conversion #5378

Merged
merged 6 commits into from
Oct 15, 2021

Conversation

nfelt
Copy link
Collaborator

@nfelt nfelt commented Oct 14, 2021

Workaround for #5369. Fixes #830.

This adds a simple [√] Enable Markdown checkbox to the Polymer text plugin, which if deselected (it's selected by default to preserve existing behavior) will skip the markdown conversion step on the server. To do this, we introduce a markdown query parameter to the /data/plugin/text/text endpoint, and add a plugin_util.safe_html() function that does only the sanitization step of markdown_to_safe_html.

I've made the checkbox persist to localstorage (like a few other Polymer properties) since I expect most people will either want it always on or always off (and for working around the slow rendering, having to re-disable it on load each time would make this basically worthless as a workaround).

There are some limitations to this approach (it's Polymer-only; client-side markdown rendering would still be nicer). Ultimately, it might make more sense to add a summary API option to specify at write time whether the data should be treated as markdown or not (as suggested in #832 (comment)). But that requires more in-depth thought since it's persisted forever, and wouldn't help users on older versions of TensorFlow or users who might want to occasionally disable the Markdown rendering for debugging, etc. So even if we do add that option, I think we might still want to keep a UI control, and have it just become a try-state of auto (use summary metadata) in addition to true and false.

Test plan: tested with text demo data. Confirmed I can toggle it on and off and it appropriately re-fetches and re-renders the text summary data, and that local storage persistence behaves as expected. Googlers, see cl/402948742 for internal test sync and test case.

Screenshots

Enabled
image

Disabled
image

Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another compromise we are making is that we cannot turn on/off markdown for a specific card since this is more of a global setting.

Comment on lines +192 to +195
decode = lambda bs: bs.decode("utf-8") if isinstance(bs, bytes) else bs
text_arr_str = np.array(
[decode(bs) for bs in text_arr.reshape(-1)]
).reshape(text_arr.shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do note that the markdown version makes sure it accounts for null characters resulting from decoding mismatch.

In tensorboard/plugin_util.py.

            source_decoded = source.decode("utf-8")
            # Remove null bytes and warn if there were any, since it probably means
            # we were given a bad encoding.
            source = source_decoded.replace("\x00", "")
            total_null_bytes += len(source_decoded) - len(source)
...

    warning = ""
    if total_null_bytes:
        warning = (
            "<!-- WARNING: discarded %d null bytes in markdown string "
            "after UTF-8 decoding -->\n"
        ) % total_null_bytes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered adding that to safe_html but wasn't sure it was warranted since the sanitizer strips the null bytes anyway (confirmed with a unit test). The reason given in the markdown version for doing the stripping is that otherwise, the markdown conversion gets confused by the null bytes:

def test_null_bytes_stripped_before_markdown_processing(self):
# If this function is mistakenly called with UTF-16 or UTF-32 encoded text,
# there will probably be a bunch of null bytes. These would be stripped by
# the sanitizer no matter what, but make sure we remove them before markdown
# interpretation to avoid affecting output (e.g. middle-word underscores
# would generate erroneous <em> tags like "un<em>der</em>score") and add an
# HTML comment with a warning.

So the only difference really is whether we insert the diagnostic HTML comment or not. That didn't quite seem worth any potential overhead to me, but I'm happy to add it back if you prefer?

@nfelt
Copy link
Collaborator Author

nfelt commented Oct 15, 2021

PTAL re: null byte stripping

Another compromise we are making is that we cannot turn on/off markdown for a specific card since this is more of a global setting.

Yeah, this is a fair point, but I guess it seemed acceptable to me for now - I suspect most users will probably log either plain or markdown consistently depending on use case, and for those users, having to toggle each card separately would be a pain. For users who really do log several different varieties of text summary, it seems to me that they'd be best served anyway by introducing a write-time parameter to summary-metadata, since that way they don't have to go through and manually toggle the individual tags to the desired state.

Copy link
Contributor

@stephanwlee stephanwlee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely missed the null byte test. Thanks for pointing it out.

@nfelt nfelt merged commit e34e9e4 into tensorflow:master Oct 15, 2021
@nfelt nfelt deleted the text-markdown-optional branch October 15, 2021 18:41
yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
…flow#5378)

* define plugin_util.safe_html() with no markdown interpretation

* make plugin_util.safe_html() handle unicode vs bytes clearly

* add markdown=false request parameter to disable markdown interpretation

* fix bytes to str conversion bug in text plugin no-markdown codepath

* text: add checkbox to control server-side markdown conversion

* yarn fix-lint
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
…flow#5378)

* define plugin_util.safe_html() with no markdown interpretation

* make plugin_util.safe_html() handle unicode vs bytes clearly

* add markdown=false request parameter to disable markdown interpretation

* fix bytes to str conversion bug in text plugin no-markdown codepath

* text: add checkbox to control server-side markdown conversion

* yarn fix-lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature request: option to show raw text instead of Markdown
2 participants