text: add checkbox to control server-side markdown conversion #5378

nfelt · 2021-10-14T00:22:32Z

Workaround for #5369. Fixes #830.

This adds a simple [√] Enable Markdown checkbox to the Polymer text plugin, which if deselected (it's selected by default to preserve existing behavior) will skip the markdown conversion step on the server. To do this, we introduce a markdown query parameter to the /data/plugin/text/text endpoint, and add a plugin_util.safe_html() function that does only the sanitization step of markdown_to_safe_html.

I've made the checkbox persist to localstorage (like a few other Polymer properties) since I expect most people will either want it always on or always off (and for working around the slow rendering, having to re-disable it on load each time would make this basically worthless as a workaround).

There are some limitations to this approach (it's Polymer-only; client-side markdown rendering would still be nicer). Ultimately, it might make more sense to add a summary API option to specify at write time whether the data should be treated as markdown or not (as suggested in #832 (comment)). But that requires more in-depth thought since it's persisted forever, and wouldn't help users on older versions of TensorFlow or users who might want to occasionally disable the Markdown rendering for debugging, etc. So even if we do add that option, I think we might still want to keep a UI control, and have it just become a try-state of auto (use summary metadata) in addition to true and false.

Test plan: tested with text demo data. Confirmed I can toggle it on and off and it appropriately re-fetches and re-renders the text summary data, and that local storage persistence behaves as expected. Googlers, see cl/402948742 for internal test sync and test case.

Screenshots

Enabled

Disabled

stephanwlee

Another compromise we are making is that we cannot turn on/off markdown for a specific card since this is more of a global setting.

stephanwlee · 2021-10-15T01:25:08Z

tensorboard/plugins/text/text_plugin.py

+        decode = lambda bs: bs.decode("utf-8") if isinstance(bs, bytes) else bs
+        text_arr_str = np.array(
+            [decode(bs) for bs in text_arr.reshape(-1)]
+        ).reshape(text_arr.shape)


Do note that the markdown version makes sure it accounts for null characters resulting from decoding mismatch.

In tensorboard/plugin_util.py.

source_decoded = source.decode("utf-8") # Remove null bytes and warn if there were any, since it probably means # we were given a bad encoding. source = source_decoded.replace("\x00", "") total_null_bytes += len(source_decoded) - len(source) ... warning = "" if total_null_bytes: warning = ( "\n" ) % total_null_bytes

Yeah, I considered adding that to safe_html but wasn't sure it was warranted since the sanitizer strips the null bytes anyway (confirmed with a unit test). The reason given in the markdown version for doing the stripping is that otherwise, the markdown conversion gets confused by the null bytes:

tensorboard/tensorboard/plugin_util_test.py

Lines 175 to 181 in ee9d15e

def test_null_bytes_stripped_before_markdown_processing(self):

# If this function is mistakenly called with UTF-16 or UTF-32 encoded text,

# there will probably be a bunch of null bytes. These would be stripped by

# the sanitizer no matter what, but make sure we remove them before markdown

# interpretation to avoid affecting output (e.g. middle-word underscores

# would generate erroneous <em> tags like "un<em>der</em>score") and add an

# HTML comment with a warning.

So the only difference really is whether we insert the diagnostic HTML comment or not. That didn't quite seem worth any potential overhead to me, but I'm happy to add it back if you prefer?

nfelt · 2021-10-15T17:19:11Z

PTAL re: null byte stripping

Another compromise we are making is that we cannot turn on/off markdown for a specific card since this is more of a global setting.

Yeah, this is a fair point, but I guess it seemed acceptable to me for now - I suspect most users will probably log either plain or markdown consistently depending on use case, and for those users, having to toggle each card separately would be a pain. For users who really do log several different varieties of text summary, it seems to me that they'd be best served anyway by introducing a write-time parameter to summary-metadata, since that way they don't have to go through and manually toggle the individual tags to the desired state.

stephanwlee

I definitely missed the null byte test. Thanks for pointing it out.

…flow#5378) * define plugin_util.safe_html() with no markdown interpretation * make plugin_util.safe_html() handle unicode vs bytes clearly * add markdown=false request parameter to disable markdown interpretation * fix bytes to str conversion bug in text plugin no-markdown codepath * text: add checkbox to control server-side markdown conversion * yarn fix-lint

nfelt added 5 commits October 13, 2021 16:24

define plugin_util.safe_html() with no markdown interpretation

e971300

make plugin_util.safe_html() handle unicode vs bytes clearly

5fd2c27

add markdown=false request parameter to disable markdown interpretation

80153d1

fix bytes to str conversion bug in text plugin no-markdown codepath

15cd986

text: add checkbox to control server-side markdown conversion

cc37ea6

nfelt added the plugin:text label Oct 14, 2021

google-cla bot added the cla: yes label Oct 14, 2021

yarn fix-lint

ee9d15e

nfelt requested a review from stephanwlee October 14, 2021 00:47

stephanwlee approved these changes Oct 15, 2021

View reviewed changes

nfelt merged commit e34e9e4 into tensorflow:master Oct 15, 2021

nfelt deleted the text-markdown-optional branch October 15, 2021 18:41

UsharaniPagadala mentioned this pull request Oct 27, 2021

Text plugin markdown encoding is slow #5369

Closed

nfelt mentioned this pull request Oct 27, 2021

tf.summary.text option to control whether text is treated as markdown #3590

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

text: add checkbox to control server-side markdown conversion #5378

text: add checkbox to control server-side markdown conversion #5378

nfelt commented Oct 14, 2021 •

edited

stephanwlee left a comment

stephanwlee Oct 15, 2021

nfelt Oct 15, 2021

nfelt commented Oct 15, 2021

stephanwlee left a comment

	def test_null_bytes_stripped_before_markdown_processing(self):
	# If this function is mistakenly called with UTF-16 or UTF-32 encoded text,
	# there will probably be a bunch of null bytes. These would be stripped by
	# the sanitizer no matter what, but make sure we remove them before markdown
	# interpretation to avoid affecting output (e.g. middle-word underscores
	# would generate erroneous <em> tags like "un<em>der</em>score") and add an
	# HTML comment with a warning.

text: add checkbox to control server-side markdown conversion #5378

text: add checkbox to control server-side markdown conversion #5378

Conversation

nfelt commented Oct 14, 2021 • edited

stephanwlee left a comment

Choose a reason for hiding this comment

stephanwlee Oct 15, 2021

Choose a reason for hiding this comment

nfelt Oct 15, 2021

Choose a reason for hiding this comment

nfelt commented Oct 15, 2021

stephanwlee left a comment

Choose a reason for hiding this comment

nfelt commented Oct 14, 2021 •

edited