Always convert text from html in MessagePreviewExtractor#10919
Always convert text from html in MessagePreviewExtractor#10919frabera wants to merge 9 commits intothunderbird:mainfrom
Conversation
This solves html characters or code in notifications and message previews (thunderbird#10256)
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
Updated regex to remove URLs in parentheses from preview text.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
before, it interfered with the preview formatting of headers of forwarded and reply emails.
This comment was marked as resolved.
This comment was marked as resolved.
|
I modified the logic by moving the parsing and cleaning parts in In this way if a message contains only an html part, it is parsed twice. To avoid this a deeper modification is required, I'll wait for opinions from reviewers. Here's an example returning a boolean from fun extractPreview(textPart: Part): String {
val text = MessageExtractor.getTextFromPart(textPart, MAX_CHARACTERS_CHECKED_FOR_PREVIEW)
?: throw PreviewExtractionException("Couldn't get text from part")
val (plainText, requiresParsingHtml) = convertFromHtmlIfNecessary(textPart, text)
return stripTextForPreview(plainText, requiresParsingHtml)
}
private fun convertFromHtmlIfNecessary(textPart: Part, text: String): Pair<String, Boolean> {
return if (isSameMimeType(textPart.mimeType, "text/html")) {
HtmlConverter.htmlToText(text) to false
} else {
text to true
}
}
private fun stripTextForPreview(text: String, parseHtml: Boolean): String {
var intermediateText = text
intermediateText = normalizeLineBreaks(intermediateText)
intermediateText = stripSignature(intermediateText)
intermediateText = extractUnquotedText(intermediateText)
// try to remove lines of dashes in the preview
intermediateText = intermediateText.replace("(?m)^----.*?$".toRegex(), "")
// Remove horizontal rules.
intermediateText = intermediateText.replace("\\s*([-=_]{30,}+)\\s*".toRegex(), " ")
// If the textPart was plaintext, parse as HTML
if (parseHtml) {
intermediateText = HtmlConverter.htmlToText(intermediateText)
}
// Remove parsed HTML links/images "<url>"
intermediateText = intermediateText.replace("<https?://\\S+>".toRegex(), " ")
// URLs in the preview should just be shown as "..." - They're not
// clickable and they usually overwhelm the preview
intermediateText = intermediateText.replace("https?://\\S+".toRegex(), "...")
// Don't show newlines in the preview
intermediateText = intermediateText.replace('\n', ' ')
// Collapse whitespace in the preview
intermediateText = intermediateText.replace("\\s+".toRegex(), " ")
// Remove any whitespace at the beginning and end of the string.
intermediateText = intermediateText.trim()
return if (intermediateText.length > MAX_PREVIEW_LENGTH) {
intermediateText.substring(0, MAX_PREVIEW_LENGTH - 1) + "…"
} else {
intermediateText
}
} |
|
@frabera Thanks for the fix. I think having the flag improves the handling. I would add some tests to check the behavior especially for forwarded emails, special characters and html in plain text. This might help to check the logic works, maybe you could also add more variations, html mail with forwarded plain text or plain text with html forwarded email. Then test with special characters and html code in the plain text part. @Test
fun extractPreview_forwardedMessage() {
val text =
"""
Here is the forwarded message:
-----Original Message-----
From: alice@example.com
Sent: Monday, January 1, 2024 10:00 AM
To: bob@example.com
Subject: Hello
This is the original content.
""".trimIndent()
val part = MessageCreationHelper.createTextPart("text/plain", text)
val preview = previewTextExtractor.extractPreview(part)
assertThat(preview).isEqualTo("Here is the forwarded message: This is the original content.")
}
@Test
fun extractPreview_htmlForwardedMessage() {
val text =
"""
<html>
<body>
Here is the forwarded message:<br>
<br>
-----Original Message-----<br>
From: alice@example.com<br>
Sent: Monday, January 1, 2024 10:00 AM<br>
To: bob@example.com<br>
Subject: Hello<br>
<br>
This is the original content.
</body>
</html>
""".trimIndent()
val part = MessageCreationHelper.createTextPart("text/html", text)
val preview = previewTextExtractor.extractPreview(part)
assertThat(preview).isEqualTo("Here is the forwarded message: This is the original content.")
}I'm open for suggestions how to mark the content of the forwarded message. Or even threat this as a separate issue. When working on the test, keep the line breaks as is, otherwise the tests will fail. Ideally the test emails need to be loaded from a file, instead of being part of the code that reformats the message in an incompatible way. We won't have time to fix this now, so we're open for contributions. |
This solves html characters or code in notifications (#10256) and message previews (#8471)
Previously it only converted from html after checking the mimetype
convertFromHtmlIfNecessary()but if the message contained html code or special characters in the plain text part they were rendered in the message previews and in notifications, polluting content.I kept the function, let me know if there is a cleaner way to accomplish this. I tested it with messages containing special characters and also with html code, they are not shown.