Describe the bug
When converting a Markdown image with a base64 source link while safe_mode is True, the link becomes corrupted.
When the URL is of the form data:<mime-type>;base64,<content>, the <content> part should be left unchanged.
To Reproduce
Display a base64-encoded smiley PNG image in an img tag (safe_mode = False):
import markdown2
markdown2.markdown(text="")
The output is correct (the link remains unchanged):
'<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAApgAAAKYB3X3/OAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAANCSURBVEiJtZZPbBtFFMZ/M7ubXdtdb1xSFyeilBapySVU8h8OoFaooFSqiihIVIpQBKci6KEg9Q6H9kovIHoCIVQJJCKE1ENFjnAgcaSGC6rEnxBwA04Tx43t2FnvDAfjkNibxgHxnWb2e/u992bee7tCa00YFsffekFY+nUzFtjW0LrvjRXrCDIAaPLlW0nHL0SsZtVoaF98mLrx3pdhOqLtYPHChahZcYYO7KvPFxvRl5XPp1sN3adWiD1ZAqD6XYK1b/dvE5IWryTt2udLFedwc1+9kLp+vbbpoDh+6TklxBeAi9TL0taeWpdmZzQDry0AcO+jQ12RyohqqoYoo8RDwJrU+qXkjWtfi8Xxt58BdQuwQs9qC/afLwCw8tnQbqYAPsgxE1S6F3EAIXux2oQFKm0ihMsOF71dHYx+f3NND68ghCu1YIoePPQN1pGRABkJ6Bus96CutRZMydTl+TvuiRW1m3n0eDl0vRPcEysqdXn+jsQPsrHMquGeXEaY4Yk4wxWcY5V/9scqOMOVUFthatyTy8QyqwZ+kDURKoMWxNKr2EeqVKcTNOajqKoBgOE28U4tdQl5p5bwCw7BWquaZSzAPlwjlithJtp3pTImSqQRrb2Z8PHGigD4RZuNX6JYj6wj7O4TFLbCO/Mn/m8R+h6rYSUb3ekokRY6f/YukArN979jcW+V/S8g0eT/N3VN3kTqWbQ428m9/8k0P/1aIhF36PccEl6EhOcAUCrXKZXXWS3XKd2vc/TRBG9O5ELC17MmWubD2nKhUKZa26Ba2+D3P+4/MNCFwg59oWVeYhkzgN/JDR8deKBoD7Y+ljEjGZ0sosXVTvbc6RHirr2reNy1OXd6pJsQ+gqjk8VWFYmHrwBzW/n+uMPFiRwHB2I7ih8ciHFxIkd/3Omk5tCDV1t+2nNu5sxxpDFNx+huNhVT3/zMDz8usXC3ddaHBj1GHj/As08fwTS7Kt1HBTmyN29vdwAw+/wbwLVOJ3uAD1wi/dUH7Qei66PfyuRj4Ik9is+hglfbkbfR3cnZm7chlUWLdwmprtCohX4HUtlOcQjLYCu+fzGJH2QRKvP3UNz8bWk1qMxjGTOMThZ3kvgLI5AzFfo379UAAAAASUVORK5CYII=" alt="smiley" /></p>\n'
Same thing but with the safe_mode parameter set as True:
markdown2.markdown(safe_mode=True, text="")
The output is invalid: the + characters have been turned into spaces, so the link is corrupted:
'<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAYAAADgdz34AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAApgAAAKYB3X3/OAAAABl0RVh0U29mdHdhcmUAd3d3Lmlua3NjYXBlLm9yZ5vuPBoAAANCSURBVEiJtZZPbBtFFMZ/M7ubXdtdb1xSFyeilBapySVU8h8OoFaooFSqiihIVIpQBKci6KEg9Q6H9kovIHoCIVQJJCKE1ENFjnAgcaSGC6rEnxBwA04Tx43t2FnvDAfjkNibxgHxnWb2e/u992bee7tCa00YFsffekFY nUzFtjW0LrvjRXrCDIAaPLlW0nHL0SsZtVoaF98mLrx3pdhOqLtYPHChahZcYYO7KvPFxvRl5XPp1sN3adWiD1ZAqD6XYK1b/dvE5IWryTt2udLFedwc1 9kLp vbbpoDh 6TklxBeAi9TL0taeWpdmZzQDry0AcO jQ12RyohqqoYoo8RDwJrU qXkjWtfi8Xxt58BdQuwQs9qC/afLwCw8tnQbqYAPsgxE1S6F3EAIXux2oQFKm0ihMsOF71dHYx f3NND68ghCu1YIoePPQN1pGRABkJ6Bus96CutRZMydTl TvuiRW1m3n0eDl0vRPcEysqdXn jsQPsrHMquGeXEaY4Yk4wxWcY5V/9scqOMOVUFthatyTy8QyqwZ kDURKoMWxNKr2EeqVKcTNOajqKoBgOE28U4tdQl5p5bwCw7BWquaZSzAPlwjlithJtp3pTImSqQRrb2Z8PHGigD4RZuNX6JYj6wj7O4TFLbCO/Mn/m8R h6rYSUb3ekokRY6f/YukArN979jcW V/S8g0eT/N3VN3kTqWbQ428m9/8k0P/1aIhF36PccEl6EhOcAUCrXKZXXWS3XKd2vc/TRBG9O5ELC17MmWubD2nKhUKZa26Ba2 D3P 4/MNCFwg59oWVeYhkzgN/JDR8deKBoD7Y ljEjGZ0sosXVTvbc6RHirr2reNy1OXd6pJsQ gqjk8VWFYmHrwBzW/n uMPFiRwHB2I7ih8ciHFxIkd/3Omk5tCDV1t 2nNu5sxxpDFNx huNhVT3/zMDz8usXC3ddaHBj1GHj/As08fwTS7Kt1HBTmyN29vdwAw /wbwLVOJ3uAD1wi/dUH7Qei66PfyuRj4Ik9is hglfbkbfR3cnZm7chlUWLdwmprtCohX4HUtlOcQjLYCu fzGJH2QRKvP3UNz8bWk1qMxjGTOMThZ3kvgLI5AzFfo379UAAAAASUVORK5CYII=" alt="smiley" /></p>\n'
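To illustrate why this substitution breaks the image, here is a minimal standalone sketch using only the standard library. The payload bytes and the helper name is_strict_base64 are made up for the illustration and are not part of markdown2.

```python
import base64
import binascii

def is_strict_base64(s):
    """Return True if s decodes as base64 with strict alphabet validation."""
    try:
        base64.b64decode(s, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False

# Three bytes chosen so that every 6-bit group equals 62, i.e. '+'.
payload = bytes([0xFB, 0xEF, 0xBE])
encoded = base64.b64encode(payload).decode("ascii")

# Same substitution that safe mode applies to the URL.
corrupted = encoded.replace("+", " ")

print(repr(encoded), is_strict_base64(encoded))
print(repr(corrupted), is_strict_base64(corrupted))
```

Since + is one of the 64 characters of the standard base64 alphabet, replacing it with a space changes the encoded data rather than merely escaping it.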
Expected behavior
The second output should be the same as the first one.
Debug info
Version of library being used: 2.4.10
Any extras being used:
None
Additional context
The replacement is made here (markdown2.py), because safe_mode is True:
def _html_escape_url(attr, safe_mode=False):
    """Replace special characters that are potentially malicious in url string."""
    escaped = (attr
        .replace('"', '&quot;')
        .replace('<', '&lt;')
        .replace('>', '&gt;'))
    if safe_mode:
        escaped = escaped.replace('+', ' ')
        escaped = escaped.replace("'", "&#39;")
    return escaped
However, by the time this method is entered, it is already too late.
The call to self._protect_url(url), and through it to _html_escape_url(url, safe_mode=self.safe_mode), is made here:
if is_img:
    img_class_str = self._html_class_str_from_tag("img")
    result = '<img src="%s" alt="%s"%s%s%s' \
        % (self._protect_url(url),
           _xml_escape_attr(link_text),
           title_str,
           img_class_str,
           self.empty_element_suffix)
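One possible direction, shown only as a rough sketch and not as the library's actual fix: skip the + substitution when the URL looks like a base64 data URL. The function name html_escape_url_sketch and the pattern _BASE64_DATA_URL are hypothetical. The pattern is deliberately narrow (no MIME parameters such as charset), and whether the check belongs at this step or earlier in the call chain is left open.

```python
import re

# Hypothetical pattern for data:<mime-type>;base64,<content>.
# Assumption: no MIME parameters (e.g. ";charset=...") before ";base64".
_BASE64_DATA_URL = re.compile(
    r"^data:[\w.+-]+/[\w.+-]+;base64,[A-Za-z0-9+/]*={0,2}$"
)

def html_escape_url_sketch(attr, safe_mode=False):
    """Like _html_escape_url, but keeps '+' intact in base64 data URLs."""
    escaped = (attr
        .replace('"', '&quot;')
        .replace('<', '&lt;')
        .replace('>', '&gt;'))
    if safe_mode:
        # Only rewrite '+' for URLs that are not base64 data URLs.
        if not _BASE64_DATA_URL.match(attr):
            escaped = escaped.replace('+', ' ')
        escaped = escaped.replace("'", '&#39;')
    return escaped

print(html_escape_url_sketch("data:image/png;base64,ab+c/d==", safe_mode=True))
print(html_escape_url_sketch("http://example.com/a+b", safe_mode=True))
```

With this variant, the data URL passes through unchanged in safe mode, while other URLs are still handled as before.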