📄 HTML in Tagged Templates
$ npm install @radically-straightforward/html
Note: We recommend the following tools:
Prettier: A code formatter that supports HTML in tagged templates.
Prettier - Code formatter: A Visual Studio Code extension to use Prettier more ergonomically.
ES6 String HTML: A Visual Studio Code extension to syntax highlight HTML in tagged templates.
Indentation Level Movement: A Visual Studio Code extension to navigate in code based on indentation, which helps with long snippets of HTML in tagged templates.
Note: This tool is primarily designed for rendering HTML on the server with Node.js, but it also works in the browser.
import html, { HTML } from "@radically-straightforward/html";
import * as htmlUtilities from "@radically-straightforward/html";
export type HTML = string;
A type alias to make your type annotations more specific.
export default function html(
templateStrings: TemplateStringsArray,
...substitutions: (string | string[])[]
): HTML;
A tagged template for HTML:
html`<p>Leandro Facchinetti</p>`;
Sanitizes interpolations to prevent injection attacks:
html`<p>${"Leandro Facchinetti"}</p>`;
// => `<p>Leandro Facchinetti</p>`
html`<p>${`<script>alert(1);</script>`}</p>`;
// => `<p><script>alert(1);</script></p>`
Note: Sanitization is only part of the defense against injection attacks. Also deploy the following measures:
- Serve your pages with UTF-8 encoding.
- Have your server send the header
Content-Type: text/html; charset=utf-8
.- If you want to be extra sure that the encoding will be picked up by the browser, include a
<meta charset="utf-8" />
meta tag. (But HTML 5 documents must be encoded in UTF-8, so it should be sufficient to declare your document as HTML 5 by starting it with<!DOCTYPE html>
.)- Always use quotes around HTML attributes (for example,
href="https://leafac.com"
instead ofhref=https://leafac.com
).- See https://wonko.com/post/html-escaping/.
Note: This library works by concatenating strings. It doesn’t prettify the output (if you need that you may, for example, call Prettier programmatically on the output of
html`___`
), and it doesn’t generate any kind of virtual DOM. The virtues of this approach are that this library is conceptually simple and it is one order of magnitude faster thanReactDOMServer.renderToStaticMarkup()
(performance matters because rendering may be one of the most time-consuming tasks in responding to a request).
Opt out of sanitization with $${___}
instead of ${___}
:
html`<div>$${`<p>Leandro Facchinetti</p>`}</div>`;
// => `<div><p>Leandro Facchinetti</p></div>`
Note: Only opt out of sanitization if you are sure that the interpolated string is safe, in particular it must not contain user input, otherwise you’d be open to injection attacks:
html`<div>$${`<script>alert(1);</script>`}</div>`; // => `<div><script>alert(1);</script></div>`
Note: You must opt out of sanitization when the interpolated string is itself the result of
html`___`
, otherwise the escaping would be doubled:html` <div> Good (escape once): $${html`<p>${`<script>alert(1);</script>`}</p>`} </div> `; // => // ` // <div> // Good (escape once): <p><script>alert(1);</script></p> // </div> // ` html` <div> Bad (double escaping): ${html`<p>${`<script>alert(1);</script>`}</p>`} </div> `; // => // ` // <div> // Bad (double escaping): <p>&lt;script&gt;alert(1);&lt;/script&gt;</p> // </div> // `
Note: As an edge case, if you need a literal
$
before an interpolation, interpolate the$
itself:html`<p>${"$"}${"Leandro Facchinetti"}</p>`; // => `<p>$Leandro Facchinetti</p>`
Interpolated arrays are joined:
html`<p>${["Leandro", " ", "Facchinetti"]}</p>`;
// => `<p>Leandro Facchinetti</p>`
Note: Interpolated arrays are sanitized:
html` <p>${["Leandro", " ", "<script>alert(1);</script>", " ", "Facchinetti"]}</p> `; // => // ` // <p>Leandro <script>alert(1);</script> Facchinetti</p> // `You may opt out of the sanitization of interpolated arrays by using
$${___}
instead of${___}
:html` <ul> $${[html`<li>Leandro</li>`, html`<li>Facchinetti</li>`]} </ul> `; // => // ` // <ul> // <li>Leandro</li><li>Facchinetti</li> // </ul> // `
export function sanitize(
text: string,
replacement: string = sanitize.replacement,
): string;
Sanitize text for safe insertion in HTML.
sanitize()
escapes characters that are meaningful in HTML syntax and replaces invalid XML characters with a string of your choosing—by default, an empty string (""
). You may provide the replacement
as a parameter or set a new default by overwriting sanitize.replacement
. For example, to use the Unicode replacement character:
sanitize.replacement = "�";
Note: The
html`___`
tagged template already callssanitize()
, so you must not callsanitize()
yourself or the sanitization would happen twice.
Note: The sanitization to which we refer here is at the character level, not cleaning up certain tags while preserving others. For that, we recommend
rehype-sanitize
.
Note: Even this sanitization isn’t enough in certain contexts, for example, HTML attributes without quotes (
<a href=${sanitize(___)}>
) could still lead to XSS attacks.
export function escape(text: string): string;
Escape characters that are meaningful in HTML syntax.
What sets this implementation apart from existing ones are the following:
-
Performance.
The performance of the
escape()
function matters because it’s used frequently to escape user input when rendering HTML with thehtml`___`
tagged template.The following are some details on how this implementation is made faster:
-
The relatively new string function
replaceAll()
when used with a string parameter is faster thanreplace()
with a regular expression. -
Perhaps surprisingly, calling
replaceAll()
multiple times is faster than using a single regular expression of the kind/[&<>"']/g
. -
And even if we were to use a single regular expression, using
switch/case
would have been faster than the lookup tables that most other implementations use. -
And also if we were to use regular expressions, using the flag
v
incurs on a very small but consistent performance penalty. -
And also if we were to use regular expressions,
replace()
is marginally but consistently faster thanreplaceAll()
. -
Measurements performed in Node.js 21.2.0.
-
-
Supports modern browsers only.
Most other implementations escape characters such as
`
, which could cause problems in Internet Explorer 8 and older.Some other implementations avoid transforming
'
into the entity'
, because that entity isn’t understood by some versions of Internet Explorer.
References
- https://github.com/lodash/lodash/blob/ddfd9b11a0126db2302cb70ec9973b66baec0975/lodash.js#L384-L390
- https://github.com/mathiasbynens/he/blob/36afe179392226cf1b6ccdb16ebbb7a5a844d93a/src/he.js#L35-L50
- https://github.com/fb55/entities/blob/02a5ced5708fa4a91b9f17a038da24d1c6200f1f/src/escape.ts
- https://mathiasbynens.be/notes/ambiguous-ampersands
- https://wonko.com/post/html-escaping/
- https://stackoverflow.com/questions/7918868/how-to-escape-xml-entities-in-javascript
export const invalidXMLCharacters: RegExp;
A regular expression that matches invalid XML characters.
Use this to remove or replace invalid XML characters, or simply to detect that a string doesn’t contain them. This is particularly useful when generating XML based on user input.
This list is based on Extensible Markup Language (XML) 1.1 (Second Edition), § 2.2 Characters (https://www.w3.org/TR/xml11/#charsets). In particular, it includes:
\u{0}
, which is always invalid in XML.- The gaps between the allowed ranges in the production rule for
[2] Char
from the “Character Range” grammar. - The discouraged characters from the Note in that section of the document.
Notably, it does not include the “"compatibility characters", as defined in Unicode” mentioned in that section of the document, because that list was difficult to find and doesn’t seem to be very important.
Example
someUserInput.replace(invalidXMLCharacters, ""); // Remove invalid XML characters.
someUserInput.replace(invalidXMLCharacters, "�"); // Replace invalid XML characters with the Unicode replacement character.
someUserInput.match(invalidXMLCharacters); // Detect whether there are invalid XML characters.
References
- https://www.w3.org/TR/xml/#charsets: An older version of the XML standard.
- https://en.wikipedia.org/wiki/Valid_characters_in_XML
- https://en.wikipedia.org/wiki/XML
- https://github.com/felixrieseberg/sanitize-xml-string/blob/661bd881613c0f7555eb7d73b883b853b9826cc6/src/index.ts: It was the inspiration for this code. The differences are:
- We export the regular expression, instead of encapsulating it in auxiliary functions, which arguably makes it more useful.
- We are more strict on what we consider to be valid characters (see description above).
- We use the regular expression flag
v
instead ofu
(see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/unicodeSets).
-
JSX requires you to deviate from plain HTML in some edge cases, for example,
class="___"
becomesclassName="___"
, because the HTML embedded in JavaScript with JSX is turned into function calls andclass
is a reserved word in JavaScript.@radically-straightforward/html
doesn’t run into this issue, because HTML is treated as strings. -
According to some ad-hoc benchmarks,
@radically-straightforward/html
is one order of magnitude faster than React’srenderToStaticMarkup()
, which is significant because rendering HTML accounts for a good amount of the time spent responding to a request. -
Embedding HTML in the JavaScript is only half of the battle: We also need to embed CSS and browser JavaScript in the embedded HTML itself. In JSX there are two solutions for this: 1. Represent CSS as a JavaScript object (for example, turning
background-color: red;
into{ backgroundColor: "red" }
); or 2. Use tagged templates (for example,css`background-color: red;`
). Solution 1 is awkward because it introduces one layer of indirection (you can’t, for example, copy-and-paste snippets of CSS you find on the internet). Solution 2 is awkward because is solves the same problem (embedding one language in another) in different ways (HTML turns into JSX, and CSS turns into tagged templates).@radically-straightforward/html
, along with@radically-straightforward/css
and@radically-straightforward/javascript
, presents a unified approach to solve the problem using tagged templates. -
JSX requires an extra compilation step. Admittedly, so does idiomatic use of
@radically-straightforward/css
and@radically-straightforward/javascript
through@radically-straightforward/build
.
-
Was a major inspiration for this. Its design is simple and great. In particular, I love (and stole) the idea of using
$${___}
for opting out of sanitization.
-
Doesn’t encode interpolated values by default.
-
Uses the
safeHtml
tag, which isn’t recognized by Prettier or the Visual Studio Code extension es6-string-html extension.
- Less ergonomic API with
escapeHtml.safe()
andescapeHtml.join()
instead of the$${}
trick.
- Have the notion of virtual DOM instead of simple string concatenation.