From eee9456012e0bbed424333703fafed45733f1cf8 Mon Sep 17 00:00:00 2001 From: Gary Hockin Date: Thu, 5 Nov 2015 10:26:44 -0500 Subject: [PATCH 1/2] Added base documentation --- doc/book/zend.escaper.configuration.md | 21 +++ doc/book/zend.escaper.escaping-css.md | 72 ++++++++++ .../zend.escaper.escaping-html-attributes.md | 121 +++++++++++++++++ doc/book/zend.escaper.escaping-html.md | 73 +++++++++++ doc/book/zend.escaper.escaping-javascript.md | 87 ++++++++++++ doc/book/zend.escaper.escaping-url.md | 54 ++++++++ doc/book/zend.escaper.introduction.md | 43 ++++++ doc/book/zend.escaper.theory-of-operation.md | 124 ++++++++++++++++++ doc/bookdown.json | 14 ++ 9 files changed, 609 insertions(+) create mode 100644 doc/book/zend.escaper.configuration.md create mode 100644 doc/book/zend.escaper.escaping-css.md create mode 100644 doc/book/zend.escaper.escaping-html-attributes.md create mode 100644 doc/book/zend.escaper.escaping-html.md create mode 100644 doc/book/zend.escaper.escaping-javascript.md create mode 100644 doc/book/zend.escaper.escaping-url.md create mode 100644 doc/book/zend.escaper.introduction.md create mode 100644 doc/book/zend.escaper.theory-of-operation.md create mode 100644 doc/bookdown.json diff --git a/doc/book/zend.escaper.configuration.md b/doc/book/zend.escaper.configuration.md new file mode 100644 index 0000000..79a0474 --- /dev/null +++ b/doc/book/zend.escaper.configuration.md @@ -0,0 +1,21 @@ +# Configuring Zend\\Escaper + +`Zend\Escaper\Escaper` has only one configuration option available, and that is the encoding to be +used by the Escaper object. + +The default encoding is **utf-8**. 
Other supported encodings are: + +> - iso-8859-1 +- iso-8859-5 +- iso-8859-15 +- cp866, ibm866, 866 +- cp1251, windows-1251 +- cp1252, windows-1252 +- koi8-r, koi8-ru +- big5, big5-hkscs, 950, gb2312, 936 +- shift\_jis, sjis, sjis-win, cp932 +- eucjp, eucjp-win +- macroman + +If an unsupported encoding is passed to `Zend\Escaper\Escaper`, a +`Zend\Escaper\Exception\InvalidArgumentException` will be thrown. diff --git a/doc/book/zend.escaper.escaping-css.md b/doc/book/zend.escaper.escaping-css.md new file mode 100644 index 0000000..1667ee3 --- /dev/null +++ b/doc/book/zend.escaper.escaping-css.md @@ -0,0 +1,72 @@ +# Escaping Cascading Style Sheets + +CSS is similar to \[Javascript\](zend.escaper.escaping-javascript) for the same reasons. CSS +escaping excludes only basic alphanumeric characters and escapes all other characters into valid CSS +hexadecimal escapes. + +## Examples of Bad CSS Escaping + +In most cases developers forget to escape CSS completely: + +``` sourceCode + + +'); +} +INPUT; +?> + + + Unescaped CSS + + + + +

User controlled CSS needs to be properly escaped!

+ + +``` + +In the above example, by failing to escape the user provided CSS, an attacker can execute an XSS +attack fairly easily. + +## Examples of Good CSS Escaping + +By using `escapeCss` method in the CSS context, such attacks can be prevented: + +``` sourceCode + + +'); +} +INPUT; +$escaper = new Zend\Escaper\Escaper('utf-8'); +$output = $escaper->escapeCss($input); +?> + + + Escaped CSS + + + + +

User controlled CSS needs to be properly escaped!

+ + +``` + +By properly escaping user controlled CSS, we can prevent XSS attacks in our web applications. diff --git a/doc/book/zend.escaper.escaping-html-attributes.md b/doc/book/zend.escaper.escaping-html-attributes.md new file mode 100644 index 0000000..07dd4e7 --- /dev/null +++ b/doc/book/zend.escaper.escaping-html-attributes.md @@ -0,0 +1,121 @@ +# Escaping HTML Attributes + +Escaping data in the **HTML Attribute context** is most often done incorrectly, if not overlooked +completely by developers. Regular \[HTML escaping\](zend.escaper.escaping-html) can be used for +escaping HTML attributes, *but* only if the attribute value can be **guaranteed as being properly +quoted**! To avoid confusion, we recommend always using the HTML Attribute escaper method in the +HTML Attribute context. + +To escape data in the HTML Attribute context, use `Zend\Escaper\Escaper`'s `escapeHtmlAttr` method. +Internally it will convert the data to UTF-8, check for its validity, and use an extended set of +characters to escape that are not covered by `htmlspecialchars` to cover the cases where an +attribute might be unquoted or quoted illegally. + +## Examples of Bad HTML Attribute Escaping + +An example of incorrect HTML attribute escaping: + +``` sourceCode + + + + + + Single Quoted Attribute + + + +
+ + ?> + + What framework are you using? + +
+ + +``` + +In the above example, the default `ENT_COMPAT` flag is being used, which does not escape single +quotes, thus resulting in an alert box popping up when the `onmouseover` event happens on the `span` +element. + +Another example of incorrect HTML attribute escaping can happen when unquoted attributes are used, +which is, by the way, perfectly valid HTML5: + +``` sourceCode + + + + + + Quoteless Attribute + + + +
+ + ?> + > + What framework are you using? + +
+ + +``` + +The above example shows how it is easy to break out from unquoted attributes in HTML5. + +## Examples of Good HTML Attribute Escaping + +Both of the previous examples can be avoided by simply using the `escapeHtmlAttr` method: + +``` sourceCode + + +escapeHtmlAttr($input); +?> + + + Quoteless Attribute + + + +
+ + ?> + > + What framework are you using? + +
+ + +``` + +In the above example, the malicious input from the attacker becomes completely harmless as we used +proper HTML attribute escaping! diff --git a/doc/book/zend.escaper.escaping-html.md b/doc/book/zend.escaper.escaping-html.md new file mode 100644 index 0000000..b5dcb95 --- /dev/null +++ b/doc/book/zend.escaper.escaping-html.md @@ -0,0 +1,73 @@ +# Escaping HTML + +Probably the most common escaping happens in the **HTML Body context**. There are very few +characters with special meaning in this context, yet it is quite common to escape data incorrectly, +namely by setting the wrong flags and character encoding. + +For escaping data in the HTML Body context, use `Zend\Escaper\Escaper`'s `escapeHtml` method. +Internally it uses PHP's `htmlspecialchars`, and additionally correctly sets the flags and encoding. + +``` sourceCode +// outputting this without escaping would be a bad idea! +$input = ''; + +$escaper = new Zend\Escaper\Escaper('utf-8'); + +// somewhere in an HTML template +
+ escapeHtml($input); // all safe! + ?> +
+``` + +One thing a developer needs to pay special attention to is the encoding in which the document +is served to the client, as it **must be the same** as the encoding used for escaping! + +## Examples of Bad HTML Escaping + +An example of incorrect usage: + +``` sourceCode +alert("zf2")'; +$escaper = new Zend\Escaper\Escaper('utf-8'); +?> + + + + + Encodings set incorrectly! + + + + escapeHtml($input); + ?> + +``` + +## Examples of Good HTML Escaping + +An example of correct usage: + +``` sourceCode +alert("zf2")'; +$escaper = new Zend\Escaper\Escaper('utf-8'); +?> + + + + + Encodings set correctly! + + + + escapeHtml($input); + ?> + +``` diff --git a/doc/book/zend.escaper.escaping-javascript.md b/doc/book/zend.escaper.escaping-javascript.md new file mode 100644 index 0000000..59f5a71 --- /dev/null +++ b/doc/book/zend.escaper.escaping-javascript.md @@ -0,0 +1,87 @@ +# Escaping Javascript + +Javascript string literals in HTML are subject to significant restrictions particularly due to the +potential for unquoted attributes and any uncertainty as to whether Javascript will be viewed as +being CDATA or PCDATA by the browser. To eliminate any possible XSS vulnerabilities, Javascript +escaping for HTML extends the escaping rules of both ECMAScript and JSON to include any potentially +dangerous character. Very similar to HTML attribute value escaping, this means escaping everything +except basic alphanumeric characters and the comma, period and underscore characters as hexadecimal +or unicode escapes. + +Javascript escaping applies to all literal strings and digits. It is not possible to safely escape +other Javascript markup. + +To escape data in the **Javascript context**, use `Zend\Escaper\Escaper`'s `escapeJs` method. An +extended set of characters is escaped beyond ECMAScript's rules for Javascript literal string +escaping in order to prevent misinterpretation of Javascript as HTML leading to the injection of +special characters and entities.
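To see why escaping beyond the ECMAScript rules matters, the following minimal sketch uses only PHP's built-in `json_encode()` with its default flags. The `$input` value is a hypothetical attacker-controlled string, not taken from the component's own examples; the point is only that the ampersand and semicolon pass through unchanged, so an HTML entity survives the encoding.

```php
<?php
// Hypothetical attacker-controlled value that relies on an HTML entity.
$input = '&quot;;alert(1);//';

// With default flags, json_encode() wraps the value in double quotes and
// escapes forward slashes, but leaves "&" and ";" untouched.
$encoded = json_encode($input);

// The entity is still present in the encoded output.
echo $encoded, PHP_EOL;
```

PHP does offer opt-in flags such as `JSON_HEX_AMP` for `json_encode()`, but they are off by default and cover only part of the character set that a dedicated Javascript escaper handles.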
+ +## Examples of Bad Javascript Escaping + +An example of incorrect Javascript escaping: + +``` sourceCode + + + + + + Unescaped Entities + + + + +

json_encode() is not good for escaping javascript!

+ + +``` + +The above example will show an alert popup box as soon as the page is loaded, because the data is +not properly escaped for the Javascript context. + +## Examples of Good Javascript Escaping + +By using the `escapeJs` method in the Javascript context, such attacks can be prevented: + +``` sourceCode + + +escapeJs($input); +?> + + + Escaped Entities + + + + +

Zend\Escaper\Escaper::escapeJs() is good for escaping javascript!

+ + +``` + +In the above example, the Javascript parser will most likely report a `SyntaxError`, but at least +the targeted application remains safe from such attacks. diff --git a/doc/book/zend.escaper.escaping-url.md b/doc/book/zend.escaper.escaping-url.md new file mode 100644 index 0000000..46c195a --- /dev/null +++ b/doc/book/zend.escaper.escaping-url.md @@ -0,0 +1,54 @@ +# Escaping URLs + +This method is basically an alias for PHP's `rawurlencode()` which has applied RFC 3986 since PHP +5.3. It is included primarily for consistency. + +URL escaping applies to data being inserted into a URL and not to the whole URL itself. + +## Examples of Bad URL Escaping + +XSS attacks are easy if data inserted into URLs is not escaped properly: + +``` sourceCode + + + + + + Unescaped URL data + + + + Click here! + + +``` + +## Examples of Good URL Escaping + +By properly escaping data in URLs by using `escapeUrl`, we can prevent XSS attacks: + +``` sourceCode + + +escapeUrl($input); +?> + + + Unescaped URL data + + + + Click here! + + +``` diff --git a/doc/book/zend.escaper.introduction.md b/doc/book/zend.escaper.introduction.md new file mode 100644 index 0000000..b320f49 --- /dev/null +++ b/doc/book/zend.escaper.introduction.md @@ -0,0 +1,43 @@ +# Introduction to Zend\\Escaper + +The [OWASP Top 10 web security risks](https://www.owasp.org/index.php/Top_10_2010-Main) study lists +Cross-Site Scripting (XSS) in second place. PHP's sole functionality against XSS is limited to two +functions of which one is commonly misapplied. Thus, the `Zend\Escaper` component was written. It +offers developers a way to escape output and defend from XSS and related vulnerabilities by +introducing **contextual escaping based on peer-reviewed rules**. + +`Zend\Escaper` was written with ease of use in mind, so it can be used completely stand-alone from +the rest of the framework, and as such can be installed with Composer using +zendframework/zend-escaper. 
+ +For easier use of the Escaper component within the framework itself, especially with the `Zend\View` +component, a \[set of view helpers\](zend.view.helpers) is provided. + +> ## Warning +`Zend\Escaper` is a security-related component. As such, if you believe you have found an issue with +this component, we ask that you follow our [Security Policy](http://framework.zend.com/security/) +and report security issues accordingly. The Zend Framework team and the contributors thank you in +advance. + +## Overview + +The `Zend\Escaper` component provides one class, `Zend\Escaper\Escaper`, which in turn provides five +methods for escaping output. Which method to use depends on the context in which the output +data is used. It is up to the developer to use the right methods in the right context. + +`Zend\Escaper\Escaper` has the following escaping methods available for each context: + +> - **escapeHtml**: escape a string for the HTML Body context. +- **escapeHtmlAttr**: escape a string for the HTML Attribute context. +- **escapeJs**: escape a string for the Javascript context. +- **escapeCss**: escape a string for the CSS context. +- **escapeUrl**: escape a string for the URI or Parameter contexts. + +Usage of each method will be discussed in detail in later chapters. + +## What Zend\\Escaper is not + +`Zend\Escaper` is meant to be used only for escaping data that is to be output, and as such should +not be misused for filtering input data. For such tasks, the \[Zend\\Filter +component\](zend.filter), [HTMLPurifier](http://htmlpurifier.org/) or PHP's +[Filter](http://php.net/manual/en/book.filter.php) component should be used.
diff --git a/doc/book/zend.escaper.theory-of-operation.md b/doc/book/zend.escaper.theory-of-operation.md new file mode 100644 index 0000000..f1774c0 --- /dev/null +++ b/doc/book/zend.escaper.theory-of-operation.md @@ -0,0 +1,124 @@ +# Theory of Operation + +`Zend\Escaper` provides methods for escaping output data, dependent on the context in which the data +will be used. Each method is based on peer-reviewed rules and is in compliance with the current +OWASP recommendations. + +The escaping follows a well known and fixed set of encoding rules for each key HTML context, which +are defined by OWASP. These rules cannot be impacted or negated by browser quirks or edge-case HTML +parsing unless the browser suffers a catastrophic bug in its HTML parser or Javascript interpreter +- both of these are unlikely. + +The contexts in which `Zend\Escaper` should be used are **HTML Body**, **HTML Attribute**, +**Javascript**, **CSS** and **URL/URI** contexts. + +Every escaper method will take the data to be escaped, make sure it is utf-8 encoded data, or try to +convert it to utf-8, do the context-based escaping, encode the escaped data back to its original +encoding and return the data to the caller. + +The actual escaping of the data differs between methods; each has its own set of rules +according to which the escaping is done.
An example will allow us to clearly demonstrate the +difference, and how the same characters are being escaped differently between contexts: + +``` sourceCode +$escaper = new Zend\Escaper\Escaper('utf-8'); + +// &lt;script&gt;alert(&quot;zf2&quot;)&lt;/script&gt; +echo $escaper->escapeHtml('<script>alert("zf2")</script>'); +// &lt;script&gt;alert&#x28;&quot;zf2&quot;&#x29;&lt;&#x2F;script&gt; +echo $escaper->escapeHtmlAttr('<script>alert("zf2")</script>'); +// \x3Cscript\x3Ealert\x28\x22zf2\x22\x29\x3C\x2Fscript\x3E +echo $escaper->escapeJs('<script>alert("zf2")</script>'); +// \3C script\3E alert\28 \22 zf2\22 \29 \3C \2F script\3E +echo $escaper->escapeCss('<script>alert("zf2")</script>'); +// %3Cscript%3Ealert%28%22zf2%22%29%3C%2Fscript%3E +echo $escaper->escapeUrl('<script>alert("zf2")</script>'); +``` + +More detailed examples will be given in later chapters. + +## The Problem with Inconsistent Functionality + +At present, programmers orient towards the following PHP functions for each common HTML context: + +> - **HTML Body**: htmlspecialchars() or htmlentities() +- **HTML Attribute**: htmlspecialchars() or htmlentities() +- **Javascript**: addslashes() or json\_encode() +- **CSS**: n/a +- **URL/URI**: rawurlencode() or urlencode() + +In practice, these decisions appear to depend more on what PHP offers, and if it can be interpreted +as offering sufficient escaping safety, than it does on what is recommended in reality to defend +against XSS. While these functions can prevent some forms of XSS, they do not cover all use cases or +risks and are therefore insufficient defenses. + +Using htmlspecialchars() in a perfectly valid HTML5 unquoted attribute value, for example, is +completely useless since the value can be terminated by a space (among other things) which is never +escaped. Thus, in this instance, we have a conflict between a widely used HTML escaper and a modern +HTML specification, with no specific function available to cover this use case. While it's tempting +to blame users, or the HTML specification authors, escaping just needs to deal with whatever HTML +and browsers allow.
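The unquoted-attribute problem can be shown with a minimal sketch that uses only PHP's built-in `htmlspecialchars()`. The `$input` value and the `title` attribute are hypothetical choices for illustration; the sketch assumes the template author left the attribute value unquoted.

```php
<?php
// Hypothetical attacker-controlled value; it contains none of the
// characters that htmlspecialchars() rewrites (&, <, >, ", ').
$input = 'x onmouseover=alert(1)';

$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// The value comes back unchanged, so the space is still present
// and still terminates an unquoted attribute value.
var_dump($escaped === $input);

// Rendered without quotes, the markup now carries a second attribute.
echo '<span title=' . $escaped . '>What framework are you using?</span>', PHP_EOL;
```

An escaper written for the attribute context would also have to rewrite the space, the equals sign and the parentheses, which is the gap the text above describes.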
+ +Using addslashes(), custom backslash escaping or json\_encode() will typically ignore HTML special +characters such as ampersands which may be used to inject entities into Javascript. Under the right +circumstances, browsers will convert these entities into their literal equivalents before +interpreting Javascript thus allowing attackers to inject arbitrary code. + +Inconsistencies with valid HTML, insecure default parameters, lack of character encoding awareness, +and misrepresentations of what functions are capable of by some programmers - these all make +escaping in PHP an unnecessarily convoluted quest. + +To circumvent the lack of escaping methods in PHP, `Zend\Escaper` addresses the need to apply +context-specific escaping in web applications. It implements methods that specifically target XSS +and offers programmers a tool to secure their applications without misusing other inadequate +methods, or using, most likely incomplete, home-grown solutions. + +## Why Contextual Escaping? + +To understand why multiple standardised escaping methods are needed, here are a couple of quick points +(by no means a complete set!): + +### HTML escaping of unquoted HTML attribute values still allows XSS + +This is probably the best known way to defeat htmlspecialchars() when used on attribute values since +any space (or character interpreted as a space - there are a lot) lets you inject new attributes +whose content can't be neutralised by HTML escaping. The solution (where this is possible) is +additional escaping as defined by the OWASP ESAPI codecs. The point here can be extended further - +escaping only works if a programmer or designer knows what they're doing. In many contexts, there are +additional practices and gotchas that need to be carefully monitored since escaping sometimes needs +a little extra help to protect against XSS - even if that means ensuring all attribute values are +properly double quoted despite this not being required for valid HTML.
+ +### HTML escaping of CSS, Javascript or URIs is often reversed when passed to non-HTML interpreters +by the browser + +HTML escaping is just that - it's designed to escape a string for HTML (i.e. prevent tag or +attribute insertion) but not alter the underlying meaning of the content whether it be Text, +Javascript, CSS or URIs. For that purpose a fully HTML escaped version of any other context may +still have its unescaped form extracted before it's interpreted or executed. For this reason we need +separate escapers for Javascript, CSS and URIs and those writing templates **must** know which +escaper to apply to which context. Of course this means you need to be able to identify the correct +context before selecting the right escaper! + +### DOM based XSS requires a defence using at least two levels of different escaping in many cases + +DOM based XSS has become increasingly common as Javascript has taken off in popularity for large +scale client side coding. A simple example is Javascript defined in a template which inserts a new +piece of HTML text into the DOM. If the string is only HTML escaped, it may still contain Javascript +that will execute in that context. If the string is only Javascript escaped, it may contain HTML +markup (new tags and attributes) which will be injected into the DOM and parsed once the inserting +Javascript executes. Damned either way? The solution is to escape twice - first escape the string +for HTML (make it safe for DOM insertion), and then for Javascript (make it safe for the current +Javascript context). Nested contexts are a common means of bypassing naive escaping habits (e.g. you +can inject Javascript into a CSS expression within a HTML Attribute). + +### PHP has no known anti-XSS escape functions (only those kidnapped from their original purposes) + +A simple example, widely used, is when you see `json_encode()` used to escape Javascript, or worse, +some kind of mutant `addslashes()` implementation. 
These were never designed to eliminate XSS yet +PHP programmers use them as such. For example, `json_encode()` does not escape the ampersand or +semi-colon characters by default. That means you can easily inject HTML entities which could then be +decoded before the Javascript is evaluated in a HTML document. This lets you break out of strings, +add new JS statements, close tags, etc. In other words, using `json_encode()` is insufficient and +naive. The same, arguably, could be said for `htmlspecialchars()` which has its own well known +limitations that make a singular reliance on it a questionable practice. diff --git a/doc/bookdown.json b/doc/bookdown.json new file mode 100644 index 0000000..2425d45 --- /dev/null +++ b/doc/bookdown.json @@ -0,0 +1,14 @@ +{ + "title": "Zend\\Escaper", + "target": "html/", + "content": [ + "book/zend.escaper.introduction.md", + "book/zend.escaper.theory-of-operation.md", + "book/zend.escaper.configuration.md", + "book/zend.escaper.escaping-html.md", + "book/zend.escaper.escaping-html-attributes.md", + "book/zend.escaper.escaping-javascript.md", + "book/zend.escaper.escaping-css.md", + "book/zend.escaper.escaping-url.md" + ] +} \ No newline at end of file From d1fbe03bfc8fbd0b7ee3103828362df0902c69b5 Mon Sep 17 00:00:00 2001 From: Gary Hockin Date: Thu, 5 Nov 2015 12:22:03 -0500 Subject: [PATCH 2/2] Added includes --- doc/book/zend.escaper.escaping-css.md | 4 ++-- doc/book/zend.escaper.escaping-html-attributes.md | 6 +++--- doc/book/zend.escaper.escaping-html.md | 6 +++--- doc/book/zend.escaper.escaping-javascript.md | 4 ++-- doc/book/zend.escaper.escaping-url.md | 4 ++-- doc/book/zend.escaper.theory-of-operation.md | 2 +- 6 files changed, 13 insertions(+), 13 deletions(-) diff --git a/doc/book/zend.escaper.escaping-css.md b/doc/book/zend.escaper.escaping-css.md index 1667ee3..3e7d4a8 100644 --- a/doc/book/zend.escaper.escaping-css.md +++ b/doc/book/zend.escaper.escaping-css.md @@ -8,7 +8,7 @@ hexadecimal escapes. 
In most cases developers forget to escape CSS completely: -``` sourceCode +```php alert("zf2")'; @@ -28,7 +28,7 @@ is served to the client, as it **must be the same** as the encoding used for esc An example of incorrect usage: -``` sourceCode +```php alert("zf2")'; $escaper = new Zend\Escaper\Escaper('utf-8'); @@ -52,7 +52,7 @@ $escaper = new Zend\Escaper\Escaper('utf-8'); An example of correct usage: -``` sourceCode +```php alert("zf2")'; $escaper = new Zend\Escaper\Escaper('utf-8'); diff --git a/doc/book/zend.escaper.escaping-javascript.md b/doc/book/zend.escaper.escaping-javascript.md index 59f5a71..a0a17cb 100644 --- a/doc/book/zend.escaper.escaping-javascript.md +++ b/doc/book/zend.escaper.escaping-javascript.md @@ -20,7 +20,7 @@ special characters and entities. An example of incorrect Javascript escaping: -``` sourceCode +```php