The most common web application security weakness is the failure to properly validate input from the client or environment.
OWASP, Data Validation (2 July 2017)
Failure to properly handle untrusted input leaves an application vulnerable to Code Injection attacks, such as Cross Site Scripting (XSS). "Some form of visible tainting on input from the client or untrusted sources" is one countermeasure recommended by OWASP, as part of a strategy of Defence in Depth.
This Python module "taints" untrusted input by wrapping them in special types. These types behave in most respects just like native Python types, but prevent you from using their values accidentally. This provides not just "visible tainting", but runtime guarantees and (optionally) statically-verifiable type safety with mypy.
Strategies for sanitizing, escaping, normalising or validating input are out of scope for this module.
This module should be suitable for production use, with the following caveats:
- TODO
untrusted.sequence
type is missing tests and may be incomplete - TODO
untrusted.mapping
type is missing tests and may be incomplete - TODO
unstrusted.<*>Of
is not fully tested - TODO only partial coverage for statically-verifiable type safety
Any code with security considerations deserves the highest standards of scrutiny. Please exercise your judgement before using this module.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND.
Tested for Python >= 3.4, but earlier versions may work.
sudo pip3 install untrusted --upgrade
(If you only have python3 installed, you may need only pip
- not pip3
)
import html # for html.escape
import untrusted
line = untrusted.string(input("Enter some text, HTML will be escaped: "))
try:
# You're not allowed to print an untrusted.string!
# This raises a TypeError!
print("<b>You typed:</b> " + line)
except TypeError:
pass # Expected
# Safely escape the HTML!
print("<b>You typed:</b> " + line / html.escape)
# / is overloaded as shorthand for:
# print("<b>You typed:</b> " + line.escape(html.escape))
- Looks and feels like a
str
type, but can't be output without escaping - Seamlessly interoperable with native
str
types - Mixed usage of native
str
anduntrusted.string
taintsstr
values/promotes them tountrusted.string
types.
Example of handling untrusted HTML:
import html # for html.escape()
import untrusted
# example of a string that could be provided by an untrusted user
firstName = untrusted.string("Grace")
lastName = untrusted.string("<script>alert('hack attempt!');</script>")
# works seamlessly with native python strings
fullName = firstName + " " + lastName
# fullName was tainted and keeps the untrusted.string type
print(repr(fullName)) # <untrusted.string of length 46>
# the normal string methods still work as you would expect
fullName = fullName.title()
# but you can't accidentally use it where a `str` is expected
try:
print(fullName) # raises TypeError - untrusted.string used as a `str`!
fullName.encode('utf8') # also raises TypeError
except TypeError:
print("We caught an error, as expected!")
# instead, you are forced to explicitly escape your string somehow
print("<b>Escaped output:</b> " + fullName.escape(html.escape)) # prints safe HTML!
print("<b>Escaped output:</b> " + fullName / html.escape) # use this easy shorthand!
See untrustedStringExample.py
for composing multiple escape functions,
passing arguments to escape functions, etc.
This module provides types to lazily wrap collections of untrusted values.
The values are wrapped with an appropriate untrusted.*
type when accessed.
- This is a view over any iterable or generator yielding untrusted values.
- It only yields values wrapped by an
untrusted.*
type - Usually this is an
untrusted.string
but it can also be an iterator over untrusted collections
Example:
import html # for html.escape
import untrusted
it = untrusted.iterator(open("example.txt"))
for i, s in enumerate(it):
print("Line %d: %s" % (i, s.escape(html.escape).strip()))
See examples/untrustedIteratorExample.py
for an untrusted nested list
(e.g. an untrusted.iterator
of untrusted.iterator
of untrusted.string
).
- This is a view
over any
list
-like object containing untrusted values. - All accessed values are wrapped by an
untrusted.*
type - Usually this is an
untrusted.string
but it can also be an iterator over untrusted collections
Example:
import html # for html.escape
import untrusted
# list of strings from an untrusted source
animals = ["cat", "dog", "monkey", "elephant", "<big>mouse</big>"]
untrustedAnimalList = untrusted.sequence(animals)
assert "cat" in untrustedAnimalList
- This is a view
over any
dict
-like object mapping trusted or untrusted keys to untrusted values. - All accessed values, and optionally keys, are wrapped by an
untrusted.*
type. - Usually this is a mapping
str
->untrusted.string
, but it could also be a mappingunstrusted.string
->untrusted.string
, or a mapping to an untrusted collection.
Example:
import html # for html.escape
import untrusted
user = untrusted.mapping({
'username': 'example-username<b>hack attempt</b>',
'password': 'example-password',
})
try:
print(user.get("username"))
except TypeError:
print("Caught the error we expected!")
print(user.get("username", "default-username") / html.escape)
Example:
import html # for html.escape
import untrusted
untrustedType = untrusted.mappingOf(untrusted.string, untrusted.string)
args = untrustedType({'animal': 'cat', '<b>hack</b>': 'attempt'})
for k,v in args.items():
print("%s: %s" % (k / html.escape, v / html.escape))
Lazily nested containers are fully supported, too.
Use untrusted.iteratorOf(valueType)
, untrusted.sequenceOf(valueType)
, or
untrusted.mappingOf(keyType, valueType)
to create a specific
container type.
Example:
import html # for html.escape
import untrusted
people = [
{
'id': 'A101',
'name.first': 'Grace',
'name.last': 'Hopper',
'name.other': 'Brewster Murray',
'dob': '1906-12-09',
},
{
'id': 'A102',
'name.first': 'Alan',
'name.last': 'Turing',
'name.other': 'Mathison',
'dob': '1912-06-23',
},
{
'id': 'HACKER',
'name.first': 'Robert\'); DROP TABLE Students;--',
'name.last': '£Hacker',
'dob': '<b>Potato</b>'
},
]
# a list of dicts with trusted keys, but untrusted values
mappingType = untrusted.sequenceOf(untrusted.mapping)
# aka (setting defaults explicitly)
mappingType = untrusted.sequenceOf(untrusted.mappingOf(str, untrusted.string))
for person in mappingType(people):
for key, value in person.items():
print(" %s: %s" % (key, value.escape(html.escape)))
Supports most native str
methods, but may possibly
have different return types.
The underlying value - always a non-None instance of str
.
untrusted.string(s)
Constructs a untrusted.string
object, where s
is a non-None instance
of str
or untrusted.string
.
In the case of an untrusted.string
being constructed with an
untrusted.string
argument, the new value is only wrapped once. Escaping
it will give a str
, not an untrusted.string
.
untrusted.string.escape(escape_function, [*args, **kwargs]) -> str
Applies the escape_function
, a function str -> str
that escapes
a string and returns it, with optional arguments and keyword arguments.
untrusted.string.valid(valid_function, [*args, **kwargs]) -> bool
Applies the valid_function
, a function str -> bool
, that checks
a string and returns True or False, with optional arguments and
keyword arguments.
untrusted.string.valid(validate_function, [*args, **kwargs]) -> any
Applies the valid_function
, a function str -> any
, that checks
a string and returns any value (e.g. a list of reasons why a string did not
validate), with optional arguments and keyword arguments.
For some escape expression, untrusted.string("value") / escape_expr -> str
An escape_expr
here is either a function str -> str
that escapes a string,
or a 3-tuple (escape_function, args, kwargs_dict)
, for example:
import html # for html.escape
import untrusted
myString = untrusted.string("<b>Exam\"ple</b>")
myEscape = (html.escape, [], {'quote': False})
print(myString / html.escape)
print(myString / myEscape)
Where collection
is one of iterator
, sequence
(list-like),
mapping
(dict-like), or a user-constructed custom type.
Supports most native collection methods, but may possibly have different return types.
Each collection is aware of its Value Type (default
untrusted.string
).
Mappings are also aware of their Key Type (default str
)
The underlying value - always a non-None object that is an instance of the collection's Value Type.
Create an iterator type using untrusted.iteratorOf(type)
.
Create a sequence type using untrusted.sequenceOf(type)
.
Create a mapping type using untrusted.sequenceOf(keyType, valueType)
.
These definitions are recursive.
For example:
import untrusted
itType = untrusted.iteratorOf(untrusted.iteratorOf(untrusted.string))
seqType = untrusted.sequenceOf(itType)
mappingType = untrusted.mappingOf(untrusted.string, seqType)
someDict = {}
myMapping = mappingType(someDict)
# or, as a less readable one-liner
myMapping = untrusted.mappingOf(untrusted.string, seqType)(someDict)
Remember, just because you have used one method to escape an untrusted.string
into a str
, it may not be safe in other contexts. For example, what's safe
for HTML might still be dangerous SQL. At the time user input is captured, it
may not be known in advance the context in which it is to be used - and
therefore it is not yet known what is the correct validation and escaping
strategy. It's best to delay the escaping until the final point of use -
keep a value as untrusted.*
for as long as possible.
This module isn't a magic solution. It's a tool to be used wisely. Don't fall
into the trap of thinking about a value "it's a str
now so it's completely
safe".
Just because a string can be escaped safely, it does not mean that it has been validated as allowable input. For example, you might put a minimum limit on a password. Or you might require that an input isn't a reserved filename.
Nice ways to do this are to use unstruted.string.valid
method, which returns
a boolean, untrusted.string.validate
method, which returns any value (e.g.
this might be a list of reasons why the input didn't valdiate), or to throw
a ValueError
from a function that both escapes and validates.
Sometimes, you generate HTML yourself and escaping is something you need to do.
Sometimes, an interface always separates parameters from code, like most modern SQL libraries. In this case the database driver will handle potentially untrusted input better than you ever could hope to, and escaping it is the wrong thing to do.
You might mark the input as untrusted for other reasons - e.g. to enforce a maximum length on a search query, or because you need to validate it against business logic.
Untrusted collection types, like untrusted.sequence
, are
"views"
over the underling object. If the underlying object changes, so does the object you
see through the untrusted collection type. In other words: its a reference
to the same object, not a copy. If that's not what you want, use the
copy module.
This should hopefully be obvious and unsuprising behaviour, not at all unique to this module, but it can trip people up.
This is free software (MIT license).
untrusted - write safer Python with special untrusted types
Copyright © 2017 - 2018 Ben Golightly <ben@tawesoft.co.uk>
Copyright © 2017 - 2018 Tawesoft Ltd <opensource@tawesoft.co.uk>
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.