remove maxSize/maxDepth/UnboundedSchema
warner committed Mar 15, 2010
1 parent 42985fb commit 6abdbc0
Showing 14 changed files with 24 additions and 490 deletions.
9 changes: 9 additions & 0 deletions ChangeLog
@@ -1,5 +1,14 @@
2010-03-14 Brian Warner <warner@lothar.com>

* foolscap/constraint.py: remove maxSize/maxDepth methods, and the
related UnboundedSchema exception. As described in ticket #127,
I'm giving up on resource-exhaustion defenses, which allows for a
lot of code simplification.
* foolscap/{copyable.py,remoteinterface.py,schema.py}: same
* foolscap/slicers/*.py: same
* foolscap/test/test_schema.py: remove tests
* doc/jobs.txt: remove TODO items around maxSize

* foolscap/_version.py: bump version while between releases
* misc/*/debian/changelog: same

171 changes: 0 additions & 171 deletions doc/jobs.txt
@@ -156,177 +156,6 @@ name must be looked up in the unjellyableRegistry (and the class located, and
the __implements__ list consulted) before any of the object's tokens are
accepted.

* security TODOs:

** size constraints on the set-vocab sequence

* implement schema.maxSize()

In newpb, schemas serve two purposes:

a) make programs safer by reducing the surprises that can appear in their
arguments (i.e. factoring out argument-checking in a useful way)

b) remove memory-consumption DoS attacks by putting an upper bound on the
memory consumed by any particular message.

Each schema has a pair of methods named maxSize() and maxDepth() which
provide this upper bound. While the schema is in effect (say, during the
receipt of a particular named argument to a remotely-invokable method), at
most X bytes and Y slicer frames will be in use before either the object is
accepted and processed or the schema notes the violation and the object is
rejected (whereupon the temporary storage is released and all further bytes
in the rejected object are simply discarded). Strictly speaking, the number
returned by maxSize() is the largest string on the wire which has not yet
been rejected as violating the constraint, but it is also a reasonable
metric to describe how much internal storage must be used while processing
it. (Achieving greater accuracy would require knowing exactly how large each
Python type is, which is not a sensible thing to attempt.)

The idea is that someone who is worried about an attacker throwing a really
long string or an infinitely-nested list at them can ask the schema just what
exactly their current exposure is. The tradeoff between flexibility ("accept
any object whatsoever here") and exposure to DoS attack is then user-visible
and thus user-selectable.

To implement maxSize() for a basic schema (like a string), you simply need
to look at banana.xhtml and see how basic tokens are encoded (you will also
need to look at banana.py and see how deserialization is actually
implemented). For a schema.StringConstraint(32) (which accepts strings <= 32
characters in length), the largest serialized form that has not yet been
either accepted or rejected is:

64 bytes (header indicating 0x000000..0020 with lots of leading zeros)
+ 1 byte (STRING token)
+ 32 bytes (string contents)
= 97

If the header indicates a conforming length (<=32) then just after the 32nd
byte is received, the string object is created and handed up the stack, so
the temporary storage tops out at 97. If someone is trying to spam us with a
million-character string, the serialized form would look like:

64 bytes (header indicating 1-million in hex, with leading zeros)
+ 1 byte (STRING token)
= 65

at which point the receive parser would check the constraint, decide that
1000000 > 32, and reject the remainder of the object.
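
In rough pseudo-Python, that receive-side check amounts to something like the
sketch below (the names are illustrative and Violation stands in for
foolscap.tokens.Violation; the real check lives in the unslicers):

class Violation(Exception):      # stand-in for foolscap.tokens.Violation
    pass

def check_string_header(announced_length, max_length):
    # consulted as soon as the length header arrives, before any of the
    # string body is buffered; a conforming header lets the body through
    if max_length is not None and announced_length > max_length:
        raise Violation("string too long: %d > %d"
                        % (announced_length, max_length))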

So (with the exception of pass/fail maxSize values, see below), the following
should hold true:

schema.StringConstraint(32).maxSize() == 97
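
As a sketch of the arithmetic only (the 64-byte header figure comes from the
discussion above; this helper is not part of foolscap):

HEADER_BYTES = 64   # worst-case length header, padded with leading zeros
TOKEN_BYTE = 1      # the STRING type byte

def string_constraint_max_size(max_length):
    # header + type byte + the longest string body that could still conform
    return HEADER_BYTES + TOKEN_BYTE + max_length

assert string_constraint_max_size(32) == 97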

Now, schemas which represent containers have size limits that are the sum of
their contents, plus some overhead (and a stack level) for the container
itself. For example, a list of two small integers is represented in newbanana
as:

OPEN(list)
INT
INT
CLOSE()

which really looks like:

opencount-OPEN
len-STRING-"list"
value-INT
value-INT
opencount-CLOSE

This sequence takes at most:

opencount-OPEN: 64+1
len-STRING-"list": 64+1+1000 (opentypes are confined to be <= 1k long)
value-INT: 64+1
value-INT: 64+1
opencount-CLOSE: 64+1

or 5*(64+1)+1000 = 1325, or rather:

3*(64+1)+1000 + N*(IntConstraint().maxSize())

So ListConstraint.maxSize is computed by doing some math involving the
.maxSize value of the objects that go into it (the ListConstraint.constraint
attribute). This suggests a recursive algorithm. If any constraint is
unbounded (say a ListConstraint with no limit on the length of the list),
then maxSize() raises UnboundedSchema to indicate that there is no limit on
the size of a conforming string. Clearly, if any constraint is found to
include itself, UnboundedSchema must also be raised.
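
A minimal sketch of that recursion, assuming the maxSize()/UnboundedSchema
interface that this commit removes (constants follow the worked example
above; not foolscap code):

class UnboundedSchema(Exception):
    pass

COUNTERBYTES = 64                       # worst-case opencount / length header
LIST_OVERHEAD = 3 * (COUNTERBYTES + 1) + 1000  # OPEN + opentype (<= 1k) + CLOSE

class ListConstraintSketch:
    def __init__(self, item_constraint, max_length=None):
        self.item_constraint = item_constraint
        self.max_length = max_length

    def maxSize(self, seen=None):
        if seen is None:
            seen = []
        if self in seen:
            raise UnboundedSchema       # this constraint contains itself
        if self.max_length is None:
            raise UnboundedSchema       # no bound on the list length
        seen.append(self)
        return (LIST_OVERHEAD +
                self.max_length * self.item_constraint.maxSize(seen))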

This is a loose upper bound. For example, one non-conforming input string
would be:

opencount-OPEN: 64+1
len-STRING-"x"*1000: 64+1+1000

The entire string would be accepted before checking to see which opentypes
were valid: the ListConstraint only accepts the "list" opentype and would
reject this string immediately after the 1000th "x" was received. So a
tighter upper bound would be 2*65+1000 = 1130.

In general, the bound is computed by walking through the deserialization
process and identifying the largest string that could make it past the
validity checks. There may be later checks that will reject the string, but
if it has not yet been rejected, then it still represents exposure for a
memory consumption DoS.

** pass/fail sizes

I started to think that it was necessary to have each constraint provide two
maxSize numbers: one for the largest sequence that could possibly be accepted
as valid, and a second for the largest sequence that could still be
undecided. This would provide a more accurate upper bound because most
containers will respond to an invalid object by abandoning the rest of the
container: i.e. if the current active constraint is:

ListConstraint(StringConstraint(32), maxLength=30)

then the first thing that doesn't match the string constraint (say an
instance, or a number, or a 33-character string) will cause the ListUnslicer
to go into discard-everything mode. This makes a significant difference when
the per-item constraint allows opentypes, because the OPEN type (a string) is
constrained to 1k bytes. The item constraint probably imposes a much smaller
limit on the set of actual strings that would be accepted, so no
kilobyte-long opentype will possibly make it past that constraint. That means
there can only be one outstanding invalid object. So the worst case (maximal
length) string that has not yet been rejected would be something like:

OPEN(list)
validthing [0]
validthing [1]
...
validthing [n-1]
long-invalid-thing

because if the long-invalid thing had been received earlier, the entire list
would have been abandoned.

This suggests that the calculation for ListConstraint.maxSize() really needs
to be something like:
overhead
+(len-1)*itemConstraint.maxSize(valid)
+(1)*itemConstraint.maxSize(invalid)
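
Hypothetically, the two-number version might have been wired up like the
sketch below (illustrative only: the (accept, reject) pair API, including the
maxSizePair() name, was never implemented):

def list_max_size_pair(item_constraint, max_length):
    # hypothetical: each constraint reports (maxSizeAccept, maxSizeReject)
    accept, reject = item_constraint.maxSizePair()
    overhead = 3 * (64 + 1) + 1000      # OPEN + opentype + CLOSE, as before
    accept_bound = overhead + max_length * accept
    # only one about-to-be-rejected item can be outstanding at a time
    reject_bound = overhead + (max_length - 1) * accept + reject
    return (accept_bound, reject_bound)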

I'm still not sure about this. I think it provides a significantly tighter
upper bound. The deserialization process itself does not try to achieve the
absolute minimal exposure (i.e., the opentype checker could take the set of
all known-valid open types, compute the maximum length, and then impose a
StringConstraint with that length instead of 1000), because it is, in
general, an inefficient hassle. There is a tradeoff between computational
efficiency and removing the slack in the maxSize bound, both in the
deserialization process (where the memory is actually consumed) and in
maxSize (where we estimate how much memory could be consumed).

Anyway, maxSize() and maxDepth() (which is easier: containers add 1 to the
maximum of the maxDepth values of their possible children) need to be
implemented for all the Constraint classes. There are some tests (disabled)
in test_schema.py for this code: those tests assert specific values for
maxSize. Those values are probably wrong, so they must be updated to match
however maxSize actually works.
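
A sketch of that container rule for maxDepth(), patterned after the
recursion-guarded methods removed below (UnboundedSchema as in the earlier
sketch; illustrative only):

def container_max_depth(child_constraints, seen=None):
    # a container is one level deeper than its deepest possible child
    if seen is None:
        seen = []
    deepest = 1
    for c in child_constraints:
        if c in seen:
            raise UnboundedSchema       # the constraint graph contains a cycle
        deepest = max(deepest, c.maxDepth(seen + [c]))
    return 1 + deepest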

* decide upon what the "Shared" constraint should mean

The idea of this one was to avoid some vulnerabilities by rejecting arbitrary
85 changes: 0 additions & 85 deletions foolscap/constraint.py
@@ -29,9 +29,6 @@
}
nothingTaster = {}

class UnboundedSchema(Exception):
pass

class IConstraint(Interface):
pass
class IRemoteMethodConstraint(IConstraint):
@@ -177,40 +174,6 @@ def checkObject(self, obj, inbound):
# this default form passes everything
return

def maxSize(self, seen=None):
"""
I help a caller determine how much memory could be consumed by the
input stream while my constraint is in effect.
My constraint will be enforced against the bytes that arrive over
the wire. Eventually I will either accept the incoming bytes and my
Unslicer will provide an object to its parent (including any
subobjects), or I will raise a Violation exception which will kick
my Unslicer into 'discard' mode.
I define maxSizeAccept as the maximum number of bytes that will be
received before the stream is accepted as valid. maxSizeReject is
the maximum that will be received before a Violation is raised. The
max of the two provides an upper bound on single objects. For
container objects, the upper bound is probably (n-1)*accept +
reject, because there can only be one outstanding
about-to-be-rejected object at any time.
I return (maxSizeAccept, maxSizeReject).
I raise an UnboundedSchema exception if there is no bound.
"""
raise UnboundedSchema

def maxDepth(self):
"""I return the greatest number Slicer objects that might exist on
the SlicerStack (or Unslicers on the UnslicerStack) while processing
an object which conforms to this constraint. This is effectively the
maximum depth of the object tree. I raise UnboundedSchema if there is
no bound.
"""
raise UnboundedSchema

COUNTERBYTES = 64 # max size of opencount

def OPENBYTES(self, dummy):
@@ -265,13 +228,6 @@ def checkObject(self, obj, inbound):
if not self.regexp.search(obj):
raise Violation("regexp failed to match")

def maxSize(self, seen=None):
if self.maxLength == None:
raise UnboundedSchema
return 64+1+self.maxLength
def maxDepth(self, seen=None):
return 1

class IntegerConstraint(Constraint):
opentypes = [] # redundant
# taster set in __init__
@@ -297,15 +253,6 @@ def checkObject(self, obj, inbound):
if abs(obj) >= 2**(8*self.maxBytes):
raise Violation("number too large")

def maxSize(self, seen=None):
if self.maxBytes == None:
raise UnboundedSchema
if self.maxBytes == -1:
return 64+1
return 64+1+self.maxBytes
def maxDepth(self, seen=None):
return 1

class NumberConstraint(IntegerConstraint):
"""I accept floats, ints, and longs."""
name = "NumberConstraint"
@@ -320,14 +267,6 @@ def checkObject(self, obj, inbound):
return
IntegerConstraint.checkObject(self, obj, inbound)

def maxSize(self, seen=None):
# floats are packed into 8 bytes, so the shortest FLOAT token is
# 64+1+8
intsize = IntegerConstraint.maxSize(self, seen)
return max(64+1+8, intsize)
def maxDepth(self, seen=None):
return 1



#TODO
@@ -337,18 +276,6 @@ class Shared(Constraint):
def __init__(self, constraint, refLimit=None):
self.constraint = IConstraint(constraint)
self.refLimit = refLimit
def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxSize(seen)
def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxDepth(seen)

#TODO: might be better implemented with a .optional flag
class Optional(Constraint):
@@ -357,15 +284,3 @@ class Optional(Constraint):
def __init__(self, constraint, default):
self.constraint = IConstraint(constraint)
self.default = default
def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxSize(seen)
def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxDepth(seen)
23 changes: 1 addition & 22 deletions foolscap/copyable.py
@@ -10,7 +10,7 @@
import slicer, tokens
from tokens import BananaError, Violation
from foolscap.constraint import OpenerConstraint, IConstraint, \
ByteStringConstraint, UnboundedSchema, Optional
ByteStringConstraint, Optional

Interface = interface.Interface

@@ -373,27 +373,6 @@ def __init__(self, *attrTuples, **kwargs):
assert name not in self.keys.keys()
self.keys[name] = IConstraint(constraint)

def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
total = self.OPENBYTES("attributedict")
for name, constraint in self.keys.iteritems():
total += ByteStringConstraint(len(name)).maxSize(seen)
total += constraint.maxSize(seen[:])
return total

def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
# all the attribute names are 1-deep, so the min depth of the dict
# items is 1. The other "1" is for the AttributeDict container itself
return 1 + reduce(max, [c.maxDepth(seen[:])
for c in self.itervalues()], 1)

def getAttrConstraint(self, attrname):
c = self.keys.get(attrname)
if c:
16 changes: 1 addition & 15 deletions foolscap/remoteinterface.py
@@ -3,7 +3,7 @@
import inspect
from zope.interface import interface, providedBy, implements
from foolscap.constraint import Constraint, OpenerConstraint, nothingTaster, \
IConstraint, UnboundedSchema, IRemoteMethodConstraint, Optional, Any
IConstraint, IRemoteMethodConstraint, Optional, Any
from foolscap.tokens import Violation, InvalidRemoteInterface
from foolscap.schema import addToConstraintTypeMap
from foolscap import ipb
@@ -296,20 +296,6 @@ def checkResults(self, results, inbound):
# location appropriately: they have more information than we do.
self.responseConstraint.checkObject(results, inbound)

def maxSize(self, seen=None):
if self.acceptUnknown:
raise UnboundedSchema # there is no limit on that thing
if self.ignoreUnknown:
# for now, we ignore unknown arguments by accepting the object
# and then throwing it away. This makes us vulnerable to the
# memory consumed by that object. TODO: in the CallUnslicer,
# arrange to discard the ignored object instead of receiving it.
# When this is done, ignoreUnknown will not cause the schema to
# be unbounded and this clause should be removed.
raise UnboundedSchema
# TODO: implement the rest of maxSize, just like a dictionary
raise NotImplementedError

class UnconstrainedMethod(object):
"""I am a method constraint that accepts any arguments and any return
value.