remove maxSize/maxDepth/UnboundedSchema
warner committed Mar 15, 2010
1 parent 42985fb commit 6abdbc0
Showing 14 changed files with 24 additions and 490 deletions.
9 changes: 9 additions & 0 deletions ChangeLog
@@ -1,5 +1,14 @@
2010-03-14 Brian Warner <warner@lothar.com>

* foolscap/constraint.py: remove maxSize/maxDepth methods, and the
related UnboundedSchema exception. As described in ticket #127,
I'm giving up on resource-exhaustion defenses, which allows for a
lot of code simplification.
* foolscap/{copyable.py,remoteinterface.py,schema.py}: same
* foolscap/slicers/*.py: same
* foolscap/test/test_schema.py: remove tests
* doc/jobs.txt: remove TODO items around maxSize

* foolscap/_version.py: bump version while between releases
* misc/*/debian/changelog: same

171 changes: 0 additions & 171 deletions doc/jobs.txt
@@ -156,177 +156,6 @@ name must be looked up in the unjellyableRegistry (and the class located, and
the __implements__ list consulted) before any of the object's tokens are
accepted.

* security TODOs:

** size constraints on the set-vocab sequence

* implement schema.maxSize()

In newpb, schemas serve two purposes:

a) make programs safer by reducing the surprises that can appear in their
arguments (i.e. factoring out argument-checking in a useful way)

b) remove memory-consumption DoS attacks by putting an upper bound on the
memory consumed by any particular message.

Each schema has a pair of methods named maxSize() and maxDepth() which
provide this upper bound. While the schema is in effect (say, during the
receipt of a particular named argument to a remotely-invokable method), at
most X bytes and Y slicer frames will be in use before either the object is
accepted and processed or the schema notes the violation and the object is
rejected (whereupon the temporary storage is released and all further bytes
in the rejected object are simply discarded). Strictly speaking, the number
returned by maxSize() is the largest string on the wire which has not yet
been rejected as violating the constraint, but it is also a reasonable
metric to describe how much internal storage must be used while processing
it. (Achieving greater accuracy would require knowing exactly how large each
Python type is, which is not a sensible thing to attempt.)

The idea is that someone who is worried about an attacker throwing a really
long string or an infinitely-nested list at them can ask the schema just what
exactly their current exposure is. The tradeoff between flexibility ("accept
any object whatsoever here") and exposure to DoS attack is then user-visible
and thus user-selectable.

To implement maxSize() for a basic schema (like a string), you simply need
to look at banana.xhtml and see how basic tokens are encoded (you will also
need to look at banana.py and see how deserialization is actually
implemented). For a schema.StringConstraint(32) (which accepts strings <= 32
characters in length), the largest serialized form that has not yet been
either accepted or rejected is:

64 bytes (header indicating 0x000000..0020 with lots of leading zeros)
+ 1 byte (STRING token)
+ 32 bytes (string contents)
= 97

If the header indicates a conforming length (<=32) then just after the 32nd
byte is received, the string object is created and handed up the stack, so
the temporary storage tops out at 97. If someone is trying to spam us with a
million-character string, the serialized form would look like:

64 bytes (header indicating 1-million in hex, with leading zeros)
+ 1 byte (STRING token)
= 65

at which point the receive parser would check the constraint, decide that
1000000 > 32, and reject the remainder of the object.
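
In rough pseudo-Python, that receive-side check amounts to something like the
sketch below (the names are illustrative and Violation stands in for
foolscap.tokens.Violation; the real check lives in the unslicers):

class Violation(Exception):      # stand-in for foolscap.tokens.Violation
    pass

def check_string_header(announced_length, max_length):
    # consulted as soon as the length header arrives, before any of the
    # string body is buffered; a conforming header lets the body through
    if max_length is not None and announced_length > max_length:
        raise Violation("string too long: %d > %d"
                        % (announced_length, max_length))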

So (with the exception of pass/fail maxSize values, see below), the following
should hold true:

schema.StringConstraint(32).maxSize() == 97
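
As a sketch of the arithmetic only (the 64-byte header figure comes from the
discussion above; this helper is not part of foolscap):

HEADER_BYTES = 64   # worst-case length header, padded with leading zeros
TOKEN_BYTE = 1      # the STRING type byte

def string_constraint_max_size(max_length):
    # header + type byte + the longest string body that could still conform
    return HEADER_BYTES + TOKEN_BYTE + max_length

assert string_constraint_max_size(32) == 97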

Now, schemas which represent containers have size limits that are the sum of
their contents, plus some overhead (and a stack level) for the container
itself. For example, a list of two small integers is represented in newbanana
as:

OPEN(list)
INT
INT
CLOSE()

which really looks like:

opencount-OPEN
len-STRING-"list"
value-INT
value-INT
opencount-CLOSE

This sequence takes at most:

opencount-OPEN: 64+1
len-STRING-"list": 64+1+1000 (opentypes are confined to be <= 1k long)
value-INT: 64+1
value-INT: 64+1
opencount-CLOSE: 64+1

or 5*(64+1)+1000 = 1325, or rather:

3*(64+1)+1000 + N*(IntConstraint().maxSize())

So ListConstraint.maxSize is computed by doing some math involving the
.maxSize value of the objects that go into it (the ListConstraint.constraint
attribute). This suggests a recursive algorithm. If any constraint is
unbounded (say a ListConstraint with no limit on the length of the list),
then maxSize() raises UnboundedSchema to indicate that there is no limit on
the size of a conforming string. Clearly, if any constraint is found to
include itself, UnboundedSchema must also be raised.
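
A minimal sketch of that recursion, assuming the maxSize()/UnboundedSchema
interface that this commit removes (constants follow the worked example
above; not foolscap code):

class UnboundedSchema(Exception):
    pass

COUNTERBYTES = 64                       # worst-case opencount / length header
LIST_OVERHEAD = 3 * (COUNTERBYTES + 1) + 1000  # OPEN + opentype (<= 1k) + CLOSE

class ListConstraintSketch:
    def __init__(self, item_constraint, max_length=None):
        self.item_constraint = item_constraint
        self.max_length = max_length

    def maxSize(self, seen=None):
        if seen is None:
            seen = []
        if self in seen:
            raise UnboundedSchema       # this constraint contains itself
        if self.max_length is None:
            raise UnboundedSchema       # no bound on the list length
        seen.append(self)
        return (LIST_OVERHEAD +
                self.max_length * self.item_constraint.maxSize(seen))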

This is a loose upper bound. For example, one non-conforming input string
would be:

opencount-OPEN: 64+1
len-STRING-"x"*1000: 64+1+1000

The entire string would be accepted before checking to see which opentypes
were valid: the ListConstraint only accepts the "list" opentype and would
reject this string immediately after the 1000th "x" was received. So a
tighter upper bound would be 2*65+1000 = 1130.

In general, the bound is computed by walking through the deserialization
process and identifying the largest string that could make it past the
validity checks. There may be later checks that will reject the string, but
if it has not yet been rejected, then it still represents exposure for a
memory consumption DoS.

** pass/fail sizes

I started to think that it was necessary to have each constraint provide two
maxSize numbers: one for the largest sequence that could possibly be accepted
as valid, and a second for the largest sequence that could still be
undecided. This would provide a more accurate upper bound because most
containers will respond to an invalid object by abandoning the rest of the
container: i.e. if the current active constraint is:

ListConstraint(StringConstraint(32), maxLength=30)

then the first thing that doesn't match the string constraint (say an
instance, or a number, or a 33-character string) will cause the ListUnslicer
to go into discard-everything mode. This makes a significant difference when
the per-item constraint allows opentypes, because the OPEN type (a string) is
constrained to 1k bytes. The item constraint probably imposes a much smaller
limit on the set of actual strings that would be accepted, so no
kilobyte-long opentype will possibly make it past that constraint. That means
there can only be one outstanding invalid object. So the worst case (maximal
length) string that has not yet been rejected would be something like:

OPEN(list)
validthing [0]
validthing [1]
...
validthing [n-1]
long-invalid-thing

because if the long-invalid thing had been received earlier, the entire list
would have been abandoned.

This suggests that the calculation for ListConstraint.maxSize() really needs
to be something like:
overhead
+(len-1)*itemConstraint.maxSize(valid)
+(1)*itemConstraint.maxSize(invalid)
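
Hypothetically, the two-number version might have been wired up like the
sketch below (illustrative only: the (accept, reject) pair API, including the
maxSizePair() name, was never implemented):

def list_max_size_pair(item_constraint, max_length):
    # hypothetical: each constraint reports (maxSizeAccept, maxSizeReject)
    accept, reject = item_constraint.maxSizePair()
    overhead = 3 * (64 + 1) + 1000      # OPEN + opentype + CLOSE, as before
    accept_bound = overhead + max_length * accept
    # only one about-to-be-rejected item can be outstanding at a time
    reject_bound = overhead + (max_length - 1) * accept + reject
    return (accept_bound, reject_bound)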

I'm still not sure about this. I think it provides a significantly tighter
upper bound. The deserialization process itself does not try to achieve the
absolute minimal exposure (i.e., the opentype checker could take the set of
all known-valid open types, compute the maximum length, and then impose a
StringConstraint with that length instead of 1000), because it is, in
general, an inefficient hassle. There is a tradeoff between computational
efficiency and removing the slack in the maxSize bound, both in the
deserialization process (where the memory is actually consumed) and in
maxSize (where we estimate how much memory could be consumed).

Anyway, maxSize() and maxDepth() (which is easier: containers add 1 to the
maximum of the maxDepth values of their possible children) need to be
implemented for all the Constraint classes. There are some tests (disabled)
in test_schema.py for this code: those tests assert specific values for
maxSize. Those values are probably wrong, so they must be updated to match
however maxSize actually works.
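
A sketch of that container rule for maxDepth(), patterned after the
recursion-guarded methods removed below (UnboundedSchema as in the earlier
sketch; illustrative only):

def container_max_depth(child_constraints, seen=None):
    # a container is one level deeper than its deepest possible child
    if seen is None:
        seen = []
    deepest = 1
    for c in child_constraints:
        if c in seen:
            raise UnboundedSchema       # the constraint graph contains a cycle
        deepest = max(deepest, c.maxDepth(seen + [c]))
    return 1 + deepest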

* decide upon what the "Shared" constraint should mean

The idea of this one was to avoid some vulnerabilities by rejecting arbitrary
85 changes: 0 additions & 85 deletions foolscap/constraint.py
@@ -29,9 +29,6 @@
}
nothingTaster = {}

class UnboundedSchema(Exception):
pass

class IConstraint(Interface):
pass
class IRemoteMethodConstraint(IConstraint):
@@ -177,40 +174,6 @@ def checkObject(self, obj, inbound):
# this default form passes everything
return

def maxSize(self, seen=None):
"""
I help a caller determine how much memory could be consumed by the
input stream while my constraint is in effect.
My constraint will be enforced against the bytes that arrive over
the wire. Eventually I will either accept the incoming bytes and my
Unslicer will provide an object to its parent (including any
subobjects), or I will raise a Violation exception which will kick
my Unslicer into 'discard' mode.
I define maxSizeAccept as the maximum number of bytes that will be
received before the stream is accepted as valid. maxSizeReject is
the maximum that will be received before a Violation is raised. The
max of the two provides an upper bound on single objects. For
container objects, the upper bound is probably (n-1)*accept +
reject, because there can only be one outstanding
about-to-be-rejected object at any time.
I return (maxSizeAccept, maxSizeReject).
I raise an UnboundedSchema exception if there is no bound.
"""
raise UnboundedSchema

def maxDepth(self):
"""I return the greatest number Slicer objects that might exist on
the SlicerStack (or Unslicers on the UnslicerStack) while processing
an object which conforms to this constraint. This is effectively the
maximum depth of the object tree. I raise UnboundedSchema if there is
no bound.
"""
raise UnboundedSchema

COUNTERBYTES = 64 # max size of opencount

def OPENBYTES(self, dummy):
@@ -265,13 +228,6 @@ def checkObject(self, obj, inbound):
if not self.regexp.search(obj):
raise Violation("regexp failed to match")

def maxSize(self, seen=None):
if self.maxLength == None:
raise UnboundedSchema
return 64+1+self.maxLength
def maxDepth(self, seen=None):
return 1

class IntegerConstraint(Constraint):
opentypes = [] # redundant
# taster set in __init__
@@ -297,15 +253,6 @@ def checkObject(self, obj, inbound):
if abs(obj) >= 2**(8*self.maxBytes):
raise Violation("number too large")

def maxSize(self, seen=None):
if self.maxBytes == None:
raise UnboundedSchema
if self.maxBytes == -1:
return 64+1
return 64+1+self.maxBytes
def maxDepth(self, seen=None):
return 1

class NumberConstraint(IntegerConstraint):
"""I accept floats, ints, and longs."""
name = "NumberConstraint"
@@ -320,14 +267,6 @@ def checkObject(self, obj, inbound):
return
IntegerConstraint.checkObject(self, obj, inbound)

def maxSize(self, seen=None):
# floats are packed into 8 bytes, so the shortest FLOAT token is
# 64+1+8
intsize = IntegerConstraint.maxSize(self, seen)
return max(64+1+8, intsize)
def maxDepth(self, seen=None):
return 1



#TODO
@@ -337,18 +276,6 @@ class Shared(Constraint):
def __init__(self, constraint, refLimit=None):
self.constraint = IConstraint(constraint)
self.refLimit = refLimit
def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxSize(seen)
def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxDepth(seen)

#TODO: might be better implemented with a .optional flag
class Optional(Constraint):
@@ -357,15 +284,3 @@ class Optional(Constraint):
def __init__(self, constraint, default):
self.constraint = IConstraint(constraint)
self.default = default
def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxSize(seen)
def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
return self.constraint.maxDepth(seen)
23 changes: 1 addition & 22 deletions foolscap/copyable.py
@@ -10,7 +10,7 @@
import slicer, tokens
from tokens import BananaError, Violation
from foolscap.constraint import OpenerConstraint, IConstraint, \
ByteStringConstraint, UnboundedSchema, Optional
ByteStringConstraint, Optional

Interface = interface.Interface

@@ -373,27 +373,6 @@ def __init__(self, *attrTuples, **kwargs):
assert name not in self.keys.keys()
self.keys[name] = IConstraint(constraint)

def maxSize(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
total = self.OPENBYTES("attributedict")
for name, constraint in self.keys.iteritems():
total += ByteStringConstraint(len(name)).maxSize(seen)
total += constraint.maxSize(seen[:])
return total

def maxDepth(self, seen=None):
if not seen: seen = []
if self in seen:
raise UnboundedSchema # recursion
seen.append(self)
# all the attribute names are 1-deep, so the min depth of the dict
# items is 1. The other "1" is for the AttributeDict container itself
return 1 + reduce(max, [c.maxDepth(seen[:])
for c in self.itervalues()], 1)

def getAttrConstraint(self, attrname):
c = self.keys.get(attrname)
if c:
16 changes: 1 addition & 15 deletions foolscap/remoteinterface.py
@@ -3,7 +3,7 @@
import inspect
from zope.interface import interface, providedBy, implements
from foolscap.constraint import Constraint, OpenerConstraint, nothingTaster, \
IConstraint, UnboundedSchema, IRemoteMethodConstraint, Optional, Any
IConstraint, IRemoteMethodConstraint, Optional, Any
from foolscap.tokens import Violation, InvalidRemoteInterface
from foolscap.schema import addToConstraintTypeMap
from foolscap import ipb
@@ -296,20 +296,6 @@ def checkResults(self, results, inbound):
# location appropriately: they have more information than we do.
self.responseConstraint.checkObject(results, inbound)

def maxSize(self, seen=None):
if self.acceptUnknown:
raise UnboundedSchema # there is no limit on that thing
if self.ignoreUnknown:
# for now, we ignore unknown arguments by accepting the object
# and then throwing it away. This makes us vulnerable to the
# memory consumed by that object. TODO: in the CallUnslicer,
# arrange to discard the ignored object instead of receiving it.
# When this is done, ignoreUnknown will not cause the schema to
# be unbounded and this clause should be removed.
raise UnboundedSchema
# TODO: implement the rest of maxSize, just like a dictionary
raise NotImplementedError

class UnconstrainedMethod(object):
"""I am a method constraint that accepts any arguments and any return
value.