Python library for falsifying hypotheses
Python Shell
Pull request Compare This branch is 1 commit ahead, 3140 commits behind HypothesisWorks:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



Hypothesis is a library for falsifying its namesake. It is inspired by libraries for testing like Quickcheck, with its most direct ancestor being ScalaCheck from which it acquired its approach to test case minimization. It is not itself a test framework, but it provides decorators for easy integration with test frameworks

The primary entry point into the library is the hypothesis.falsify method.

What does it do?

You give it a predicate and a specification for how to generate arguments to test that predicate and it gives you a counterexample.

Basic examples

In [1]: from hypothesis import falsify

In [2]: falsify(lambda x,y,z: (x + y) + z == x + (y +z), float,float,float)
Out[2]: (1.0, 1.0, 0.0387906318128606)

In [3]: falsify(lambda x: sum(x) < 100, [int])
Out[3]: ([6, 29, 65],)

In [4]: falsify(lambda x: sum(x) < 100, [int,float])
Out[4]: ([18.0, 82],)

In [5]: falsify(lambda x: "a" not in x, str)
Out[5]: ('a',)

In [6]: falsify(lambda x: "a" not in x, {str})
Out[6]: (set(['a']),)

Sometimes we ask it to falsify things that are true:

In [7]: falsify(lambda x: x + 1 == 1 + x, int)
Unfalsifiable: Unable to falsify hypothesis <function <lambda> at 0x2efb1b8>

of course sometimes we ask it to falsify things that are false but hard to find:

In [8]: falsify(lambda x: x != "I am the very model of a modern major general", str)
Unfalsifiable: Unable to falsify hypothesis <function <lambda> at 0x2efb398>

It's not magic, and when the search space is large it won't be able to do very much for hard to find examples.

You can also use it to drive tests. I've only tested it with py.test, but it has no specific dependencies on it: You just write normal tests which raise exceptions on failures and it will transform those into randomized tests.

So the following test will pass:

def test_int_addition_is_commutative(x,y):
    assert x + y == y + x

And the following will fail:

def test_str_addition_is_commutative(x,y):
    assert x + y == y + x

With an error message something like:

    x = '0', y = '1'
    def test_str_addition_is_commutative(x,y):
        assert x + y == y + x
E       assert '01' == '10'
E         - 01
E         + 10

Stateful testing

You can also use hypothesis for a more stateful style of testing, to generate sequences of operations to break your code.

Considering the following broken implementation of a set:

class BadSet:
    def __init__(self): = []

    def add(self, arg):

    def remove(self, arg):
        for i in xrange(0, len(
            if[i] == arg:

    def contains(self, arg):
        return arg in

Can we use hypothesis to demonstrate that it's broken? We can indeed!

We can put together a stateful test as follows:

class BadSetTester(StatefulTest):
    def __init__(self): = BadSet()

    def add(self,i):

    def remove(self,i):
        assert not

The @step decorator says that this method is to be used as a test step. The @requires decorator says what argument types it needs when it is (you can omit @requires if you don't need any arguments).

We can now ask hypothesis for an example of this being broken:

In [7]: BadSetTester.breaking_example()
Out[7]: (('add', 1), ('add', 1), ('remove', 1)]

What does this mean? It means that if we were to do:

x = BadSetTester()

then we would get an assertion failure. Which indeed we would because the assertion that removing results in the element no longer being in the set would now be failing.

Adding custom types

Hypothesis comes with support for a lot of common built-in types out of the box, but you may want to test over spaces that involve your own data types. The easiest way to accomplish this is to derive a SearchStrategy from an existing strategy by extending MappedSearchStrategy.

The following example defines a search strategy for Decimal. It maps int values by dividing 100, so the generated values have two digits after the decimal point.

from decimal import Decimal
from hypothesis.searchstrategy import \
    SearchStrategy, MappedSearchStrategy, strategy_for

class DecimalStrategy(MappedSearchStrategy):

    def __init__(self, strategies, descriptor, **kwargs):
        SearchStrategy.__init__(self, strategies, descriptor, **kwargs)
        self.mapped_strategy = strategies.strategy(int)

    def pack(self, x):
        return Decimal(x) / 100

    def unpack(self, x):
        return int(x * 100)

Under the hood

How does hypothesis work?

The core object of hypothesis is the SearchStrategy. It knows how to explore a state space, and has the following operations.

  • produce(size,flags). Generate a random element of the state space subject to the flags provided.
  • flags(). Return a set of flags that may be used to control the production of elements.
  • could_have_produced(element). Say whether it's plausible that this element was produced by this strategy.
  • complexity(element). Return a float saying roughly how "complex" this element is. There's no meaning attached to this except that hypothesis will try to generate elements of lower complexity.
  • simplify(element). Return a generator over a simplified versions of this element.

These satisfy the following invariants:

  • produce(size,flags) should produce a distribution with about 'size' bits of entropy.
  • Any element produced by produce must return true when passed to could_have_produced
  • Any element for which could_have_produced returns true must not throw an exception when passed to complexity or simplify
  • The expected complexity of produce(size) should be monotonic increasing in size
  • for y in simplify(x), complexity(y) <= complexity(x)
  • simplify(x) should return a sequence of unique values
  • There shold be no chain x_1, x_2, ..., x_n with x_{i+1} in simplify(x_i) and x_1 in simplify(x_n).

These are used to explore the state space. produce is called with a number of sizes and flags to generate examples that falsify the hypothesis. The lowest complexity of these examples is then taken, then repeatedly simplified until an example is found with no simplification of it falsifying the hypothesis. This is taken as the end result.

SearchStrategy objects are produced from a descriptor value (which can be anything) and a SearchStrategies object, which has user definable rules for producing strategies.

So for example you can do

In [35]: SearchStrategies().strategy((int,int,[str]))
Out[35]: TupleStrategy((int, int, [str]))

There are some reasonably complicated and subtle things you can do in terms of overriding the defined search strategies. I'm not going to go into them here because it's all a bit weird and likely to still be in flux.

Warning: This library is still very much in flux, and no release of it right now should be considered to be stable. It's emerged out of the initial hack stage, and is probably not too broken, but proceed with caution.


This version of hypothesis has been tested using Python series 2.7, 3.2, 3.3 and pypy. Builds are checked with travis: