# Hypothesis: Thorough Unit-Testing You Can Trust

Xavier Villaneau

## A Friendly Plug

![PyTennessee 2019 logo](pytn_logo_black.png)

PyTennessee 2019 was AWESOME

## Once upon a time at a Haskell workshop...

```haskell
dispatch :: Command -> Connection -> IO String
dispatch ListFeatures conn = printManyReply <$> listFeatures conn
dispatch _ _ = return "Not implemented"

main :: IO ()
main = do
  cmd <- cmdArgs cmdParser
  conn <- checkedConnect defaultConnectInfo
  Prelude.putStr =<< dispatch cmd conn

```

“I must write unit tests for every line of my code, or my Honor as a programmer will be tarnished.” -- _Former me_

## ...I met QuickCheck

> “**QuickCheck** is a tool which aids the Haskell programmer in formulating and testing properties of programs. \[...\]   
> We have designed a \[language\] which the tester uses to define expected properties of the functions under test. **QuickCheck** then checks that the properties hold in a large number of cases.”

K. Claessen and J. Hughes,  
QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs.  
In *International Conference on Functional Programming*, September 2000.

## Hypothesis, Python's QuickCheck

#### 1. Introducing Hypothesis With an Example
#### 2. The Hypothesis Manual, Abridged
#### 3. Strategic Overview
#### 4. The Bonus Features

## 1. Introducing Hypothesis With an Example

### Hello, I'm “The Function”

In [1]:
def extract_id(str_id: str) -> int:
    if str_id[0] == 'c' and str_id[1:].isdigit():
        return int(str_id[1:])
    return -1

Let's try it out:

In [2]:
extract_id('c12345')

12345

In [3]:
extract_id('csx8888')

-1

### Hello Function, I'm Hypothesis

In [4]:
from hypothesis import settings, Verbosity
settings.register_profile('demo', verbosity=Verbosity.verbose, max_examples=10)
settings.load_profile('demo')

In [5]:
from hypothesis import given, strategies

@given(strategies.integers(min_value=0))
def test_int_always_decodes(number):
    assert extract_id(f'c{number}') == number

In [32]:
test_int_always_decodes()

Trying example: test_int_always_decodes(number=0)
Trying example: test_int_always_decodes(number=29718)
Trying example: test_int_always_decodes(number=6695635479773085624)
Trying example: test_int_always_decodes(number=19)
Trying example: test_int_always_decodes(number=18969)
Trying example: test_int_always_decodes(number=52167998556186279088842865185018497561)
Trying example: test_int_always_decodes(number=112667830879452319647840124170581575183)
Trying example: test_int_always_decodes(number=23594)
Trying example: test_int_always_decodes(number=24431)
Trying example: test_int_always_decodes(number=86)


### A harder test

In [7]:
from hypothesis import assume

@given(strategies.characters(), strategies.integers(min_value=0))
def test_not_c_never_decodes(char, number):
    assume(char != 'c')
    assert extract_id(f'{char}{number}') == -1

In [8]:
test_not_c_never_decodes()

Trying example: test_not_c_never_decodes(char='0', number=0)
Trying example: test_not_c_never_decodes(char='𦆔', number=17503)
Trying example: test_not_c_never_decodes(char='\U0003d012', number=14917)
Trying example: test_not_c_never_decodes(char='\U00097f59', number=13962)
Trying example: test_not_c_never_decodes(char='\U000bdc3b', number=1404012614)
Trying example: test_not_c_never_decodes(char='𦶢', number=62)
Trying example: test_not_c_never_decodes(char='\U000b71e4', number=22570)
Trying example: test_not_c_never_decodes(char='/', number=4863)
Trying example: test_not_c_never_decodes(char='\x07', number=2025010267)
Trying example: test_not_c_never_decodes(char='\t', number=24607)


### This works like a charm, right?

In [9]:
import contextlib
settings.load_profile('default')

In [10]:
@given(strategies.text())
def test_never_crashes(any_string):
    assert isinstance(extract_id(any_string), int)

In [11]:
with contextlib.suppress(Exception):
    test_never_crashes()

Falsifying example: test_never_crashes(any_string='')


Reminder: `str_id[0]` raises an `IndexError` when the string is empty
```python
def extract_id(str_id: str) -> int:
    if str_id[0] == 'c' and str_id[1:].isdigit():
        return int(str_id[1:])
    return -1
```

### Y'all got any more bugs?

In [12]:
import re
settings.register_profile('try_a_lot', max_examples=10_000)
settings.load_profile('try_a_lot')

In [13]:
from hypothesis import assume
RE_NUMBER = re.compile(r'^[0-9]+$')

@given(strategies.text())
def test_no_false_positives(not_a_number):
    assume(RE_NUMBER.match(not_a_number) is None)
    assert extract_id(f'c{not_a_number}') == -1

In [14]:
with contextlib.suppress(Exception):
    test_no_false_positives()

Falsifying example: test_no_false_positives(not_a_number='❶')


In [15]:
settings.load_profile('default')

### wut.

In [16]:
for c in ('7', '\u06f3', '\u2792', '\xbd','\u56db', '\u0b87'):
    print(c, c.isdecimal(), c.isdigit(), c.isnumeric(), c.isalnum(), sep='\t')

7	True	True	True	True
۳	True	True	True	True
➒	False	True	True	True
½	False	False	True	True
四	False	False	True	True
இ	False	False	False	True


𝕿𝖍𝖔𝖚 𝖍𝖆𝖘𝖙 𝖜𝖆𝖐𝖊𝖙𝖍 𝖙𝖍𝖊 𝖜𝖗𝖆𝖙𝖍 𝖔𝖋 𝖀𝖓𝖎𝖈𝖔𝖉𝖊
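
Both problems found above (the crash on an empty string and the failure on `'c❶'`) could be avoided by matching the whole ID explicitly. This is a minimal sketch, assuming valid IDs are a `c` followed only by ASCII digits; the `_ID_PATTERN` name is hypothetical and not part of the original function:

```python
import re

# Assumption: a valid ID is 'c' followed by one or more ASCII digits.
# [0-9] is used instead of \d or str.isdigit(), which also accept
# non-ASCII characters.
_ID_PATTERN = re.compile(r'c([0-9]+)')


def extract_id(str_id: str) -> int:
    # fullmatch() handles the empty string and rejects trailing characters
    match = _ID_PATTERN.fullmatch(str_id)
    if match is None:
        return -1
    return int(match.group(1))
```

Note that this version also rejects other Unicode decimal digits such as `'۳'`, which may or may not be what you want.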

## 2. The Hypothesis Manual, Abridged

### Back to the basics

* `@given()` → Decorates a test so that Hypothesis runs it with generated inputs
* `strategies` → Module of data generators

```python
@given(strategy)
def test_function(arguments):
    assert test_condition(arguments)
```

### Skipping tests

* `assume(condition)` → Discard the current example if `condition` is false
* Can also filter at the source with `strategy.filter(predicate)`

**Warning:**  
If too many attempts are invalidated, Hypothesis eventually gives up.
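
As a rough illustration of the two options above, here is the same property written with `assume()` and with `.filter()`. The third variant builds valid inputs directly with `.map()`, so nothing is rejected. The test names are made up for this sketch:

```python
from hypothesis import assume, given, strategies as st


@given(st.integers())
def test_even_with_assume(n):
    # Odd inputs are discarded inside the test body
    assume(n % 2 == 0)
    assert (n // 2) * 2 == n


@given(st.integers().filter(lambda n: n % 2 == 0))
def test_even_with_filter(n):
    # Odd inputs are discarded before they reach the test
    assert (n // 2) * 2 == n


@given(st.integers().map(lambda n: n * 2))
def test_even_with_map(n):
    # Every generated input is even by construction: nothing is discarded
    assert (n // 2) * 2 == n
```

When the condition rejects most inputs, constructing valid data is usually the safer choice.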

### Forcing a test input

* `@example(*arguments)` → Forces a specific test input

This is _in addition to_ the randomly generated examples.  
Useful for systematically testing a known failing input or corner case.

```python
from hypothesis import example, given, strategies

@given(strategies.integers(), strategies.integers())
@example(1000, -1000)
def test_addition_commutative(x, y):
    assert x + y == y + x
```

In [17]:
settings.load_profile('demo')

In [18]:
from hypothesis import example

@given(strategies.integers(), strategies.integers())
@example(71, 212)
@example(1, -1)
def test_addition_commutative(x, y):
    assert x + y == y + x

In [19]:
test_addition_commutative()

Trying example: test_addition_commutative(x=71, y=212)
Trying example: test_addition_commutative(x=1, y=-1)
Trying example: test_addition_commutative(x=0, y=0)
Trying example: test_addition_commutative(x=2264, y=-20525)
Trying example: test_addition_commutative(x=105, y=-48)
Trying example: test_addition_commutative(x=2873, y=92)
Trying example: test_addition_commutative(x=-59, y=-50)
Trying example: test_addition_commutative(x=-6961, y=58)
Trying example: test_addition_commutative(x=108, y=-103497676963079845226034321549979989840)
Trying example: test_addition_commutative(x=4080, y=1785)
Trying example: test_addition_commutative(x=-25853, y=-27057)
Trying example: test_addition_commutative(x=-13752, y=1181)


In [20]:
settings.load_profile('default')

### The Magic of Failure

When an input fails, Hypothesis **shrinks** it to the simplest case.

Hypothesis remembers failing inputs in its **database** and retries them on the next run.
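
One way to watch shrinking in isolation is `hypothesis.find()`, which searches a strategy for a value satisfying a condition and returns the simplest one it can reach. A small sketch (the condition is an arbitrary choice for illustration):

```python
from hypothesis import find, strategies as st

# Search for a list of integers whose sum is at least 10, then let
# Hypothesis shrink it towards the simplest list it can find.
smallest = find(st.lists(st.integers()), lambda xs: sum(xs) >= 10)
print(smallest)
```

The same shrinking process runs whenever a `@given` test fails, which is why reported falsifying examples tend to be short and readable.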

### What to test?

Look for *invariant* properties:

1. Does the function *crash* when it shouldn't? (e.g. validators)
2. Does applying the function and then its inverse return the original input? (e.g. parsers, serializers)
3. Is the function *idempotent*?
4. Does the function agree with another known implementation? (optimization, refactor)
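
Three of these properties can be sketched with the standard library alone. The functions under test here (`json`, `sorted`, `max`) are stand-ins chosen for illustration, not code from this talk:

```python
import json

from hypothesis import given, strategies as st


@given(st.dictionaries(st.text(), st.integers()))
def test_json_round_trip(data):
    # Inverse function: decoding the encoded value gives back the input
    assert json.loads(json.dumps(data)) == data


@given(st.lists(st.integers()))
def test_sort_is_idempotent(xs):
    # Idempotence: sorting twice is the same as sorting once
    once = sorted(xs)
    assert sorted(once) == once


@given(st.lists(st.integers(), min_size=1))
def test_max_matches_reference(xs):
    # Known other function: max() must agree with "sort, then take the last"
    assert max(xs) == sorted(xs)[-1]
```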

## 3. Strategic Overview

### The simple stuff

In [21]:
def gimme_examples(strategy, n=5):
    for _ in range(n):
        print(strategy.example())

In [22]:
gimme_examples(strategies.floats())

9007199254740992.0
-inf
nan
-0.3333333333333333
9007199254740992.0


In [36]:
gimme_examples(strategies.text().map(str.encode))

b'\xf3\xb4\xb9\x81\xf3\x9d\x97\xa1"\x1e+'
b''
b'\xe8\xba\xab\xf4\x83\x8c\xae\x11\xf1\xa2\x94\xaa\xf1\x8f\x94\x94\xf0\x99\x90\x8b\xf2\x98\xbc\xb2\r\xeb\xaf\x90'
b'\r\x14+'
b'\x050\x00'


### Collections

In [38]:
gimme_examples(strategies.lists(strategies.integers()))

[19031, 25378, -7319435685091632798, 6127, -4365, -8657, -29071]
[-13208, -20178]
[-196145342]
[]
[30, -27607, 21790, -557858677168247956, -21788, -27406, -8816, 19714, 4678, 15, -1894378146403338185, -10141]


Also: `sets`, `dictionaries`, `tuples`, `iterables`, `frozensets`
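
The other collection strategies compose the same way. A short sketch with `tuples` and `dictionaries`; the `pairs` and `inventories` names and the size limits are arbitrary choices for the example:

```python
from hypothesis import given, strategies as st

# A fixed-shape tuple: one strategy per position
pairs = st.tuples(st.integers(), st.text())

# A dictionary with non-empty string keys and non-negative integer values
inventories = st.dictionaries(
    keys=st.text(min_size=1),
    values=st.integers(min_value=0),
    max_size=5,
)


@given(pairs)
def test_pair_shape(pair):
    number, label = pair
    assert isinstance(number, int) and isinstance(label, str)


@given(inventories)
def test_inventory_constraints(stock):
    assert len(stock) <= 5
    assert all(key and count >= 0 for key, count in stock.items())
```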

### Building objects

In [25]:
from decimal import Decimal
from dataclasses import dataclass
from string import ascii_letters

In [26]:
@dataclass
class Customer:
    username: str
    customer_id: int
    account_balance: Decimal

In [27]:
gimme_examples(strategies.builds(Customer))

Customer(username='\U0005813e(', customer_id=-985602706, account_balance=Decimal('-0.704'))
Customer(username='\x0b', customer_id=1068358331, account_balance=Decimal('0.6468571'))
Customer(username='塀\x05\U000ba92d\U00012dd1\U000f166e\x1f\n\U000f438b', customer_id=-1019875701, account_balance=Decimal('-Infinity'))
Customer(username='\U00013e7e-\U00044b77', customer_id=-54, account_balance=Decimal('NaN'))
Customer(username='#', customer_id=-25903, account_balance=Decimal('-Infinity'))
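
With no arguments, `builds()` infers a strategy for each field from its type hint, which explains the `NaN` balances and exotic usernames above. Passing explicit strategies as keyword arguments narrows things down. A sketch, where the bounds and the `constrained_customers` name are assumptions made for the example:

```python
from dataclasses import dataclass
from decimal import Decimal
from string import ascii_letters

from hypothesis import given, strategies as st


@dataclass
class Customer:
    username: str
    customer_id: int
    account_balance: Decimal


# Each keyword argument overrides the strategy inferred from the type hint
constrained_customers = st.builds(
    Customer,
    username=st.text(alphabet=ascii_letters, min_size=1, max_size=20),
    customer_id=st.integers(min_value=1),
    account_balance=st.decimals(
        min_value=Decimal('-10000'),
        max_value=Decimal('10000'),
        allow_nan=False,
        allow_infinity=False,
        places=2,
    ),
)


@given(constrained_customers)
def test_customer_fields_are_sane(customer):
    assert customer.username.isalpha()
    assert customer.customer_id >= 1
    assert customer.account_balance.is_finite()
```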


### Composite strategies

In [28]:
@strategies.composite
def char_ids(draw):
    prefix = draw(strategies.sampled_from(ascii_letters))
    number = draw(strategies.integers(min_value=0))
    return f'{prefix}/{number}'

In [29]:
gimme_examples(char_ids())

y/115664219
h/26745
m/438350873
m/22055
M/58


### ...and much more!

Dates and times, composition tools, recursive data structures, functional tools...

https://hypothesis.readthedocs.io/en/latest/data.html

## 4. The Bonus Features

### Django support

* Must use `hypothesis.extra.django.TestCase` instead of `django.test.TestCase`.
* Automatic DB model creation strategy: `hypothesis.extra.django.from_model`.

```python
from hypothesis import given
from hypothesis.extra.django import TestCase, from_model

# CustomerRecord is assumed to be a Django model from your own app (not shown)

class CustomerRecordsTest(TestCase):
    @given(from_model(CustomerRecord))
    def test_customer_record(self, record):
        # Relevant test with DB usage goes here
        ...
```

### Settings profiles

Some hidden code from earlier in this presentation:
```python
from hypothesis import settings, Verbosity
settings.register_profile('demo', verbosity=Verbosity.verbose, max_examples=10)
settings.load_profile('demo')
```

**Usage example:** run local tests with few examples, run CI tests with many
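
The local/CI split could look like the sketch below, for instance in a `conftest.py`. The profile names, the example counts, and the `HYPOTHESIS_PROFILE` environment variable are assumptions chosen for illustration:

```python
import os

from hypothesis import settings

# Few examples for quick local feedback, many for CI
settings.register_profile('dev', max_examples=10)
settings.register_profile('ci', max_examples=1000)

# Pick the profile from an environment variable, defaulting to 'dev'
settings.load_profile(os.getenv('HYPOTHESIS_PROFILE', 'dev'))
```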

### pytest support

Running Hypothesis tests through `pytest` exposes CLI options to:

* Set the test seed: `--hypothesis-seed`
* Set the verbosity: `--hypothesis-verbosity`
* Set a custom profile: `pytest --hypothesis-profile=ci_tests .`
* Collect runtime statistics: `pytest --hypothesis-show-statistics .`

**Fact:** you should be using `pytest`

```
test_extract_id.py::test_int_always_decodes:
  - 100 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: < 1ms
  - Fraction of time spent in data generation: ~ 47%
  - Stopped because settings.max_examples=100

test_extract_id.py::test_not_c_never_decodes:
  - 100 passing examples, 0 failing examples, 0 invalid examples
  - Typical runtimes: < 1ms
  - Fraction of time spent in data generation: ~ 63%
  - Stopped because settings.max_examples=100

test_extract_id.py::test_never_crashes:
  - 0 passing examples, 2 failing examples, 0 invalid examples
  - Typical runtimes: 0-15 ms
  - Fraction of time spent in data generation: ~ 1%
  - Stopped because nothing left to do

test_extract_id.py::test_no_false_positives:
  - 8 passing examples, 8 failing examples, 5 invalid examples
  - Typical runtimes: < 1ms
  - Fraction of time spent in data generation: ~ 35%
  - Stopped because nothing left to do
```

### Stateful testing

> With Hypothesis’s stateful testing, \[it\] tries to generate not just data but entire tests. You specify a number of primitive actions that can be combined together, and then **Hypothesis will try to find sequences of those actions that result in a failure**.

Useful for testing more complex systems, e.g. database APIs.

Haven't used it yet myself, so this counts as homework:  
https://hypothesis.readthedocs.io/en/latest/stateful.html
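
For a taste of what this looks like, here is a small sketch using `RuleBasedStateMachine`. It checks Python's built-in `set` against a deliberately naive list-based model; the class and method names are invented for the example:

```python
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class SetModelMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.real = set()   # the system under test
        self.model = []     # a simple reference model without duplicates

    @rule(value=st.integers())
    def add(self, value):
        self.real.add(value)
        if value not in self.model:
            self.model.append(value)

    @rule(value=st.integers())
    def discard(self, value):
        self.real.discard(value)
        if value in self.model:
            self.model.remove(value)

    @invariant()
    def same_contents(self):
        # Checked after every step of every generated sequence
        assert sorted(self.real) == sorted(self.model)


# Exposes the state machine as a unittest-style test case for pytest
TestSetModel = SetModelMachine.TestCase
```

Hypothesis picks the rules and their arguments, so a failure is reported as a sequence of calls rather than a single input.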

## Conclusion

Hypothesis is good at:

* Finding the edge case bugs you forgot
* Testing mission-critical logic that **must** be reliable
* Testing bidirectional functions (e.g. parsers, serializers)
* Fuzz testing

However:

* Tests **must** check invariant properties that hold for every generated input
* Tests should be fast
* It's bad at generating "real" data (use `faker`)
* It should not replace sanity/smoke tests

# Thank you!

Slides on:  
https://github.com/xvillaneau/talks

### Further reading

David R. MacIver, The Purpose of Hypothesis  
https://hypothesis.readthedocs.io/en/latest/manifesto.html

Scott Wlaschin, Choosing properties for property-based testing  
https://fsharpforfunandprofit.com/posts/property-based-testing-2/

Joe "begriffs" Nelson, The Design and Use of QuickCheck  
https://begriffs.com/posts/2017-01-14-design-use-quickcheck.html