# Lesson 2: Classes in the Wild

> _Disclaimer: These points were intended to be applied to Python, your mileage may vary._

- Overutilising or underutilising classes can lead to ruin
- Classes can be a powerful tool or an endless garden path


## Pros

- Can break up a complicated problem into smaller parts
- Can keep track of state
  - No need to pass parameters back and forth
  - No thread-unsafe global variables
  - Can logically initialise state and then use it
- Can organise a hierarchy of states that belong together
- Provide dot-methods for accessing properties
  - "ask, don't tell"
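
As a rough illustration of the "keep track of state" point, here is a minimal sketch. `RunningTotal` is a hypothetical class invented for this example: it initialises its state once and then uses it, with no globals and no values passed back and forth.

```python
class RunningTotal:
    """Hypothetical example: state lives on the object, not in globals."""

    def __init__(self, start=0):
        # Initialise the state once...
        self.total = start

    def add(self, amount):
        # ...then use it without passing it in and out of every call
        self.total += amount
        return self.total


basket = RunningTotal()
basket.add(5)
basket.add(10)
print(basket.total)  # 15
```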
  
## Cons

- Can make code convoluted and hard to read
  - code spread across multiple files
  - logic for a single operation spread across multiple different parts

## Functions vs Classes

### Functions vs Methods

In Python:

- a **function** takes parameters, returns a value
- a **method** can be called on an object, and can access state in the object
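
A small sketch of the difference, using a hypothetical `area` function and `Rectangle` class that are not part of the lesson code:

```python
def area(width, height):
    # A function: everything it needs arrives as parameters
    return width * height


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        # A method: reads state from the object it is called on
        return self.width * self.height


print(area(2, 3))              # 6
print(Rectangle(2, 3).area())  # 6
```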


### Some examples of saturated methods

https://github.com/pandas-dev/pandas/blob/d01561fb7b9ad337611fa38a3cfb7e8c2faec608/pandas/io/parsers/readers.py#L708-L747
https://github.com/Grasia/wiki-scripts/blob/a154b995fafe440014e28f1936367638a34c7942/wiki_dump_parser/wiki_dump_parser.py#L50-L79

## Baby steps

In [1]:
CONFIG = {
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
}
CONFIG

{'thing': 'a', 'identifiier': 'b', 'name': 'c'}

_This isn't very safe if something goes wrong, and the IDE can't offer us any help_

In [4]:
CONFIG['identifier']

KeyError: 'identifier'

## Namedtuple

`namedtuple` == Quick 'n' dirty class!

Use when you just need to
- make sure that the correct keys/values are present
- access something a few times (safely) via a dot method rather than a dict key lookup

In [33]:
from collections import namedtuple

Config = namedtuple('config', ['thing', 'identifier', 'name'])
CONFIG = Config('a', 'b', 'c')

print(CONFIG)
print(CONFIG.thing, CONFIG.identifier)

config(thing='a', identifier='b', name='c')
a b


Now, let's try the failing example again

In [34]:
Config(**{
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
})

TypeError: __new__() got an unexpected keyword argument 'identifiier'

### A multi-level config object

In [37]:
from collections import namedtuple

# Define what your config objects need to contain
Endpoint = namedtuple(
    'Endpoint', [
        'key',
        'url',
        'timeout',
        'n_workers'
    ]
)
Endpoints = namedtuple(
    'Endpoints', [
        'customers',
        'products'
    ]
)

# Initialise all configs with their values
# Could read this from a JSON file, command-line args, or define it here
# Either way, the namedtuple will ensure that the result is the same
ENDPOINTS = Endpoints(
    customers=Endpoint('customers', 'customers/search/', 200, 2),
    products=Endpoint('products', 'products/all/search/', 100, 2),
)

# _asdict() maps each field name to its (still namedtuple) value
for endpoint, config in ENDPOINTS._asdict().items():
    print(endpoint, config.url)

customers customers/search/
products products/all/search/


Much better!

This means that we catch the error when Config is _**initialised**_, rather than when trying to _**access**_ 'identifier' later on.

This is also useful when loading a JSON config, when you need to make sure all the keys are present (and no unexpected ones sneak in).

In [38]:
import json
configs = [
    '{"identifier": 123, "name": "me", "thing": 123}',
    '{"identifier": 123, "name": "me", "thing": 123, "extra": 1}',
]
for raw_config in configs:
    try:
        Config(**json.loads(raw_config))
    except TypeError as e:
        print(f'caught: {e}')

caught: __new__() got an unexpected keyword argument 'extra'


- However, this doesn't give us much information when we're coding using this Config.
  - We know about the _keys_ in our IDE, but not the _values_
- The hierarchy works, but is hard to read

## Dataclasses

- Python 3.7+
- Syntactic sugar for defining an `__init__` method and instance variables
- Also generates a readable `__repr__` and an `__eq__`, among other things

[https://docs.python.org/3/library/dataclasses.html](https://docs.python.org/3/library/dataclasses.html)

### A regular class example

In [8]:
class Obj:
    def __init__(self, a=1, b=2, c='default'):
        self.a = a
        self.b = b
        self.c = c

Obj(1)

<__main__.Obj at 0x1030faee0>

### An equivalent dataclass example

In [39]:
from dataclasses import dataclass

@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'

Obj(1)

Obj(a=1, b=2, c='default')

In [10]:
Obj(d=5)

TypeError: __init__() got an unexpected keyword argument 'd'
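
Note that a dataclass rejects unknown field names, but it does not check the annotated types at runtime: the annotations are hints for tools and readers. If runtime checks are wanted, one option is the `__post_init__` hook, which dataclasses call right after the generated `__init__`. The sketch below is only an illustration: `CheckedObj` is a hypothetical class, and the check assumes plain classes such as `int` and `str` as annotations.

```python
from dataclasses import dataclass, fields


@dataclass
class CheckedObj:
    a: int = 1
    c: str = 'default'

    def __post_init__(self):
        # Runs right after the generated __init__.
        # Only handles plain classes (int, str), not typing generics.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(
                    f'{f.name} should be {f.type.__name__}, got {value!r}'
                )


try:
    CheckedObj(a='oops')
except TypeError as e:
    error_message = str(e)
    print(f'caught: {e}')
```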

### Mypy

In [13]:
from collections import namedtuple

Config = namedtuple('Config', ['a', 'b'])

def test_nt():
    Config(1,2).a + 'a'

from dataclasses import dataclass

@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'

def test_dc():
    Obj(1).a + 's'

In [14]:
!mypy "/code/intermediate/lesson02_classes/my.py"

Success: no issues found in 1 source file
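
One likely reason for the clean result: by default, mypy skips the bodies of functions that have no annotations. The sketch below is a variation on the file above, with the functions renamed to `use_nt` and `use_dc` (hypothetical names) and given a `-> None` return annotation so that mypy checks them. With that change mypy should report the `int + str` in the dataclass version, while the `namedtuple` version probably still passes, because `collections.namedtuple` fields carry no type information.

```python
from collections import namedtuple
from dataclasses import dataclass

Config = namedtuple('Config', ['a', 'b'])


@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'


def use_nt() -> None:
    # namedtuple fields are untyped, so mypy likely stays silent here
    Config(1, 2).a + 'a'


def use_dc() -> None:
    # `a` is declared as int, so mypy should flag int + str here
    Obj(1).a + 's'
```

Both functions still raise `TypeError` if they are actually called; the difference is only in what the type checker can see before running them.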


---

## An Example

- You have a collection of items, in this case ids and emails
- Need to iterate through them, collect some values, and pass them on

In [17]:
from api import API

for i, el in enumerate(API.get('customers')):
    print(i, el)
    if i >= 6:
        break

0 3247
1 7835
2 1303
3 1339
4 1041
5 None
6 None


For this exercise, we must consume each endpoint in a list, looked up by key, and send its results to its own file (here we just print them).

In [18]:
for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i in API.get(endpoint+'s'):
            if i is None:
                break
            print(endpoint, '--', next(API.get(endpoint, {'cid': i})))
    elif endpoint == 'transactions':
        for i in API.get(endpoint, {'ts': 0, 'te': 5}):
            if i is None:
                break
            print(endpoint, '--', i)

customer -- dmoore@kennedy.com
customer -- dcrawford@peters.com
customer -- pkramer@gmail.com
customer -- jessicajones@cook-hicks.net
customer -- aguilarwilliam@gmail.com
transactions -- 0c8eced7-9b04-4e6a-9db3-1be07cf8e3fe
transactions -- 0d3c0683-f2c3-46db-8944-2532b1041f4f
transactions -- 9bf60330-188f-4638-a66e-e66fc6b69d0f
transactions -- f6568b75-4799-4d38-908d-3c96cc42e871
transactions -- 3a9596ac-7dd9-41ec-a9fa-8c0e55c46208


### A Guest Speaker!

In [19]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('8bZh5LMaSmE', start=330)

[https://github.com/emilybache/GildedRose-Refactoring-Kata](https://github.com/emilybache/GildedRose-Refactoring-Kata)

https://github.com/tomquirk/realestate-com-au-api/blob/8368da02a67aaf1c2fe9634f19181fb54685718d/realestate_com_au/realestate_com_au.py#L70-L118

https://softwareengineering.stackexchange.com/questions/351389/dynamic-dispatch-from-a-string-python

### What's the easiest thing to extract first?

Probably the "request" logic

In [22]:
from api import API

def request(endpoint, kwargs=None):
    # Avoid a mutable default argument; fall back to a fresh dict per call
    for r in API.get(endpoint, kwargs or {}):
        if r is None:
            break
        yield r

for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i, c in enumerate(request(endpoint+'s')):
            print(endpoint, i, next(request(endpoint, {'cid': c})))
    elif endpoint == 'transactions':
        for i, t in enumerate(request(endpoint, {'ts': 0, 'te': 5})):
            print(endpoint, i, t)
endpoint = ''

customer 0 luke30@hull.com
customer 1 crystal87@yahoo.com
customer 2 hillvickie@hotmail.com
customer 3 victor20@hotmail.com
customer 4 lopezdennis@yahoo.com
transactions 0 1d7b2c3d-5b8f-4796-8099-0a90955e29d0
transactions 1 6134cae4-21d5-4f3f-834b-d91f8523a5bf
transactions 2 c4eb8971-20a5-444e-bf15-5f411002b4e3
transactions 3 03c6f4cc-123c-4b23-abb8-c3675c3bd465
transactions 4 d9421892-4a29-4b60-b9fa-203e0c8c5415


In [23]:
from dataclasses import dataclass
from typing import Generator, ClassVar

@dataclass
class Transactions:
    ts: int
    te: int
    endpoint: ClassVar[str] = 'transactions'
    
    @property
    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

        
@dataclass
class Customer:
    cid:      int
    endpoint: ClassVar[str] = 'customer'
        
    @property
    def params(self) -> dict:
        return {'cid': self.cid}
    
    def get(self):
        yield from request(self.endpoint, self.params)

@dataclass
class Customers:
    endpoint: ClassVar[str] = 'customers'
        
    @property
    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

In [24]:
from utils import ppd

print('customers:')
ppd(list(
    Customers().get()
))

print('transactions:')
ppd(list(
    Transactions(ts=0, te=5).get()
))

customers:
[
  [38;2;102;102;102m8716[39m,
  [38;2;102;102;102m9917[39m,
  [38;2;102;102;102m8960[39m,
  [38;2;102;102;102m7388[39m,
  [38;2;102;102;102m396[39m
]

transactions:
[
  [38;2;186;33;33m"a6b8c77a-feca-476d-ab32-254f3cf84e5c"[39m,
  [38;2;186;33;33m"9beffb33-5bbb-435b-9450-35cec4dfdb8f"[39m,
  [38;2;186;33;33m"df2c1782-3722-4421-b6dd-2bf3798c8525"[39m,
  [38;2;186;33;33m"8c44735c-8ccc-4b1f-a6c6-788eeb863442"[39m,
  [38;2;186;33;33m"e5b07343-8043-41d9-93f8-c7cb0b1d940a"[39m
]



In [53]:
from dataclasses import dataclass, field
from typing import Generator, ClassVar

from api import API

def request(endpoint, kwargs=None):
    # Avoid a mutable default argument; fall back to a fresh dict per call
    for r in API.get(endpoint, kwargs or {}):
        if r is None:
            break
        yield r

@dataclass
class Stream:
    endpoint: ClassVar[str]

    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params())

    
@dataclass
class Transactions(Stream):
    ts:       int
    te:       int
    endpoint: ClassVar[str] = 'transactions'

    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}

        
@dataclass
class Customer(Stream):
    cid:      int
    endpoint: ClassVar[str] = 'customer'
        
    def params(self) -> dict:
        return {'cid': self.cid}


@dataclass
class Customers(Stream):
    endpoint: ClassVar[str] = 'customers'

    def get(self) -> Generator:
        for result in request(self.endpoint, self.params()):
            yield from Customer(cid=result).get()


STREAMS = {
    'customers':    Customers,
    'transactions': Transactions
}

In [54]:
def run(config):
    for stream, conf in config.items():
        worker = STREAMS[stream](**conf)
        for result in worker.get():
            print(stream, worker, result)

run({
    'customers': {},
    'transactions': {'ts': 0, 'te': 5},
})

customers Customers() dmoore@owen.com
customers Customers() escobarmichael@spencer.com
customers Customers() lukelowery@yahoo.com
customers Customers() mhatfield@gmail.com
customers Customers() bmelton@vaughn-morgan.com
transactions Transactions(ts=0, te=5) 99ef65ac-d8f3-4e93-ba75-c77204609bc5
transactions Transactions(ts=0, te=5) 299fa825-4f95-4ba5-8a5c-e966ec53f81c
transactions Transactions(ts=0, te=5) a1951c2e-6446-4521-94da-d9ce59481175
transactions Transactions(ts=0, te=5) b28368d3-2f72-4725-af94-7c897c2b6816
transactions Transactions(ts=0, te=5) 271072f4-3869-4d6f-9a5f-fed49e01c1ab


In [None]:
import json
import logging

from requests.exceptions import HTTPError
from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
)
import singer.metrics as metrics

LOGGER = logging.getLogger(__name__)
# Placeholder limits; tune these for the API being called
MAX_ATTEMPTS = 5
WAIT_EXPONENTIAL_MAX = 60

@retry(
    retry=retry_if_exception_type(HTTPError),
    stop=(stop_after_attempt(MAX_ATTEMPTS)),
    wait=wait_random_exponential(max=WAIT_EXPONENTIAL_MAX),
    reraise=True
)
def gen_request(client, endpoint, params):
    # `client` is assumed to behave like a requests session
    with metrics.http_request_timer(endpoint) as timer:
        LOGGER.debug(f'Request for endpoint {endpoint}: {params}')
        resp = client.get(endpoint, params=params)

        timer.tags[metrics.Tag.http_status_code] = resp.status_code

        resp.raise_for_status()
        return json.loads(resp.content)

In [31]:
import requests

response = requests.get('http://canned/index/')

print(response.status_code, response.json())
!pip install tenacity

200 {'hello': 'world'}


In [32]:
import logging
from dataclasses import dataclass, field
from time import time
from typing import Callable
import functools
import json
import requests

from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
)

LOG = logging.getLogger(__name__)
logging.basicConfig()

DEFAULT_MAX_ATTEMPTS = 10
MAX_WAIT_SECONDS = 60
MAX_ATTEMPTS = 5

# requests raises its own ConnectionError, which is not the builtin one
DEFAULT_RETRY_EXCEPTIONS = (requests.exceptions.ConnectionError,)

def default_handler(d): print(d)
def blank_handler(d):   pass


@dataclass
class Timer:
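    # start_handler is called with no arguments; end_handler receives the timing dict
    start_handler: Callable[[], None] = lambda: None
    end_handler:   Callable[[dict], None] = default_handler
    start_time: float = field(default_factory=time)

    def duration(self, current_time=None):
        # A `time()` default would be evaluated once, when the method is defined
        if current_time is None:
            current_time = time()
        return current_time - self.start_time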

    def __enter__(self):
        self.start_handler()
        return self

    def __exit__(self, type, value, traceback):
        self.end_handler(self.as_dict(time()))

    def as_dict(self, stop_time):
        return {
            'start_time': self.start_time,
            'stop_time': stop_time,
            'duration': self.duration(stop_time),
        }

def start_handler(f,a,k):
    print('starting', {'func': (f, a, k)})

def end_handler(f, a, k, t, status):
    print('finished', {
        'func': (f, a, k),
        'status': status,
        'duration': '{:0.4f}'.format(t.as_dict(time())['duration']),
    })

def request_timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with Timer(functools.partial(start_handler, func.__name__, args, kwargs), blank_handler) as timer:
            status, result = func(*args, **kwargs)
            end_handler(func.__name__, args, kwargs, timer, status)
            return status, result
    return wrapper

@retry(
    retry=retry_if_exception_type(DEFAULT_RETRY_EXCEPTIONS),
    stop=(stop_after_attempt(MAX_ATTEMPTS)),
    wait=wait_random_exponential(max=MAX_WAIT_SECONDS),
    reraise=True
)
@request_timer
def request(endpoint, params):
    LOG.debug(f'Request for endpoint {endpoint}: {params}')
    resp = requests.get(endpoint, params=params)
    resp.raise_for_status()
    return resp.status_code, resp.json()

In [33]:
for _ in range(10):
    print(request('http://canned/index/', {}), '\n')

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0135'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0106'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0103'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0088'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0081'}
(200, {'hello': 'world'}) 

starting {'func': ('reque
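
The same "time a call with a decorator" idea can be sketched without the network or retry parts. This is a simplified variation and not the cell above: it swaps the `Timer` dataclass for `contextlib.contextmanager`, and `timed`, `timer` and `slow_add` are hypothetical names used only for illustration.

```python
import functools
from contextlib import contextmanager
from time import perf_counter, sleep


@contextmanager
def timed(label, report=print):
    # Measure how long the enclosed block takes
    start = perf_counter()
    try:
        yield
    finally:
        # Runs on success or failure, like an __exit__ method would
        report({'label': label, 'duration': perf_counter() - start})


def timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with timed(func.__name__):
            return func(*args, **kwargs)
    return wrapper


@timer
def slow_add(a, b):
    sleep(0.01)
    return a + b


print(slow_add(1, 2))
```

Passing a different `report` callable (for example `records.append`) collects the timings instead of printing them.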

## Some Tricks

In [9]:
class Thingo:
    def __init__(self, a):
        self.a = a
    
obj = Thingo(1)

print('\ntype(obj) :',              type(obj))
print('\nobj.__class__ :',          obj.__class__)
print('\nobj.__class__.__name__ :', obj.__class__.__name__)
print('\nobj.__dict__ :',           obj.__dict__)
print('\nobj.__dir__ :',            obj.__dir__())


type(obj) : <class '__main__.Thingo'>

obj.__class__ : <class '__main__.Thingo'>

obj.__class__.__name__ : Thingo

obj.__dict__ : {'a': 1}

obj.__dir__ : ['a', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']


In [10]:
print('type(obj.__class__.__name__) :', type(obj.__class__.__name__))
print('type(obj.__class__) :', type(obj.__class__))

print(obj.__class__('b').__dict__)

type(obj.__class__.__name__) : <class 'str'>
type(obj.__class__) : <class 'type'>
{'a': 'b'}


`__dict__` can be very useful, for example to pass parsed command-line arguments straight into a function as keyword arguments:

In [30]:
from argparse import ArgumentParser

def run(integration, region):
    pass

def parse_args() -> dict:
    parser = ArgumentParser(description='Run an integration')
    parser.add_argument('--integration', required=True, help='the integration to run', choices=['a', 'b'])
    parser.add_argument('--region',      required=True, help='the AWS region to run in', choices=['ap-southeast-2', 'us-west-2'])
    return parser.parse_args().__dict__

if False:
    run(**parse_args())
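
The built-in `vars()` returns the same `__dict__`, which some people find easier to read. A minimal sketch, using a cut-down parser and an explicit argument list (an assumption made so that it runs outside a real command line):

```python
from argparse import ArgumentParser

parser = ArgumentParser(description='Run an integration')
parser.add_argument('--integration', required=True, choices=['a', 'b'])

# Pass the arguments explicitly instead of reading sys.argv
namespace = parser.parse_args(['--integration', 'a'])

# vars() hands back the very same dict object as __dict__
print(vars(namespace))                        # {'integration': 'a'}
print(vars(namespace) is namespace.__dict__)  # True
```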

## Dunders

In [17]:
from dataclasses import dataclass, field

@dataclass
class Thing:
    a: int
    k: str
    
    def __eq__(self, o):
        return o.k == self.k

    def __lt__(self, o):
        return self.k < o.k
    

collection = [
    Thing(2, 'sally'),
    Thing(1, 'barry'),
]
print(collection)
print(sorted(collection))

[Thing(a=2, k='sally'), Thing(a=1, k='barry')]
[Thing(a=1, k='barry'), Thing(a=2, k='sally')]
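
For simple cases like this, dataclasses can also generate the comparison dunders themselves. The sketch below is an alternative to writing `__eq__` and `__lt__` by hand, not a replacement for the cell above; `OrderedThing` is a hypothetical name. `order=True` asks for generated ordering methods, and `field(compare=False)` keeps `a` out of every comparison so that only `k` counts.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class OrderedThing:
    a: int = field(compare=False)  # ignored by ==, <, and friends
    k: str


collection = [OrderedThing(2, 'sally'), OrderedThing(1, 'barry')]
print(sorted(collection))
# [OrderedThing(a=1, k='barry'), OrderedThing(a=2, k='sally')]

# Only k is compared, so these two count as equal
print(OrderedThing(1, 'sally') == OrderedThing(2, 'sally'))  # True
```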


In [25]:
from typing import List

@dataclass
class Things:
    things: List[Thing] = field(default_factory=list)

    def __getitem__(self, k):
        for t in self.things:
            if t.k == k:
                return t

    def __contains__(self, k):
        return self.__getitem__(k) is not None
    
    def __iter__(self):
        yield from self.things

In [27]:
collection = Things([
    Thing(2, 'sally'),
    Thing(1, 'barry'),
])
print(collection)
print('harry:', 'harry' in collection)
print('sally:', collection['sally'])

for thing in collection:
    print('--', thing)

Things(things=[Thing(a=2, k='sally'), Thing(a=1, k='barry')])
harry: False
sally: Thing(a=2, k='sally')
-- Thing(a=2, k='sally')
-- Thing(a=1, k='barry')
