# Lesson 2: Classing Up the Joint

> _Disclaimer: These points are intended for Python; your mileage may vary in other languages._

- Overutilising or underutilising classes can lead to ruin
- Classes can be a powerful tool or an endless garden path

## Pros

- Can break up a complicated problem into smaller parts
- Can keep track of state
  - No need to pass parameters back and forth
  - No thread-unsafe global variables
  - Can logically initialise state and then use it
- Can organise a hierarchy of states that belong together
- Provide dot-methods for accessing properties
  - "tell, don't ask"
  - SOLID design principles

## Cons

- Can make code convoluted and hard to read
  - code spread across multiple files
  - logic for a single operation spread across multiple different parts
  - buries actual logic
  
> _Good OOP in Python should make things **more** explicit, not **less** explicit_

## Functions vs Classes

### Functions vs Methods

In Python:

- a **function** takes parameters, returns a value
- a **method** can be called on an object, and can access state in the object
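To make the distinction concrete, here's a minimal sketch; `greet` and `Greeter` are made-up names for illustration. The function needs the greeting passed in on every call, while the method reads it from state stored on the object.

```python
# A function: everything it needs arrives through its parameters
def greet(name: str, greeting: str) -> str:
    return f'{greeting}, {name}!'


class Greeter:
    """Holds the greeting as state, so callers don't pass it every time."""

    def __init__(self, greeting: str):
        self.greeting = greeting

    # A method: called on an object, and reads state stored on that object
    def greet(self, name: str) -> str:
        return f'{self.greeting}, {name}!'


print(greet('Ada', 'Hello'))          # Hello, Ada!
print(Greeter('Hello').greet('Ada'))  # Hello, Ada!
```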


### Some examples of saturated methods

- [pandas read_csv](https://github.com/pandas-dev/pandas/blob/d01561fb7b9ad337611fa38a3cfb7e8c2faec608/pandas/io/parsers/readers.py#L708-L747)
- [wikipedia parser](https://github.com/Grasia/wiki-scripts/blob/a154b995fafe440014e28f1936367638a34c7942/wiki_dump_parser/wiki_dump_parser.py#L50-L79)

## Baby steps

Let's examine a common case: we want to use a config dict safely.

For example, what if there was a typo in one of the key names when it was first written?

In [1]:
CONFIG = {
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
}
CONFIG

{'thing': 'a', 'identifiier': 'b', 'name': 'c'}

_This isn't very safe if something goes wrong, and the IDE can't offer us any help_

Worse than that, we'll only find out at run time, when the code reaches this point:

In [2]:
CONFIG['identifier']

KeyError: 'identifier'

## Namedtuple

`namedtuple` == Quick 'n' dirty class!

Use when you just need to
- make sure that the correct keys/values are present
- access something a few times (safely) via a dot method rather than a dict key lookup

In [3]:
from collections import namedtuple

Config = namedtuple('config', [
    'thing',
    'identifier',
    'name'
])
CONFIG = Config('a', 'b', 'c')

print(CONFIG)
print(CONFIG.thing, CONFIG.identifier)

config(thing='a', identifier='b', name='c')
a b


Now, let's try the failing example again

In [5]:
Config(**{
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
})

TypeError: __new__() got an unexpected keyword argument 'identifiier'

### A multi-level config object

In [10]:
from collections import namedtuple

# Define what your config objects need to contain
Endpoints = namedtuple(
    'endpoints', [
        'customers',
        'products'
    ]
)

Endpoint = namedtuple(
    'endpoint', [
        'key',
        'url',
        'timeout',
        'n_workers'
    ]
)
# Initialise all configs with their values
# Could read this from a JSON file, command-line args, or define it here
# Either way, the namedtuple ensures the result always has the same structure
ENDPOINTS = Endpoints(**{
    'customers': Endpoint('customers', 'customers/search/', 200, 2),
    'products': Endpoint('products', 'products/all/search/', 100, 2)
})

print(ENDPOINTS.customers.url)

customers/search/


Much better!

This means that we catch the error when `Config` is _**initialised**_, rather than when trying to _**access**_ `'identifier'` later on.

This is also useful when loading a JSON config and you need to make sure all the keys are present:

In [11]:
import json
configs = [
    '{"identifier": 123, "name": "me"}',
    '{"identifier": 123, "name": "me", "thing": 123, "extra": 1}',
]
for raw_config in configs:
    try:
        Config(**json.loads(raw_config))
    except TypeError as e:
        print(f'caught: {e}')

caught: __new__() missing 1 required positional argument: 'thing'
caught: __new__() got an unexpected keyword argument 'extra'
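The `TypeError` messages above talk about `__new__()` arguments rather than config keys. If you want a friendlier message that lists every missing and unexpected key at once, one option is to check the keys against the namedtuple's `_fields` attribute before constructing it. A sketch, where `load_config` is a hypothetical helper:

```python
import json
from collections import namedtuple

Config = namedtuple('config', ['thing', 'identifier', 'name'])


def load_config(raw: str) -> Config:
    """Parse a JSON string into a Config, reporting all key problems at once."""
    data = json.loads(raw)
    expected = set(Config._fields)  # _fields is the tuple of field names
    missing = expected - set(data)
    extra = set(data) - expected
    if missing or extra:
        raise ValueError(
            f'missing keys: {sorted(missing)}, unexpected keys: {sorted(extra)}'
        )
    return Config(**data)


print(load_config('{"thing": 1, "identifier": 2, "name": "me"}'))
# config(thing=1, identifier=2, name='me')
```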


- However, this doesn't tell us much while we're writing code that uses this `Config`.
  - Our IDE knows about the _keys_, but not the _types_ of the values
- The hierarchy works, but is hard to read

## Dataclasses

- Python 3.7+
- Syntactic sugar for defining an `__init__` method and instance attributes
- Also generates `__repr__` and `__eq__` methods, and can optionally add ordering methods (`order=True`) and immutability (`frozen=True`)

[https://docs.python.org/3/library/dataclasses.html](https://docs.python.org/3/library/dataclasses.html)
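Besides `__init__` and `__repr__`, `@dataclass` generates an `__eq__` that compares fields, and `frozen=True` makes instances read-only. A small sketch with a made-up `Point` class:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


p = Point(1, 2)
print(p)                 # Point(x=1, y=2)  <- generated __repr__
print(p == Point(1, 2))  # True             <- generated __eq__ compares fields

try:
    p.x = 10             # frozen=True makes attribute assignment raise
except FrozenInstanceError as e:
    print('caught:', e)
```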

### A regular class example

In [15]:
class Obj:
    def __init__(self, a: int=1, b=2, c='default'):
        self.a = a
        self.b = b
        self.c = c

Obj(1)

<__main__.Obj at 0x7faf62a76d00>

### An equivalent dataclass example

In [16]:
from dataclasses import dataclass

@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'

Obj('a')

Obj(a='a', b=2, c='default')

In [17]:
Obj(d=5)

TypeError: __init__() got an unexpected keyword argument 'd'
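For comparison, the multi-level endpoint config from the namedtuple section could be sketched with dataclasses like this. The field names are copied from that example; the type annotations (e.g. `timeout: int`) are assumptions based on the values used there. Now the IDE knows the type of each value, not just its name:

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    key: str
    url: str
    timeout: int
    n_workers: int


@dataclass
class Endpoints:
    # Nesting is spelled out in the type annotations
    customers: Endpoint
    products: Endpoint


ENDPOINTS = Endpoints(
    customers=Endpoint('customers', 'customers/search/', 200, 2),
    products=Endpoint('products', 'products/all/search/', 100, 2),
)

print(ENDPOINTS.customers.url)  # customers/search/
```

Because the structure lives in the class definitions, the hierarchy is also easier to read than a chain of `namedtuple(...)` calls.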

### Mypy

In [20]:
from collections import namedtuple

Config = namedtuple('Config', ['a', 'b'])

def test_nt():
    Config(1,2).a + 'a'

from dataclasses import dataclass

@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'

def test_dc():
    Obj(1).a + 's'

In [21]:
!mypy "/code/intermediate/lesson02_classes/my.py"

my.py:14: error: Unsupported operand types for + ("int" and "str")
Found 1 error in 1 file (checked 1 source file)

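Only one error is reported: mypy treats the fields of a `collections.namedtuple` as `Any`, so `Config(1,2).a + 'a'` slips through. `typing.NamedTuple` lets you annotate the fields, so mypy can check them the same way it checks the dataclass. A sketch; the `-> None` annotation is there because, by default, mypy does not type-check the bodies of unannotated functions, and `add_str_to_a` is a made-up name:

```python
from typing import NamedTuple


class Config(NamedTuple):
    # Annotated fields: mypy knows both are ints
    a: int
    b: int = 2  # defaults work much like they do in a dataclass


def add_str_to_a() -> None:
    # mypy should report: Unsupported operand types for + ("int" and "str")
    # (actually calling this function would raise TypeError)
    Config(1, 2).a + 'a'
```

At runtime it still behaves like a regular tuple, so `Config(1)` is `Config(a=1, b=2)` and unpacking works as before.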

---

## An Example

- You have a collection of items, in this case ids and emails
- Need to iterate through them, collect some values, and pass them on

In [23]:
from api import API

for i, el in enumerate(API.get('customers')):
    print(i, el)
    if i >= 6:
        break

0 4905
1 7821
2 755
3 8150
4 6394
5 None
6 None


For this exercise, we need to consume a list of endpoints by key and send each endpoint's results to its own file (here we just print them):

In [25]:
for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i in API.get(endpoint+'s'):
            if i is None:
                break
            print(endpoint, '--', next(API.get(endpoint, {'cid': i})))
    elif endpoint == 'transactions':
        for i in API.get(endpoint, {'ts': 0, 'te': 5}):
            if i is None:
                break
            print(endpoint, '--', i)

customer -- romerocody@lam-gonzalez.com
customer -- robertbarnes@michael.com
customer -- bradleyerica@gmail.com
customer -- molly66@yahoo.com
customer -- karen36@hotmail.com
transactions -- 67334fe4-87d6-4b23-a888-4469c58fa903
transactions -- 24cc16b8-930b-4bf5-abd1-cd4d80aa2b25
transactions -- 89841889-99b1-483d-b2b6-2dc17f950c75
transactions -- 8f13bb7b-c0fb-46ee-bdae-c03e654ed982
transactions -- 8b785c66-f46c-4a46-a024-b02747b72e57


### A Guest Speaker!

In [26]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('8bZh5LMaSmE', start=330)

[https://github.com/emilybache/GildedRose-Refactoring-Kata](https://github.com/emilybache/GildedRose-Refactoring-Kata)

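[https://github.com/tomquirk/realestate-com-au-api/blob/8368da02a67aaf1c2fe9634f19181fb54685718d/realestate_com_au/realestate_com_au.py#L70-L118](https://github.com/tomquirk/realestate-com-au-api/blob/8368da02a67aaf1c2fe9634f19181fb54685718d/realestate_com_au/realestate_com_au.py#L70-L118)

[https://softwareengineering.stackexchange.com/questions/351389/dynamic-dispatch-from-a-string-python](https://softwareengineering.stackexchange.com/questions/351389/dynamic-dispatch-from-a-string-python)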

### What's the easiest thing to extract first?

Probably the `request` logic.

In [29]:
from api import API

def request(endpoint, kwargs=None):
    # Avoid a mutable default argument; treat "no params" as an empty dict
    for r in API.get(endpoint, kwargs or {}):
        # The API signals the end of its results with None
        if r is None:
            break
        yield r

for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i, c in enumerate(request(endpoint+'s')):
            print(endpoint, i, next(request(endpoint, {'cid': c})))
    elif endpoint == 'transactions':
        for i, t in enumerate(request(endpoint, {'ts': 0, 'te': 5})):
            print(endpoint, i, t)


customer 0 jessicamarsh@williams.com
customer 1 heather90@yahoo.com
customer 2 gburns@gmail.com
customer 3 rodney72@yahoo.com
customer 4 susan23@hunter-rogers.com
transactions 0 736db90c-ac06-4a4a-976a-3450772dbc42
transactions 1 e726bfcf-9936-4944-a865-8134b9c3a55c
transactions 2 d3ddc234-ca21-44b8-a228-1eee44618434
transactions 3 59ddd7f8-4d50-4830-87d2-d70bf0cb1963
transactions 4 e4088f98-5c81-4485-b340-85fe8e55246c
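The loop-until-`None` inside `request` is a common pattern, and `itertools.takewhile` expresses it directly. A sketch with a hypothetical `until_none` helper and a hard-coded list standing in for `API.get(...)`:

```python
from itertools import takewhile
from typing import Any, Iterable, Iterator


def until_none(results: Iterable[Any]) -> Iterator[Any]:
    """Yield results up to (but not including) the first None."""
    return takewhile(lambda r: r is not None, results)


# A stand-in for API.get(...): a few values followed by None padding
fake_results = [4905, 7821, 755, None, None]
print(list(until_none(fake_results)))  # [4905, 7821, 755]
```

With this, the body of `request` could shrink to a call like `until_none(API.get(...))`; whether that reads better than the explicit loop is a matter of taste.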


In [30]:
from dataclasses import dataclass
from typing import Generator, ClassVar

@dataclass
class Transactions:
    ts: int
    te: int
    endpoint: ClassVar[str] = 'transactions'
    
    @property
    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

        
@dataclass
class Customer:
    cid:      int
    endpoint: ClassVar[str] = 'customer'

    @property
    def params(self) -> dict:
        # The customer endpoint expects the id under 'cid' (as in the loops above)
        return {'cid': self.cid}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

@dataclass
class Customers:
    endpoint: ClassVar[str] = 'customers'
        
    @property
    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

In [32]:
from utils import ppd

print('customers:')
ppd(list(
    Customers().get()
))

print('transactions:')
ppd(list(
    Transactions(ts=0, te=5).get()
))

customers:
[
  5061,
  2368,
  8152,
  5286,
  9736
]

transactions:
[
  "e538f653-a099-493a-9ffa-d7df4f7d7856",
  "4920ba15-4338-4bf2-83b2-4e2d49265a56",
  "604240e3-8df1-413d-9b43-714d9ff3964f",
  "8f9c8413-52ad-42dd-9e12-6ece8358a656",
  "99e33fc2-dbfd-45cc-932d-815d094e3dad"
]



In [34]:
from dataclasses import dataclass, field
from typing import Generator, ClassVar

from api import API

def request(endpoint, kwargs=None):
    # Avoid a mutable default argument; treat "no params" as an empty dict
    for r in API.get(endpoint, kwargs or {}):
        if r is None:
            break
        yield r

@dataclass
class Stream:
    endpoint: ClassVar[str]

    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params())

    
@dataclass
class Transactions(Stream):
    # No defaults needed: Stream declares no instance fields, only a ClassVar
    ts:       int
    te:       int
    endpoint: ClassVar[str] = 'transactions'

    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}


@dataclass
class Customer(Stream):
    cid:      int
    endpoint: ClassVar[str] = 'customer'

    def params(self) -> dict:
        return {'cid': self.cid}


@dataclass
class Customers(Stream):
    endpoint: ClassVar[str] = 'customers'

    def get(self) -> Generator:
        for result in request(self.endpoint, self.params()):
            yield from Customer(cid=result).get()


STREAMS = {
    'customers':    Customers,
    'transactions': Transactions
}
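The `STREAMS` dict has to be kept in sync with the classes by hand. One alternative is to let each subclass register itself via `__init_subclass__`, which Python calls whenever a subclass is defined. A standalone sketch: the `params`/`get` methods are omitted for brevity, and this `STREAMS` would replace the hand-written one:

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, Type

# Filled in automatically as Stream subclasses are defined
STREAMS: Dict[str, Type['Stream']] = {}


@dataclass
class Stream:
    endpoint: ClassVar[str]

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register each subclass under its endpoint name
        STREAMS[cls.endpoint] = cls


@dataclass
class Transactions(Stream):
    ts: int
    te: int
    endpoint: ClassVar[str] = 'transactions'


@dataclass
class Customers(Stream):
    endpoint: ClassVar[str] = 'customers'


print(sorted(STREAMS))  # ['customers', 'transactions']
```

One caveat: every subclass must set `endpoint`, otherwise defining it raises `AttributeError`. It's also a little magic, so weigh it against the "more explicit, not less" rule from the start of the lesson.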

In [37]:
def run(config):
    for stream, conf in config.items():
        worker = STREAMS[stream](**conf)
        for result in worker.get():
            print(stream, worker, result)

run({
    'customers': {},
    'transactions': {'ts': 0, 'te': 5},
})
!pip install tenacity

customers Customers() welchmichael@yahoo.com
customers Customers() qcook@hotmail.com
customers Customers() fharvey@yahoo.com
customers Customers() priscillatorres@gmail.com
customers Customers() ubrennan@gmail.com
transactions Transactions(ts=0, te=5) 9851cf23-dc4a-4e32-9a02-440005d9107a
transactions Transactions(ts=0, te=5) 26a20d34-0558-4cac-ab39-43918b8a20f0
transactions Transactions(ts=0, te=5) 62b70806-6a40-4215-b9eb-51454a24b96b
transactions Transactions(ts=0, te=5) a969e366-294a-4a6e-9904-70942d931442
transactions Transactions(ts=0, te=5) 2ad04186-9d26-4e64-9309-83a87c68cca1


In [39]:
import json
import logging

from requests.exceptions import HTTPError
from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
)
# Requires the singer-python package (pip install singer-python)
import singer.metrics as metrics

LOGGER = logging.getLogger(__name__)
# Illustrative values; pick limits that suit the API you're calling
MAX_ATTEMPTS = 5
WAIT_EXPONENTIAL_MAX = 60

@retry(
    retry=retry_if_exception_type(HTTPError),
    stop=(stop_after_attempt(MAX_ATTEMPTS)),
    wait=wait_random_exponential(max=WAIT_EXPONENTIAL_MAX),
    reraise=True
)
def gen_request(client, endpoint, params):
    with metrics.http_request_timer(endpoint) as timer:
        LOGGER.debug(f'Request for endpoint {endpoint}: {params}')
        resp = client.get(endpoint, params=params)

        timer.tags[metrics.Tag.http_status_code] = resp.status_code

        resp.raise_for_status()
        return json.loads(resp.content)

ModuleNotFoundError: No module named 'singer'

In [None]:
import requests

response = requests.get('http://canned/index/')

print(response.status_code, response.json())

In [40]:
import logging
from dataclasses import dataclass, field
from time import time
from typing import Callable
import functools
import json
import requests

from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
)

LOG = logging.getLogger(__name__)
logging.basicConfig()

DEFAULT_MAX_ATTEMPTS = 10
MAX_WAIT_SECONDS = 60
MAX_ATTEMPTS = 5

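# requests raises its own ConnectionError, which is *not* a subclass of the
# built-in ConnectionError, so retry on the requests one
DEFAULT_RETRY_EXCEPTIONS = (requests.exceptions.ConnectionError,)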

def default_handler(d): print(d)
def blank_handler(d):   pass


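@dataclass
class Timer:
    # __enter__ calls start_handler with no arguments, so it takes none
    start_handler: Callable[[], None] = lambda: None
    end_handler:   Callable[[dict], None] = default_handler
    start_time: float = field(default_factory=time)

    def duration(self, current_time=None):
        # A default of time() would be evaluated only once, at definition time,
        # so look up the current time on each call instead
        if current_time is None:
            current_time = time()
        return current_time - self.start_time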

    def __enter__(self):
        self.start_handler()
        return self

    def __exit__(self, type, value, traceback):
        self.end_handler(self.as_dict(time()))

    def as_dict(self, stop_time):
        return {
            'start_time': self.start_time,
            'stop_time': stop_time,
            'duration': self.duration(stop_time),
        }

def start_handler(f,a,k):
    print('starting', {'func': (f, a, k)})

def end_handler(f, a, k, t, status):
    print('finished', {
        'func': (f, a, k),
        'status': status,
        'duration': '{:0.4f}'.format(t.as_dict(time())['duration']),
    })

def request_timer(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with Timer(functools.partial(start_handler, func.__name__, args, kwargs), blank_handler) as timer:
            status, result = func(*args, **kwargs)
            end_handler(func.__name__, args, kwargs, timer, status)
            return status, result
    return wrapper

@retry(
    retry=retry_if_exception_type(DEFAULT_RETRY_EXCEPTIONS),
    stop=(stop_after_attempt(MAX_ATTEMPTS)),
    wait=wait_random_exponential(max=MAX_WAIT_SECONDS),
    reraise=True
)
@request_timer
def request(endpoint, params):
    LOG.debug(f'Request for endpoint {endpoint}: {params}')
    resp = requests.get(endpoint, params=params)
    resp.raise_for_status()
    return resp.status_code, resp.json()

In [41]:
for _ in range(10):
    print(request('http://canned/index/', {}), '\n')

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0608'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0126'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0085'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0130'}
(200, {'hello': 'world'}) 

starting {'func': ('request', ('http://canned/index/', {}), {})}
finished {'func': ('request', ('http://canned/index/', {}), {}), 'status': 200, 'duration': '0.0121'}
(200, {'hello': 'world'}) 

starting {'func': ('reque
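The `Timer` dataclass above works, but for simple cases `contextlib.contextmanager` can do the same job in a few lines. A sketch, where `timed` is a hypothetical name; it uses `time.perf_counter`, which the standard library documents as the clock for measuring short durations:

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def timed(on_finish: Callable[[Dict[str, float]], None] = print) -> Iterator[None]:
    """Time the body of a `with` block and pass the result to `on_finish`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs whether the body finished normally or raised
        on_finish({'duration': time.perf_counter() - start})


results = []
with timed(results.append):
    sum(range(1000))
print(results[0]['duration'] >= 0)  # True
```

Because the handler call sits in a `finally` block, the duration is reported even if the timed code raises.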

## Some Tricks

In [43]:
class Thingo:
    def __init__(self, a):
        self.a = a
    
obj = Thingo(1)

print('\ntype(obj) :',              type(obj))
print('\nobj.__class__ :',          obj.__class__)
print('\nobj.__class__.__name__ :', obj.__class__.__name__)
print('\nobj.__dict__ :',           obj.__dict__)
print('\nobj.__dir__ :',            obj.__dir__())


type(obj) : <class '__main__.Thingo'>

obj.__class__ : <class '__main__.Thingo'>

obj.__class__.__name__ : Thingo

obj.__dict__ : {'a': 1}

obj.__dir__ : ['a', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']


In [45]:
print('type(obj.__class__.__name__) :', type(obj.__class__.__name__))
print('type(obj.__class__) :', type(obj.__class__))

print(obj.__class__('b').__dict__)

type(obj.__class__.__name__) : <class 'str'>
type(obj.__class__) : <class 'type'>
{'a': 'b'}
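Since `type(obj.__class__)` is `type`, classes are themselves objects, and calling `type` with three arguments (name, bases, namespace) builds a new class at runtime. A small sketch; `Thingo2` is a made-up name:

```python
# type(name, bases, namespace) creates a class, much like a `class` statement
def init(self, a):
    self.a = a


Thingo2 = type('Thingo2', (object,), {'__init__': init})

obj2 = Thingo2('b')
print(type(obj2).__name__, obj2.__dict__)  # Thingo2 {'a': 'b'}
```

This is roughly what a `class` statement does under the hood. It's handy to know, but rarely the clearest option in everyday code.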


`__dict__` can be very useful, for example for turning parsed command-line arguments into keyword arguments:

In [47]:
from argparse import ArgumentParser

def run(integration, region):
    pass

def parse_args() -> dict:
    parser = ArgumentParser(description='Run an integration')
    parser.add_argument('--integration', required=True, help='the integration to run', choices=['a', 'b'])
    parser.add_argument('--region',      required=True, help='the AWS region to run in', choices=['ap-southeast-2', 'us-west-2'])
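    # parse_args() returns a Namespace; its __dict__ maps each option to its value
    return parser.parse_args().__dict__

# In a script, this would be guarded by `if __name__ == '__main__':` instead
if False:
    run(**parse_args())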
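The built-in `vars()` returns an object's `__dict__`, so `vars(parser.parse_args())` is an equivalent (and arguably more idiomatic) spelling. `parse_args` also accepts an explicit list of strings, which makes it easy to try out in a notebook. A sketch reusing the arguments above:

```python
from argparse import ArgumentParser

parser = ArgumentParser(description='Run an integration')
parser.add_argument('--integration', required=True, choices=['a', 'b'])
parser.add_argument('--region', required=True, choices=['ap-southeast-2', 'us-west-2'])

# An explicit argument list instead of reading sys.argv
args = parser.parse_args(['--integration', 'a', '--region', 'us-west-2'])

print(vars(args))  # {'integration': 'a', 'region': 'us-west-2'}
assert vars(args) == args.__dict__
```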

## Dunders

In [56]:
from dataclasses import dataclass, field

@dataclass
class Thing:
    a: int
    k: str

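    def __eq__(self, o):
        # Only compare against other Things; let Python handle anything else
        if not isinstance(o, Thing):
            return NotImplemented
        return o.k == self.k

    def __lt__(self, o):
        if not isinstance(o, Thing):
            return NotImplemented
        return self.k < o.k


collection = [
    Thing(2, 'sally'),
    Thing(300, 'barry'),
]
print(collection)
# __lt__ lets sorted() order Things directly, without a key function
print(sorted(collection))

[Thing(a=2, k='sally'), Thing(a=300, k='barry')]
[Thing(a=300, k='barry'), Thing(a=2, k='sally')]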
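Rather than hand-writing `__eq__` and `__lt__`, you can ask `@dataclass` to generate them with `order=True`, and use `field(compare=False)` to leave `a` out of the comparisons. A sketch of that alternative:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Thing:
    # compare=False excludes `a` from the generated __eq__/__lt__/etc.,
    # so equality and ordering depend on `k` only
    a: int = field(compare=False)
    k: str


collection = [Thing(2, 'sally'), Thing(300, 'barry')]
print(sorted(collection))                      # [Thing(a=300, k='barry'), Thing(a=2, k='sally')]
print(Thing(1, 'sally') == Thing(2, 'sally'))  # True
```

One difference from the hand-written version: the generated methods only compare against instances of the same class and return `NotImplemented` otherwise.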


In [51]:
from typing import List

@dataclass
class Things:
    things: List[Thing] = field(default_factory=list)

    def __getitem__(self, k):
        for t in self.things:
            if t.k == k:
                return t
        # Behave like a mapping: a missing key is an error, not a silent None
        raise KeyError(k)

    def __contains__(self, k):
        return any(t.k == k for t in self.things)
    
    def __iter__(self):
        yield from self.things

In [53]:
collection = Things([
    Thing(2, 'sally'),
    Thing(1, 'barry'),
])
print(collection)
print('harry:', 'harry' in collection)
print('sally:', collection['sally'])

for thing in collection:
    print('--', thing)

Things(things=[Thing(a=2, k='sally'), Thing(a=1, k='barry')])
harry: False
sally: Thing(a=2, k='sally')
-- Thing(a=2, k='sally')
-- Thing(a=1, k='barry')


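Don't do things like this! Populating `__dict__` from arbitrary kwargs hides which attributes an object has, so neither a reader nor your IDE can tell what exists, and a typo in a key silently creates a new attribute instead of raising an error.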

In [54]:
# Assigning instance vars is too tedious!
# Let's magically populate all the dot methods using the __dict__
class Thingo:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


t = Thingo(**{'a': 'b', 'c': 'd'})
print(t.a, t.c)

t = Thingo(**{'k': 'v', 'ok': 'ko'})
print(t.k, t.ok)

b d
v ko
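A more explicit alternative is to declare the attributes as dataclass fields and check incoming dict keys against `dataclasses.fields()`. A sketch, where `from_dict` is a hypothetical helper:

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class Thingo:
    a: str
    c: str

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'Thingo':
        """Build from a dict, rejecting keys that aren't declared fields."""
        allowed = {f.name for f in fields(cls)}
        unexpected = set(data) - allowed
        if unexpected:
            raise ValueError(f'unexpected keys: {sorted(unexpected)}')
        return cls(**data)


t = Thingo.from_dict({'a': 'b', 'c': 'd'})
print(t.a, t.c)  # b d
```

Missing keys still raise `TypeError` from the generated `__init__`, and every attribute is visible to the IDE and to mypy.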
