# Lesson 2: Classes in the Wild

> _Disclaimer: These points were written with Python in mind; your mileage may vary in other languages._

- Overutilising or underutilising classes can lead to ruin
- Classes can be a powerful tool or an endless garden path

## Pros

- Can break up a complicated problem into smaller parts
- Can keep track of state
  - No need to pass parameters back and forth
  - No thread-unsafe global variables
  - Can logically initialise state and then use it
- Can organise a hierarchy of states that belong together
- Provide dot-methods for accessing properties
  - "ask, don't tell"
  
## Cons

- Can make code convoluted and hard to read
  - Code spread across multiple files
  - Logic for a single operation spread across multiple different parts

## Functions vs Classes

### Functions vs Methods

In Python:

- a **function** takes parameters and returns a value
- a **method** is called on an object and can access the state stored in that object
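
To make the distinction concrete, here is a minimal sketch. The names `area` and `Rectangle` are hypothetical and chosen only for illustration.

```python
def area(width, height):
    """A function: everything it needs arrives as parameters."""
    return width * height


class Rectangle:
    def __init__(self, width, height):
        # State is stored once on the object
        self.width = width
        self.height = height

    def area(self):
        """A method: reads state from the object it is called on."""
        return self.width * self.height


rect = Rectangle(3, 4)
print(area(3, 4))   # function call: state is passed in every time
print(rect.area())  # method call: state lives on the object
```

Both calls compute the same value; the difference is where the state lives.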


### Some examples of saturated methods

- https://github.com/pandas-dev/pandas/blob/d01561fb7b9ad337611fa38a3cfb7e8c2faec608/pandas/io/parsers/readers.py#L708-L747
- https://github.com/Grasia/wiki-scripts/blob/a154b995fafe440014e28f1936367638a34c7942/wiki_dump_parser/wiki_dump_parser.py#L50-L79

## Baby steps

In [1]:
CONFIG = {
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
}
CONFIG

{'thing': 'a', 'identifiier': 'b', 'name': 'c'}

_This isn't very safe if something goes wrong, and the IDE can't offer us any help_

In [4]:
CONFIG['identifier']

KeyError: 'identifier'

## Namedtuple

`namedtuple` == Quick 'n' dirty class!

Use it when you just need to:

- make sure that the correct keys/values are present
- access something a few times (safely) via dot access rather than a dict key lookup

In [2]:
from collections import namedtuple

Config = namedtuple('config', ['thing', 'identifier', 'name'])

CONFIG = Config('a', 'b', 'c')

print(CONFIG)
print(CONFIG.thing, CONFIG.identifier)

config(thing='a', identifier='b', name='c')
a b


### A multi-level config object

In [4]:
from collections import namedtuple

# Define what your config objects need to contain
Endpoint = namedtuple('endpoint', ['key', 'url', 'timeout', 'n_workers'])
Endpoints = namedtuple('endpoints', ['customers', 'products'])

# Initialise all configs with their values
# Could read this from a JSON file, command-line args, or define it here
# Either way, the namedtuple will ensure that the result is the same
ENDPOINTS = Endpoints(
    customers=Endpoint('customers', 'customers/search/', 200, 2),
    products=Endpoint('products', 'products/all/search/', 100, 2),
)

# _asdict() is shallow, so each value is still an Endpoint namedtuple
for endpoint, config in ENDPOINTS._asdict().items():
    print(endpoint, config.url)

customers customers/search/
products products/all/search/


Now let's try the failing example again

In [5]:
Config(**{
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
})

TypeError: <lambda>() got an unexpected keyword argument 'identifiier'

Much better!

This means that we catch the error when Config is _**initialised**_, rather than when trying to _**access**_ 'identifier' later on.

This is also useful when loading a JSON config and you need to make sure all the keys are present, with no unexpected extras.

In [6]:
import json
raw = '{"identifier": 123, "name": "me", "thing": 123}'

Config(**json.loads(raw))

config(thing=123, identifier=123, name='me')

In [7]:
raw = '{"identifier": 123, "name": "me", "thing": 123, "extra": 1}'

Config(**json.loads(raw))

TypeError: <lambda>() got an unexpected keyword argument 'extra'
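
The cells above show an unexpected key being rejected. A missing key is rejected too, which the following sketch demonstrates. `load_config` is a hypothetical helper (not part of any library) that wraps the check and turns the `TypeError` into a `ValueError` with some context.

```python
import json
from collections import namedtuple

Config = namedtuple('config', ['thing', 'identifier', 'name'])


def load_config(raw):
    """Parse a JSON string into a Config, or raise ValueError with the reason.

    Hypothetical helper, for illustration only.
    """
    try:
        return Config(**json.loads(raw))
    except TypeError as err:
        # namedtuple raises TypeError for both missing and unexpected keys
        raise ValueError(f'invalid config: {err}') from err


print(load_config('{"identifier": 123, "name": "me", "thing": 123}'))

try:
    load_config('{"identifier": 123, "name": "me"}')  # 'thing' is missing
except ValueError as err:
    print(err)
```

The exact wording of the error message depends on the Python version, so it is best not to match on it.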

## Dataclasses

- Python 3.7+
- Syntactic sugar for defining an `__init__` method and instance variables
- also provides a nice `__repr__` method, and some other things

A regular class:

In [8]:
class Obj:
    def __init__(self, a=1, b=2, c='default'):
        self.a = a
        self.b = b
        self.c = c

Obj(1)

<__main__.Obj at 0x1030faee0>

The same, but as a dataclass:

In [9]:
from dataclasses import dataclass

@dataclass
class Obj:
    a: int = 1
    b: int = 2
    c: str = 'default'

Obj(1)

Obj(a=1, b=2, c='default')

In [10]:
Obj(d=5)

TypeError: __init__() got an unexpected keyword argument 'd'
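
The "other things" a dataclass provides include a generated `__eq__`, optional immutability via `frozen=True`, and conversion helpers such as `asdict`. The sketch below shows these on a hypothetical `Point` class, which is not part of the lesson code.

```python
from dataclasses import dataclass, asdict, field


@dataclass(frozen=True)
class Point:  # hypothetical example class
    x: int = 0
    y: int = 0
    # Mutable defaults must go through default_factory
    tags: list = field(default_factory=list)


p1 = Point(1, 2)
p2 = Point(1, 2)

print(p1 == p2)    # generated __eq__ compares field by field
print(asdict(p1))  # convert to a plain dict

try:
    p1.x = 10      # frozen=True makes instances read-only
except AttributeError as err:
    print('cannot assign:', err)
```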

---

## An Example

- You have a collection of items, in this case ids and emails
- Need to iterate through them, collect some values, and pass them on

In [8]:
from api import API

In [18]:
for i, el in enumerate(API.get('customers')):
    print(i, el)
    if i >= 6:
        break

0 4844
1 1936
2 5354
3 1036
4 6584
5 None
6 None


For this exercise, we must consume a list of endpoints by key and send each endpoint's results to its own file (here we simply print them).

In [32]:
for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i in API.get(endpoint+'s'):
            if i is None:
                break
            print(endpoint, '--', next(API.get(endpoint, {'cid': i})))
    elif endpoint == 'transactions':
        for i in API.get(endpoint, {'ts': 0, 'te': 5}):
            if i is None:
                break
            print(endpoint, '--', i)

customer -- arobinson@ward.com
customer -- fernandezbrittany@haney.net
customer -- esanchez@gmail.com
customer -- vwatson@gmail.com
customer -- yhall@ferguson-bell.com
transactions -- 32c3751e-8088-4e33-9ab4-b0d4f0afcf9b
transactions -- 09a1ad38-b86e-4790-84de-50547deabe35
transactions -- a30d01f6-220a-4963-b16d-9795c65ab654
transactions -- 9be8a546-7f3f-45ad-97f3-6222ef55c522
transactions -- e1e6052b-e481-4a88-a786-59cd439b17b1
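
The loop above only prints its results. Writing each endpoint's results to its own file could look roughly like the sketch below. It uses made-up stand-in data instead of the lesson's local `api` module, and `write_endpoint` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Stand-in data: the lesson itself reads these from the local `api` module
FAKE_RESULTS = {
    'customer': ['a@example.com', 'b@example.com'],
    'transactions': ['t-001', 't-002', 't-003'],
}


def write_endpoint(out_dir, endpoint, results):
    """Write one result per line to <out_dir>/<endpoint>.txt and return the path."""
    path = Path(out_dir) / f'{endpoint}.txt'
    with path.open('w') as fh:
        for result in results:
            fh.write(f'{result}\n')
    return path


out_dir = tempfile.mkdtemp()
for endpoint, results in FAKE_RESULTS.items():
    path = write_endpoint(out_dir, endpoint, results)
    print(endpoint, '->', path.name)
```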


### A Guest Speaker!

In [4]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('8bZh5LMaSmE?t=330')

[https://github.com/emilybache/GildedRose-Refactoring-Kata](https://github.com/emilybache/GildedRose-Refactoring-Kata)

https://github.com/tomquirk/realestate-com-au-api/blob/8368da02a67aaf1c2fe9634f19181fb54685718d/realestate_com_au/realestate_com_au.py#L70-L118

https://softwareengineering.stackexchange.com/questions/351389/dynamic-dispatch-from-a-string-python

### What's the easiest thing to extract first?

In [9]:
from api import API

def request(endpoint, kwargs=None):
    # Yield results until the API signals the end of the stream with None.
    # kwargs defaults to None to avoid sharing one mutable dict between calls.
    for r in API.get(endpoint, kwargs or {}):
        if r is None:
            break
        yield r


for endpoint in ['customer', 'transactions']:
    if endpoint == 'customer':
        for i, c in enumerate(request(endpoint+'s')):
            print(endpoint, i, next(request(endpoint, {'cid': c})))
    elif endpoint == 'transactions':
        for i, t in enumerate(request(endpoint, {'ts': 0, 'te': 5})):
            print(endpoint, i, t)


In [7]:
from dataclasses import dataclass
from typing import Generator, ClassVar

# Relies on request() from the previous cell

@dataclass
class Transactions:
    ts: int
    te: int
    endpoint: ClassVar[str] = 'transactions'

    @property
    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)


@dataclass
class Customer:
    cid: int
    endpoint: ClassVar[str] = 'customer'

    @property
    def params(self) -> dict:
        # The API expects the key 'cid', as in the loop version above
        return {'cid': self.cid}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)


@dataclass
class Customers:
    endpoint: ClassVar[str] = 'customers'

    @property
    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params)

In [8]:
from utils import ppd

print('customers:')
ppd(list(
    Customers().get()
))

print('transactions:')
ppd(list(
    Transactions(ts=0, te=5).get()
))


In [53]:
from dataclasses import dataclass
from typing import Generator, ClassVar

from api import API

def request(endpoint, kwargs=None):
    # Yield results until the API signals the end of the stream with None
    for r in API.get(endpoint, kwargs or {}):
        if r is None:
            break
        yield r

@dataclass
class Stream:
    # Base class: subclasses set `endpoint` and override params() as needed
    endpoint: ClassVar[str]

    def params(self) -> dict:
        return {}

    def get(self) -> Generator:
        yield from request(self.endpoint, self.params())


@dataclass
class Transactions(Stream):
    # Required fields: a missing value fails at construction time
    ts:       int
    te:       int
    endpoint: ClassVar[str] = 'transactions'

    def params(self) -> dict:
        return {'ts': self.ts, 'te': self.te}


@dataclass
class Customer(Stream):
    cid:      int
    endpoint: ClassVar[str] = 'customer'

    def params(self) -> dict:
        return {'cid': self.cid}


@dataclass
class Customers(Stream):
    endpoint: ClassVar[str] = 'customers'

    def get(self) -> Generator:
        # Each result is a customer id; look up the matching customer record
        for result in request(self.endpoint, self.params()):
            yield from Customer(cid=result).get()


# Dispatch table: maps a config key to the class that handles it
STREAMS = {
    'customers':    Customers,
    'transactions': Transactions
}

In [54]:
def run(config):
    for stream, conf in config.items():
        worker = STREAMS[stream](**conf)
        for result in worker.get():
            print(stream, worker, result)

run({
    'customers': {},
    'transactions': {'ts': 0, 'te': 5},
})

customers Customers() dmoore@owen.com
customers Customers() escobarmichael@spencer.com
customers Customers() lukelowery@yahoo.com
customers Customers() mhatfield@gmail.com
customers Customers() bmelton@vaughn-morgan.com
transactions Transactions(ts=0, te=5) 99ef65ac-d8f3-4e93-ba75-c77204609bc5
transactions Transactions(ts=0, te=5) 299fa825-4f95-4ba5-8a5c-e966ec53f81c
transactions Transactions(ts=0, te=5) a1951c2e-6446-4521-94da-d9ce59481175
transactions Transactions(ts=0, te=5) b28368d3-2f72-4725-af94-7c897c2b6816
transactions Transactions(ts=0, te=5) 271072f4-3869-4d6f-9a5f-fed49e01c1ab
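
The `STREAMS[stream](**conf)` lookup in `run` is where a bad config surfaces: an unknown stream name raises `KeyError`, and wrong parameters raise `TypeError` from the dataclass `__init__`. The sketch below isolates that step with minimal stand-in classes (no API calls) and a hypothetical `build_worker` helper that gives a clearer message for unknown names.

```python
from dataclasses import dataclass


# Minimal stand-ins for the stream classes above (no API calls)
@dataclass
class Customers:
    pass


@dataclass
class Transactions:
    ts: int
    te: int


STREAMS = {'customers': Customers, 'transactions': Transactions}


def build_worker(name, conf):
    """Look up a stream class by name and construct it from a config dict.

    Hypothetical helper, for illustration only.
    """
    try:
        stream_cls = STREAMS[name]
    except KeyError:
        known = ', '.join(sorted(STREAMS))
        raise ValueError(f'unknown stream {name!r}; expected one of: {known}') from None
    # The dataclass __init__ rejects missing or misspelled parameters here
    return stream_cls(**conf)


print(build_worker('transactions', {'ts': 0, 'te': 5}))

for name, conf in [('orders', {}), ('transactions', {'ts': 0})]:
    try:
        build_worker(name, conf)
    except (ValueError, TypeError) as err:
        print(type(err).__name__, '-', err)
```

Failing at construction time, before any request is made, follows the same idea as the `namedtuple` config earlier in the lesson.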


In [None]:
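import json
import logging

from requests.exceptions import HTTPError
from tenacity import (
    retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential
)
import singer.metrics as metrics

LOGGER = logging.getLogger(__name__)

# Assumed values: the notebook does not show how these constants are defined
MAX_ATTEMPTS = 5
WAIT_EXPONENTIAL_MAX = 60  # seconds

# Retry on HTTP errors with a random exponential wait, then re-raise
@retry(
    retry=retry_if_exception_type(HTTPError),
    stop=(stop_after_attempt(MAX_ATTEMPTS)),
    wait=wait_random_exponential(max=WAIT_EXPONENTIAL_MAX),
    reraise=True
)
def gen_request(client, endpoint, params):
    with metrics.http_request_timer(endpoint) as timer:
        LOGGER.debug(f'Request for endpoint {endpoint}: {params}')
        resp = client.get(endpoint, params=params)

        timer.tags[metrics.Tag.http_status_code] = resp.status_code

        # Raises HTTPError on 4xx/5xx responses, which triggers a retry
        resp.raise_for_status()
        return json.loads(resp.content)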
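
To show roughly what a retry decorator like the one above does, here is a standard-library-only sketch. It is not how `tenacity` is implemented; `retry_call` and `flaky` are hypothetical names, and the wait strategy is a simplified random exponential backoff.

```python
import random
import time


def retry_call(func, max_attempts=3, max_wait=1.0,
               exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            # Random wait whose upper bound doubles each attempt, capped at max_wait
            sleep(random.uniform(0, min(max_wait, 2 ** attempt)))


calls = []


def flaky():
    """Fail twice, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('temporary failure')
    return 'ok'


# Pass a no-op sleep so the example runs instantly
print(retry_call(flaky, exceptions=(ConnectionError,), sleep=lambda s: None))
print('attempts:', len(calls))
```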

In [None]:
import requests

response = requests.get('http://canned/index/')

print(response.status_code, response.json())

## Some Tricks

In [56]:
obj = 's'

print('type(obj):', type(obj))

type(obj): <class 'str'>


In [62]:
class Thingo:
    def __init__(self, a):
        self.a = a
    
obj = Thingo(1)

print('\ntype(obj) :',              type(obj))
print('\nobj.__class__ :',          obj.__class__)
print('\nobj.__class__.__name__ :', obj.__class__.__name__)
print('\nobj.__dict__ :',           obj.__dict__)
print('\nobj.__dir__ :',            obj.__dir__())


type(obj) : <class '__main__.Thingo'>

obj.__class__ : <class '__main__.Thingo'>

obj.__class__.__name__ : Thingo

obj.__dict__ : {'a': 1}

obj.__dir__ : ['a', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__']


In [67]:
print('type(obj.__class__.__name__) :', type(obj.__class__.__name__))
print('type(obj.__class__) :', type(obj.__class__))

print(obj.__class__('b').__dict__)

type(obj.__class__.__name__) : <class 'str'>
type(obj.__class__) : <class 'type'>
{'a': 'b'}
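
The dunder attributes above have built-in equivalents that are usually preferred in everyday code. A short sketch, reusing the same `Thingo` class:

```python
class Thingo:
    def __init__(self, a):
        self.a = a


obj = Thingo(1)

print(type(obj).__name__)  # same as obj.__class__.__name__
print(vars(obj))           # same as obj.__dict__
print('a' in dir(obj))     # dir(obj) is the sorted form of obj.__dir__()

# Build a second instance of the same class without naming the class
clone = type(obj)(**vars(obj))
print(clone.a, clone is obj)
```

The last two lines only work because every key in `vars(obj)` happens to match an `__init__` parameter; that is an assumption about `Thingo`, not a general rule.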
