# Lesson 2: Class Warfare

> Disclaimer: Most of these points should only be applied to Python.

- Overutilising or underutilising classes can lead to ruin
- Classes can be a powerful tool or an endless garden path

## Benefits vs Drawbacks

### Benefits

- Can keep track of state
  - No need to pass parameters back and forth
  - No thread-unsafe global variables
  - Can logically initialise state and then use it
- Can organise a hierarchy of states that belong together
- Provide dot-methods for accessing properties
  - "ask, don't tell"

- We have a collection of files
- Must get some attributes from each, and add those into a shared collection
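
As a rough illustration of the state-tracking benefit, here is a minimal sketch of the file scenario above. `AttributeCollector` and its `add` method are hypothetical names made up for this example, and it only inspects path strings rather than reading real files.

```python
from pathlib import PurePath


class AttributeCollector:
    """Collects a few attributes from each file path into shared state."""

    def __init__(self):
        # State is initialised once, then reused by every call to add()
        self.names = []
        self.suffixes = set()

    def add(self, path):
        p = PurePath(path)
        self.names.append(p.stem)
        self.suffixes.add(p.suffix)


collector = AttributeCollector()
for path in ['reports/jan.csv', 'reports/feb.csv', 'notes/todo.txt']:
    collector.add(path)

print(collector.names)             # ['jan', 'feb', 'todo']
print(sorted(collector.suffixes))  # ['.csv', '.txt']
```

The shared collection lives on the object, so nothing has to be passed back and forth between calls and no global variable is needed.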

## Functions vs Classes

### Functions vs Methods

In Python:

- a **function** takes parameters, returns a value
- a **method** can be called on an object, and can access state in the object
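
A small sketch of the distinction, using a made-up `Rectangle` class (not part of the lesson's code):

```python
def area(width, height):
    # Function: everything it needs arrives as parameters
    return width * height


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        # Method: reads state stored on the object
        return self.width * self.height


print(area(2, 3))              # 6
print(Rectangle(2, 3).area())  # 6
```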

## Baby steps

In [2]:
CONFIG = {
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
}

_This isn't very safe if something goes wrong_

In [3]:
CONFIG['identifier']

KeyError: 'identifier'

`namedtuple` == Quick 'n' dirty class!

Used when you just need to
- make sure that the correct keys/values are present
- access something a few times (safely) via a dot method rather than a dict key lookup

In [5]:
from collections import namedtuple

Config = namedtuple('config', ['thing', 'identifier', 'name'])

CONFIG = Config('a', 'b', 'c')

print(CONFIG)
print(CONFIG.thing, CONFIG.identifier)

config(thing='a', identifier='b', name='c')
a b


Now let's try the failing example again

In [None]:
Config(**{
    'thing': 'a',
    'identifiier': 'b',
    'name': 'c'
})

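Much better! The misspelled key now raises a `TypeError` as soon as the config is built, not later when it is read. This is useful when loading a JSON config and you need to make sure all the keys are present.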

In [None]:
import json
raw = '{"identifier": 123, "name": "me", "thing": 123}'

Config(**json.loads(raw))

In [None]:
raw = '{"identifier": 123, "name": "me", "thing": 123, "extra": 1}'

Config(**json.loads(raw))
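
Both a missing key and an unexpected key make the `namedtuple` constructor raise `TypeError`. One way to handle that when loading is sketched below; `load_config` is a hypothetical helper, and the type name is spelled `'Config'` here rather than `'config'`.

```python
import json
from collections import namedtuple

Config = namedtuple('Config', ['thing', 'identifier', 'name'])


def load_config(raw):
    """Return a Config, or None if the keys do not match the fields."""
    try:
        return Config(**json.loads(raw))
    except TypeError as err:
        # Raised for both missing and unexpected keys
        print('bad config:', err)
        return None


ok = load_config('{"identifier": 123, "name": "me", "thing": 123}')
extra = load_config('{"identifier": 123, "name": "me", "thing": 123, "extra": 1}')
missing = load_config('{"identifier": 123, "name": "me"}')
```

Only `ok` is a `Config`; the other two come back as `None`. Whether to swallow the error like this or let it propagate depends on the caller.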

---

## An Example

- You have a collection of items, in this case ids and emails
- Need to iterate through them, collect some values, and pass them on

In [None]:
from faker import Faker
from utils import ppj
from itertools import islice

fake = Faker()

def fake_record(i, spanner=False):
    if spanner and i % 5 == 0:
        return (i, fake.uuid4(), fake.email(), None)
    else:
        return (i, fake.uuid4(), fake.email(), fake.pyint())

def iterate(n=10, spanner=False):
    '''
    This generator yields a tuple of each item and a boolean indicating
    if there are more items.
    After all items are consumed, it yields None/False forever
    (This means that we can't just do "for item in iterate(collection)")
    '''
    for i in range(n - 1):
        yield fake_record(i, spanner), True
    # Last real record: use its own index rather than reusing the loop variable
    yield fake_record(n - 1, spanner), False
    while True:
        yield None, False

In [None]:
collection = iterate()
        
sample = list(islice(collection, 2))
ppj(json.dumps(sample, indent=2))

In [None]:
collection = iterate()
for i, (el, has_next) in enumerate(collection):
    print(i, el, has_next)
    if i >= 12:
        break

---

_For our exercise, we are only going to collect the ID (the UUID) if the last column (the score) is even_

In [None]:
collection = iterate()

ids = []
for el, has_next in collection:
    print(el, has_next)
    if el[3] % 2 == 0:
        ids.append(el[1])
    if not has_next:
        break

### Adding more stuff

Let's add some details around how many items we consumed/are up to

In [None]:
def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection):
        print(i, el, has_next)
        if el[3] % 2 == 0:
            ids.append(el[1])
        if not has_next:
            break

## Problems

Let's add a spanner

In [None]:
process(iterate(10, True))

OK, so let's just add an `isinstance` check

In [None]:
def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection):
        print(i, el, has_next)
        if isinstance(el[3], int) and el[3] % 2 == 0:
            ids.append(el[1])
        if len(el[2]) > 15:
            print(f'{"-".join(el)}')
        if not has_next:
            break

In [None]:
process(iterate())

Now add something that makes sure we can stringify the item

In [None]:
def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection):
        print(i, el, has_next)
        if isinstance(el[3], int) and el[3] % 2 == 0:
            ids.append(el[1])
        if len(el[2]) > 20:
            print(f'!! {"-".join(map(str, el))}')
        if not has_next:
            break

process(iterate())

It looks confusing now so let's add some comments

In [None]:
def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection): # <--- ⚠
        print(i, el, has_next)
        # collect item if it's even
        if isinstance(el[3], int) and el[3] % 2 == 0: # <---------⚠
            ids.append(el[1])
        # Warn about large items
        if len(el[2]) > 20: # <-----------------------------------⚠
            print(f'!! {"-".join(map(str, el))}') # <-------------⚠
        if not has_next:
            break

process(iterate())

---

### Let's take a step back

We use values from the raw item without knowing that they're usable

Instead of holding all the logic in this method, what if we could _ask_ each element if it was even?

In [None]:
from dataclasses import asdict, dataclass

@dataclass
class Element:
    numeric_id: int
    uuid: str
    email: str
    score: int
        
def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection):
        el = Element(*el)
        print(i, el, has_next)
        # collect item if it's even
        if isinstance(el.score, int) and el.score % 2 == 0:
            ids.append(el.uuid)
        # Warn about large items
        if len(el.email) > 20:
            print(f'!! {"-".join(map(str, asdict(el).values()))}')
        if not has_next:
            break

process(iterate())

In [None]:
from dataclasses import dataclass

@dataclass
class Element:
    numeric_id: int
    uuid: str
    email: str
    score: int

    def is_even(self) -> bool:
        try:
            return self.score % 2 == 0
        except TypeError:
            # score is None (or otherwise not a number)
            return False

    def email_len(self) -> int:
        return len(self.email)

    def as_row(self, delim='-'):
        return delim.join(map(str, [
            self.numeric_id, self.uuid, self.email, self.score,
        ]))


def process(collection):
    ids = []
    for i, (el, has_next) in enumerate(collection):
        el = Element(*el)
        print(i, el, has_next)

        if el.is_even():
            ids.append(el.uuid)
        if el.email_len() > 20:
            print(f'!! {el.as_row()}')

        if not has_next:
            break

process(iterate())

## Now the collection itself

The collection iterator needs some work.
We need something that we can use like this:

```python
# Loop exits when no more items
for el in X:
    Element.from_api(el)
```

In [None]:
def paginate(collection):
    for i, (el, has_next) in enumerate(collection):
        yield i, el
        if not has_next:
            return

print('---- old')
for i, el in enumerate(iterate(5)):
    if i > 8:
        break
    print(el)
    
print('---- new')
for el in paginate(iterate(5)):
    print(el)
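
To see the stopping behaviour without random data, here is a small sketch. `fake_pages` is a hypothetical, deterministic stand-in for `iterate()`, and `paginate` repeats the logic from the cell above.

```python
def fake_pages(n):
    # Hypothetical stand-in for iterate(): n items, then None/False forever
    for i in range(n):
        yield f'item-{i}', i < n - 1
    while True:
        yield None, False


def paginate(collection):
    # Same logic as the cell above: stop after the last real item
    for i, (el, has_next) in enumerate(collection):
        yield i, el
        if not has_next:
            return


print(list(paginate(fake_pages(3))))
# [(0, 'item-0'), (1, 'item-1'), (2, 'item-2')]
```

Because `paginate` returns once `has_next` is false, callers can use a plain `for` loop or `list()` without hitting the endless `None, False` tail.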

In [None]:
def process(collection):
    ids = []
    for i, el in paginate(collection): # <-------- ✓
        el = Element(*el)              # <-------- ⚠
        print(i, el)

        if el.is_even():
            ids.append(el.uuid)
        if el.email_len() > 20:
            print(f'!! {el.as_row()}')

process(iterate(5))

In [None]:
@dataclass
class Element:
    numeric_id: int
    uuid: str
    email: str
    score: int

    def is_even(self) -> bool:
        try:
            return self.score % 2 == 0
        except TypeError:
            return False

    def email_len(self) -> int:
        return len(self.email)
    
    def as_row(self, delim='-'):
        return delim.join(map(str, [
            self.numeric_id, self.uuid, self.email, self.score,
        ]))
    
    @staticmethod
    def from_api(raw):
        return Element(*raw)

def paginate(collection):
    for i, (el, has_next) in enumerate(collection):
        yield i, el
        if not has_next:
            return
    
def process(collection):
    ids = []
    for i, el in paginate(collection):
        el = Element.from_api(el)
        print(i, el)

        if el.is_even():
            ids.append(el.uuid)
        if el.email_len() > 20:
            print(f'!! {el.as_row()}')

process(iterate(5))

What if we want to send all the even and odd records to different places?
Or, collect all the emails from both categories?

In [None]:
def process(collection, debug=False):
    even_ids = []
    odd_ids = []
    for i, el in paginate(collection):
        el = Element.from_api(el)
        print(i, el)

        if el.is_even():
            even_ids.append(el.uuid)
        else:
            odd_ids.append(el.uuid)
        if el.email_len() > 20 and debug:
            print(f'!! {el.as_row()}')
    return even_ids, odd_ids

even, odd = process(iterate(8))
print('\n', 'even:', len(even), 'odd:', len(odd))
print(even)

What if we want to collect the emails of odd/even people instead? Or something else in the future?

Step 1: just return the entire objects, don't grab values from them

In [None]:
def process(collection, debug=False):
    even = []
    odd = []
    for i, el in paginate(collection):
        el = Element.from_api(el)
        print(i, el)

        if el.is_even():
            even.append(el)
        else:
            odd.append(el)
        if el.email_len() > 20 and debug:
            print(f'!! {el.as_row()}')
    return even, odd

even, odd = process(iterate(8))
print('\n', 'even:', len(even), 'odd:', len(odd))
print([el.email for el in even])

In [None]:
from collections import Counter
from typing import List
from itertools import filterfalse

def paginate(collection):
    for el, has_next in collection:
        yield el
        if not has_next:
            return

@dataclass
class Collection:
    items: List[Element]

    @staticmethod
    def from_raw(items):
        return Collection(list(map(Element.from_api, items)))
        
    def emails(self):
        return [el.email for el in self.items]

    def __iter__(self):
        yield from self.items
        
    def odd_records(self):
        return Collection(list(filter(lambda x: x.score % 2, self.items)))
    
    def even_records(self):
        return Collection(list(filterfalse(lambda x: x.score % 2, self.items)))


c = Collection.from_raw(paginate(iterate(8)))
print('all emails\n', c.emails())

print('\nodd records\n', c.odd_records())
print('\neven records\n', c.even_records())

print('\neven emails!\n', c.even_records().emails())

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('8bZh5LMaSmE', start=350)

---

### Filtering and sorting

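If the predicate already exists as a method, you can pass it straight to the filter (for example `Element.is_even`, referenced through the class) instead of having to write a lambda. A static method that takes the element as its argument works the same way.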

In [None]:
@dataclass
class Element:
    numeric_id: int
    uuid: str
    email: str
    score: int

    @staticmethod
    def from_api(raw):
        return Element(*raw)
        
    def is_even(self) -> bool:
        try:
            return self.score % 2 == 0
        except TypeError:
            return False

    # Static wrappers: usable as predicates without writing a lambda
    @staticmethod
    def _is_even(element):
        return element.is_even()

    @staticmethod
    def _is_false(element):
        # True when the element is not even
        return not element.is_even()

    def email_len(self) -> int:
        return len(self.email)

    def as_row(self, delim='-'):
        return delim.join(map(str, [
            self.numeric_id, self.uuid, self.email, self.score,
        ]))

@dataclass
class Collection:
    items: List[Element]

    @staticmethod
    def from_raw(items):
        return Collection(list(map(Element.from_api, items)))

    @property
    def emails(self):
        return [el.email for el in self.items]

    def __iter__(self):
        yield from self.items

    def filter_records(self, pred):
        return Collection(list(filter(pred, self.items)))


c = Collection.from_raw(paginate(iterate()))
print('total items', len(c.items))

even = c.filter_records(Element.is_even)
print('\neven items\n', len(even.emails), even.emails)

odd = c.filter_records(Element._is_false)
print('\nodd items\n', len(odd.emails), odd.emails)

In [None]:
c.filter_records(lambda x: x.email.startswith('a'))

In [None]:
c.filter_records(lambda x: '@' in x.email)
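
The reason a method can stand in for a lambda is that, accessed through the class, it is just a function that takes the instance as its first argument. A minimal sketch with a hypothetical, cut-down `Item` class in place of `Element`:

```python
from dataclasses import dataclass


@dataclass
class Item:  # hypothetical, cut-down stand-in for Element
    score: int

    def is_even(self) -> bool:
        return self.score % 2 == 0


items = [Item(1), Item(2), Item(4)]

# Item.is_even takes the instance as its only argument,
# so it can be passed wherever a one-argument predicate is expected
evens = list(filter(Item.is_even, items))
odds = list(filter(lambda el: not el.is_even(), items))

print(evens)  # [Item(score=2), Item(score=4)]
print(odds)   # [Item(score=1)]
```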

## Representation

Dunder methods!

Let's make a few options:

- All objects as JSON
- All objects as rows/lists
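
As a rough sketch of how dunder methods could provide those two representations, here is a hypothetical `Point` dataclass (not part of the lesson's code) where `__str__` gives JSON and `__iter__` gives a row of values. The lesson's own classes use explicit `as_json`/`as_row` methods instead.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Point:  # hypothetical example class
    x: int
    y: int

    def __str__(self):
        # str(point) / print(point) gives JSON
        return json.dumps(asdict(self))

    def __iter__(self):
        # list(point) gives a row of values
        yield from asdict(self).values()


p = Point(1, 2)
print(p)        # {"x": 1, "y": 2}
print(list(p))  # [1, 2]
print(repr(p))  # Point(x=1, y=2)
```

The `@dataclass` decorator generates `__repr__` but leaves a hand-written `__str__` alone, so both representations remain available.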

In [None]:
@dataclass
class Element:
    numeric_id: int
    uuid: str
    email: str
    score: int

    @staticmethod
    def from_api(raw):
        return Element(*raw)
        
    def is_even(self) -> bool:
        try:
            return self.score % 2 == 0
        except TypeError:
            return False

    @staticmethod
    def _is_even(element):
        return element.is_even()

    @staticmethod
    def _is_false(element):
        # True when the element is not even
        return not element.is_even()

    def email_len(self) -> int:
        return len(self.email)

    def as_row(self, delim='-'):
        return delim.join(map(str, [
            self.numeric_id, self.uuid, self.email, self.score,
        ]))

    def as_json(self):
        return json.dumps(self.__dict__)

@dataclass
class Collection:
    items: List[Element]

    @staticmethod
    def from_raw(items):
        return Collection(list(map(Element.from_api, items)))

    @property
    def emails(self):
        return [el.email for el in self.items]

    def __iter__(self):
        yield from self.items

    def filter_records(self, pred):
        return Collection(list(filter(pred, self.items)))

    def as_json(self, **kwargs):
        return json.dumps(
            [asdict(el) for el in self.items],
            **kwargs
        )


c = Collection.from_raw(paginate(iterate()))
print('total items', len(c.items))

even = c.filter_records(Element.is_even)
print('\neven items\n', len(even.emails), even.emails)

odd = c.filter_records(Element._is_false)
print('\nodd items\n', len(odd.emails), odd.emails)

In [None]:
c = Collection.from_raw(paginate(iterate()))

ppj(c.items[0].as_json())
ppj(c.as_json())

What if we just want to print the first few items? `Collection` does not support slicing yet, so this fails:

In [None]:
ppj(c[:2].as_json(indent=2))

In [None]:
ppj(c.items[:2].as_json(indent=2))

In [None]:
ppj(Collection(c.items[:2]).as_json(indent=2))

In [None]:
@dataclass
class Collection:
    items: List[Element]

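    @staticmethod
    def from_raw(items):
        return Collection(list(map(Element.from_api, items)))

    def __getitem__(self, i):
        # A slice returns a new Collection; a single index returns the Element
        if isinstance(i, slice):
            return Collection(self.items[i])
        return self.items[i]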
    
    @property
    def emails(self):
        return [el.email for el in self.items]

    def __iter__(self):
        yield from self.items
        
    def filter_records(self, pred):
        return Collection(list(filter(pred, self.items)))
    
    def as_json(self, **kwargs):
        return json.dumps(
            [asdict(el) for el in self.items],
            **kwargs
        )

In [None]:
c = Collection.from_raw(paginate(iterate()))

ppj(c[1:2].as_json(indent=2))
ppj(c[1:3].as_json(indent=2))

### Sorting!

You can sort easily if you already have handy methods available for getting the values to sort by

In [None]:
sorted(c, key=lambda x: x.score)
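
When the sort key is a plain attribute, `operator.attrgetter` is an alternative to the lambda. A minimal sketch, using a hypothetical `Row` dataclass as a stand-in for `Element`:

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Row:  # hypothetical stand-in for Element
    email: str
    score: int


rows = [Row('c@x.org', 3), Row('a@x.org', 3), Row('b@x.org', 1)]

# Sort by one attribute (ties keep their original order)
by_score = sorted(rows, key=attrgetter('score'))
# Sort by score first, then by email to break ties
by_score_email = sorted(rows, key=attrgetter('score', 'email'))

print([r.email for r in by_score])        # ['b@x.org', 'c@x.org', 'a@x.org']
print([r.email for r in by_score_email])  # ['b@x.org', 'a@x.org', 'c@x.org']
```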

https://github.com/tomquirk/realestate-com-au-api/blob/8368da02a67aaf1c2fe9634f19181fb54685718d/realestate_com_au/realestate_com_au.py#L70-L118