In [None]:
import course;course.header()

In [None]:
course.display_topics(day=4)

In the first three days we wrote functions that encapsulate our code and wrangle the data that is passed into them.

This is termed [procedural programming](https://en.wikipedia.org/wiki/Procedural_programming)  

Python (like Java, C++, ...) is an object-oriented programming language, which adds a more natural level of abstraction to programs.

For example, instead of storing x, y, z coordinates in lists and then using these lists to calculate distances between two points in space, which could look like:
``` python
def calc_difference(x_coordinates, y_coordinates, z_coordinates, index1=0, index2=1):
    ...
    return distance
```

it would be much more convenient to be able to write:
``` python
a = Point(x1, y1, z1)
b = Point(x2, y2, z2)
difference = a - b
```

Another view of object-oriented programming is that we attach functions to a customized data container and define how this data container behaves and how data containers of the same type interact.
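To make the `Point` idea concrete, here is a minimal sketch. It assumes that `a - b` should return a new `Point` holding the component-wise differences, and it adds a hypothetical helper `distance_to` for the Euclidean distance; neither choice is prescribed by the text above. The special methods used here (`__init__`, `__sub__`) are the kind of magic functions explained later in this notebook.

```python
import math


class Point:
    def __init__(self, x, y, z):
        # store the coordinates on the object itself
        self.x, self.y, self.z = x, y, z

    def __sub__(self, other):
        # called by Python for `self - other`
        return Point(self.x - other.x, self.y - other.y, self.z - other.z)

    def distance_to(self, other):
        # hypothetical helper: Euclidean distance between two points
        d = self - other
        return math.sqrt(d.x ** 2 + d.y ** 2 + d.z ** 2)


a = Point(1, 2, 3)
b = Point(4, 6, 3)
difference = a - b
print(difference.x, difference.y, difference.z)  # -3 -4 0
print(a.distance_to(b))                          # 5.0
```

The coordinates and the operations on them now travel together, so no index bookkeeping over three separate lists is needed.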

🚀
We call **classes** the blueprints of such customized data containers and **objects** initialized instances of a class. One can spawn many objects from one class, each of which will be unique. 

In procedural programming we used the terms functions and variables. To avoid confusion, functions that are associated with classes/objects are called **methods** and variables are called **attributes** or **properties**.
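The terminology can be mapped onto a tiny example. The class name `Peptide`, its attribute `kind` and its method `shout` are made up for illustration only.

```python
class Peptide:                  # class: the blueprint
    kind = "peptide"            # attribute: a variable attached to the class

    def shout(self):            # method: a function attached to the class
        return self.kind.upper()


p1 = Peptide()                  # object: an initialized instance of the class
p2 = Peptide()                  # a second, independent object

print(p1 is p2)                 # False
print(p1.shout())               # PEPTIDE
```

Both objects come from the same blueprint, yet `p1 is p2` is `False` because each one is its own instance.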

In [None]:
# classes are declared with the `class` keyword; class names use CapWords
# quick reminder - PEP8! https://www.python.org/dev/peps/pep-0008/#class-names
class Sequence(object):
    def aa_distribution(self):
        return "Not implemented yet"
        # should raise NotImplementedError('!')

s1 = Sequence()
s1.aa_distribution()


The *object* in the brackets refers to the parent class from which our Sequence class inherits its properties. Here object is not necessary, since in Python 3 all classes inherit from object to start with.
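A quick way to convince yourself that both spellings are equivalent in Python 3 (the two class names below are placeholders):

```python
class WithObject(object):
    pass


class WithoutObject:
    pass


# both classes have `object` as their only direct parent
print(WithObject.__bases__)                 # (<class 'object'>,)
print(WithoutObject.__bases__)              # (<class 'object'>,)
print(issubclass(WithoutObject, object))    # True
```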

We defined a new method **aa_distribution** which takes one argument, **self**. This is always the case for functions associated with objects, which are called **methods**. Think about it as passing the actual data container into our "function".
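The following sketch illustrates what "passing the data container into the function" means. The method name `who_am_i` is invented for this example; calling it on the object and calling it on the class with the object as explicit argument do the same thing.

```python
class Sequence:
    def who_am_i(self):
        # `self` is the object the method was called on
        return id(self)


s1 = Sequence()

# Python fills in `self` for us ...
print(s1.who_am_i() == id(s1))          # True
# ... which is equivalent to passing the object explicitly
print(Sequence.who_am_i(s1) == id(s1))  # True
```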

## \__init__
One very important method is **__init__**, as it is called when a new instance is initialized.

Note: methods starting and ending with two underscores have special meanings in Python. To avoid collisions, do not invent your own names of this form. They are called **magic functions** (or magic methods).
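Built-in types rely on the same mechanism, which is a handy way to see what "special meaning" implies. The snippet only uses built-in objects:

```python
numbers = [1, 2, 3]

# len(), + and str() are translated by Python into magic method calls
print(len(numbers) == numbers.__len__())     # True
print("EL" + "VIS" == "EL".__add__("VIS"))   # True
print(str(42) == (42).__str__())             # True
```

In everyday code one writes `len(numbers)`, not `numbers.__len__()`; the explicit calls are shown here for illustration only.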

In [None]:
from collections import Counter

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        # we store the sequence that is used to 
        # initialize this object into self.sequence
    
    def aa_distribution(self):
        return Counter(self.sequence) 
  
s1 = Sequence("WHEREISELVIS")
s1.aa_distribution()

In [None]:
s1.sequence

Classes and their objects can each have methods and attributes. 

Class attributes can be used like:

In [None]:
from collections import Counter

class Sequence(object):
    
    total_initialized_sequence = 0
    
    def __init__(self, sequence):
        self.sequence = sequence
        Sequence.total_initialized_sequence += 1
        # every time a Sequence object is initialized, we increase
        # the counter of the class attribute
    
    def aa_distribution(self):
        return Counter(self.sequence)

for _ in range(13):
    s1 = Sequence("AACCEE")

Sequence.total_initialized_sequence
# ^-- note: we are referring to the actual class Sequence and not the instance s1

Yet each instance has a reference to the one class attribute!
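One pitfall is worth a short sketch: reading a class attribute through an instance works, but *assigning* through an instance creates a new instance attribute that shadows the class attribute for that one object. The stripped-down class and the attribute name `total` are placeholders.

```python
class Sequence:
    total = 0          # class attribute, shared by all instances


a = Sequence()
b = Sequence()

Sequence.total = 5     # change it on the class: every instance sees it
print(a.total, b.total)                  # 5 5

a.total = 99           # assignment on the instance: only `a` is affected
print(Sequence.total, a.total, b.total)  # 5 99 5
```

This is why the counter in the cell above is increased via `Sequence.total_initialized_sequence` and not via `self`.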

In [None]:
new = Sequence("ELVIS")
new.total_initialized_sequence

Methods that belong to the class are defined using the `@classmethod` decorator.

In [None]:
from collections import Counter

class Sequence(object):
    
    total_initialized_sequence = 0
    
    def __init__(self, sequence):
        self.sequence = sequence
        Sequence.total_initialized_sequence += 1
        # every time a Sequence object is initialized, we increase
        # the counter of the class attribute
    
    def aa_distribution(self):
        return Counter(self.sequence)

    @classmethod
    def class_status(cls):
        print(f"We have initialized {cls.total_initialized_sequence} sequences")
        
for _ in range(3):
    s1 = Sequence("AACCEE") 

Sequence.class_status()

For the sake of readability, the first argument of a class method is named **cls**, not **self**.
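A common use of `cls` is an alternative constructor. The sketch below assumes a hypothetical `from_fasta` class method that receives FASTA-formatted text as a string and skips header lines starting with `>`; it is an illustration, not a full FASTA parser.

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    @classmethod
    def from_fasta(cls, fasta_text):
        # hypothetical alternative constructor: drop header lines, join the rest
        lines = fasta_text.strip().splitlines()
        residues = "".join(
            line.strip() for line in lines if not line.startswith(">")
        )
        # cls is the class itself, so this also works for subclasses
        return cls(residues)


s = Sequence.from_fasta(">sp|demo\nELVIS\nLIVES\n")
print(s.sequence)  # ELVISLIVES
```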

In [None]:
# note: again, each instance has this method as well, which is why defining classmethods
#       is IMHO not so useful ..
s1.class_status()

# more fun with class attributes

In [None]:
from collections import Counter

class Sequence(object):
    
    all_intialized_sequences = []
    
    def __init__(self, sequence):
        self.sequence = sequence
        # we collect all initialized sequences in the class
        Sequence.all_intialized_sequences.append(self)
            
s1 = Sequence("AACCEE")
s1 = Sequence("ELVIS")
s1 = Sequence("NANANA")

Sequence.all_intialized_sequences

In [None]:
Sequence.all_intialized_sequences[1].sequence

## more magic functions

## \__str__
making the object more descriptive

In [None]:
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence

    def __str__(self):
        return "Sequence class mobi-HD, length {0}, id {1}".format(
            len(self.sequence),
            id(self)
        )

In [None]:
s1 = Sequence("ELVISLIVES")
print(s1)

## \__add__
allowing objects to be added

In [None]:
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        
    def __str__(self):
        return """Sequence class mobi-HD, length {0}, id {1}, {2}""".format(
            len(self.sequence),
            id(self),
            self.sequence
        )

    def add(self, other):
        return self + other  
    
    def __add__(self, other):
        new_sequence_obj = Sequence(self.sequence + other.sequence)
        return new_sequence_obj
  

In [None]:
s1 = Sequence("ELVIS")
s2 = Sequence("LIVES")
s3 = s1 + s2
print(s1)
print(s2)
print(s3)
s1 += s2
print(s1.add(s2))


# Comparisons
Often we want to sort objects stored in a list or check for equality.
But what does it mean that sequence_1 \< sequence_2 or sequence_1 == sequence_2 ?

In [None]:
# answer is that we need to define magic functions that are called by Python internals 
# in order to eval equality or to sort. Minimum is __eq__ and __lt__, respectively.

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        
    def __str__(self):
        return self.sequence
    
    def __eq__(self, other):
        return self.sequence == other.sequence
    
    def __lt__(self, other):
        # return True if self < other
        # I chose sequence length but it could equally be anything 
        # one can compute for both sequences ...
        self_smaller = True
        if len(self.sequence) >= len(other.sequence):
            self_smaller = False
        return self_smaller
#         return len(self.sequence) < len(other.sequence)
        

In [None]:
s1 = Sequence("ELVISLIVES")
s2 = Sequence("REALLYELVISLIVES")
s3 = Sequence("DEADELVISIS")
 
print("is s1 == s2 ?", s1 == s2)
print("is s1 != s3 ?", s1 != s3)


for sequence in sorted([s2, s3, s1], reverse=False):
    print(sequence)

Minimum set of magic functions that enables equality and sorting:

- \__eq__ for equality (and inequality)
- \__lt__ for sorting (less than)

For more possibilities see the [Python documentation](https://docs.python.org/3/reference/datamodel.html#object.__lt__) (no need to know all of those for the exam)
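If the remaining comparison operators (`<=`, `>`, `>=`) are needed as well, the standard library decorator `functools.total_ordering` can derive them from `__eq__` and `__lt__`. In this sketch both methods compare by sequence length, which is an assumption made here so that equality and ordering agree with each other; it differs from the `__eq__` in the cell above.

```python
from functools import total_ordering


@total_ordering
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __eq__(self, other):
        # assumption for this sketch: equal means equal length
        return len(self.sequence) == len(other.sequence)

    def __lt__(self, other):
        return len(self.sequence) < len(other.sequence)


short = Sequence("ELVIS")
long_ = Sequence("ELVISLIVES")

print(short <= long_)   # True, derived by total_ordering
print(short >= long_)   # False, derived by total_ordering
print(long_ > short)    # True, derived by total_ordering
```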

# Make our class iterable

In [None]:

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._current_iter_state = 0
        
    def __iter__(self):
        return self
    
    def __next__(self):
        
        if self._current_iter_state < len(self.sequence):
            current_aa = self.sequence[self._current_iter_state]
            self._current_iter_state += 1
            return current_aa
        raise StopIteration


In [None]:
s3 = Sequence("ELVISISDEAD")
for aa in s3:
    print(aa)
# next(s3)

Why is the second iteration not working?

In [None]:
for aa in s3:
    print(aa)

### Fix it!
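Besides resetting a counter, another common approach is to write `__iter__` as a generator, so that every loop gets its own fresh iterator and no iteration state is stored on the object. A minimal sketch of that alternative:

```python
class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    def __iter__(self):
        # a generator function: each call starts a new, independent iteration
        for aa in self.sequence:
            yield aa


s3 = Sequence("ELVIS")
first_pass = list(s3)
second_pass = list(s3)
print(first_pass == second_pass)   # True

# nested loops over the same object work too
pairs = [(a, b) for a in s3 for b in s3]
print(len(pairs))                  # 25
```

With this variant no `__next__` method is needed on the class, since the generator object provides it.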

In [None]:

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._current_iter_state = 0
        
    def __iter__(self):
        self._current_iter_state = 0
        return self
    
    def __next__(self):
        
        if self._current_iter_state < len(self.sequence):
            current_aa = self.sequence[self._current_iter_state]
            self._current_iter_state += 1
            return current_aa
        raise StopIteration


### Again, what does it mean to iterate over our object? It is up to us to decide, so why not iterate over it using a sliding window ...

In [None]:
from collections import deque
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence   
        
    def __iter__(self):
        self._current_iter_state = 0
        self._sliding_window = deque([], maxlen=3)
        return self
    
    def __next__(self):
        if self._current_iter_state < len(self.sequence):
            current_aa = self.sequence[self._current_iter_state]
            self._sliding_window.append(current_aa)
            self._current_iter_state += 1
            return self._sliding_window
        raise StopIteration

In [None]:
s3 = Sequence("ELVISISDEAD")
for sliding_window in s3:
    print(sliding_window)

# on demand / lazy loading et al. 

In [None]:
from collections import Counter

class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence
        
    def aa_distribution(self):
        return Counter(self.sequence)
        # Problem is that we calculate aa_distribution every time
        # this method is called 


In [None]:
aa_seq = "ACGHCNASOINDQIEODHASDJALSKDJASDJ" * 100000

In [None]:
s1 = Sequence(aa_seq)

In [None]:
%timeit -n 3 s1.aa_distribution()

In [None]:
%timeit -n 3 s1.aa_distribution()

How can we make this faster the second time?

### Fix it!

... let's calculate distribution on demand and only if we have not done before ...

In [None]:
from collections import Counter

class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence
        self._aa_distribution = None
    
    def aa_distribution(self):
        if self._aa_distribution is None:
            self._aa_distribution = Counter(self.sequence)
        return self._aa_distribution


In [None]:
s2 = Sequence(aa_seq)
distribution = s2.aa_distribution()

In [None]:
%timeit -n 3  s2.aa_distribution()

What about properties?
Sequence.aa_distribution feels much more like a property of the sequence,
and a function should start with a verb :)

## accessing properties that do calculations on demand 

In [None]:

class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self.aa_distribution = Counter(self.sequence)

In [None]:
%%timeit -n 3 
s4 = Sequence(aa_seq)

Slow again, and we have only just initialized the object! :(

Note: %timeit profiles a line and %%timeit profiles the whole cell!

## on demand or lazy loading of properties?

In [None]:
# on demand calculation, and if so then only once :)
class Sequence(object):
    def __init__(self, sequence):
        self.sequence = sequence
        self._aa_distribution = None
        
    @property
    def aa_distribution(self):
        if self._aa_distribution is None:
            self._aa_distribution = Counter(self.sequence)
        return self._aa_distribution

In [None]:
%timeit -n 3 Sequence(aa_seq)

In [None]:
s3 = Sequence(aa_seq)
distribution = s3.aa_distribution

In [None]:
%timeit  -n 3 s3.aa_distribution

Class initializes faster! 
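Since Python 3.8 the standard library offers `functools.cached_property`, which packages the "compute on first access, then reuse" pattern into a single decorator. A minimal sketch, assuming the sequence is not modified after the first access:

```python
from collections import Counter
from functools import cached_property


class Sequence:
    def __init__(self, sequence):
        self.sequence = sequence

    @cached_property
    def aa_distribution(self):
        # runs only on first access; the result is then stored on the instance
        return Counter(self.sequence)


s = Sequence("WHEREISELVIS")
first = s.aa_distribution
second = s.aa_distribution
print(first is second)   # True, the same Counter object is reused
print(first["E"])        # 3
```

As with the hand-written version, the cached value becomes stale if `self.sequence` is changed later.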

# Object inheritance
Another major advantage of OOP is that blueprint properties can be inherited, thus reducing code duplication.

Using inheritance can, however, also lead to complex data / class structures. Follow the Zen of Python! Not every method needs to have its own subclass.

Note: The parent's constructor (\__init__) is not called automatically when a subclass defines its own \__init__!
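The effect is easy to demonstrate. The class names `Base`, `Forgetful` and `Careful` are made up for this sketch:

```python
class Base:
    def __init__(self, sequence):
        self.sequence = sequence


class Forgetful(Base):
    def __init__(self, sequence):
        # overrides Base.__init__ and never calls it
        pass


class Careful(Base):
    def __init__(self, sequence):
        super().__init__(sequence)   # explicitly run the parent's __init__
        self.length = len(sequence)


print(hasattr(Forgetful("ELVIS"), "sequence"))  # False
print(Careful("ELVIS").sequence)                # ELVIS
```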

In [None]:
class SequenceBaseClass:
    def __init__(self, sequence):
        self.sequence = sequence

    def __add__(self, other):
        # type(self) keeps the subclass, so Sequence + Sequence gives a Sequence
        new_sequence_obj = type(self)(self.sequence + other.sequence)
        return new_sequence_obj        

    def __len__(self):
        raise NotImplementedError(
            "If you inherit from SequenceBaseClass, "
            "you must define len yourself"
            # Note: you can split strings to make it more readable
        )
    
class Sequence(SequenceBaseClass):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    def __len__(self):
        return len(self.sequence)
    
    
s1 = Sequence("SIRIFINDELVIS")

In [None]:
print(s1.sequence)
print(len(s1))

# Finally,
We can check if an object is based on a given class by using:

In [None]:
isinstance(s1, Sequence)
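
Note that `isinstance` also accepts parent classes, whereas comparing `type()` checks for the exact class only. A short sketch with empty stand-ins for the two classes defined above:

```python
class SequenceBaseClass:
    pass


class Sequence(SequenceBaseClass):
    pass


s1 = Sequence()

print(isinstance(s1, Sequence))                  # True
print(isinstance(s1, SequenceBaseClass))         # True, parent classes count as well
print(type(s1) is SequenceBaseClass)             # False, type() gives the exact class
print(issubclass(Sequence, SequenceBaseClass))   # True
```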