### MY470 Computer Programming
# Classes in Python
### Week 5 Lecture

## Overview

* Object-oriented programming
* Classes
* Class inheritance and polymorphism
* Encapsulation and information hiding
* Generators
* Team formation for Assignment 5
---
* Useful Python package: `networkx`

## From Last Week: Decomposition and Abstraction

* Decomposition creates structure
* Abstraction hides detail

![Decomposition and abstraction](figs/decomposition_abstraction.png "Decomposition and abstraction")

## Achieving Decomposition and Abstraction

* With functions
* With **classes**

## From Week 2: Objects

### Python supports many different kinds of objects

* `25`, `'LSE'`, `[1, 2, 7, 0]`, `range(10)` 

### In fact, EVERYTHING in Python is an object

* Objects have types (belong to classes)
* Objects also have a set of procedures for interacting with them (methods)

In [1]:
s = 'some string'
print(type(s))
print(s.upper())

<class 'str'>
SOME STRING


## Object-Oriented Programming

A programming paradigm based on the concept of "objects"

An object is a **data abstraction** that captures:

* **Internal representation** (data attributes)
* **Interface** for interacting with object (methods)


## Procedural  vs. Object-Oriented Programming

![Procedural vs. object-oriented programming](figs/procedural_object-oriented.png "Procedural vs. object-oriented programming")

## Abstraction

![Abstraction in science](figs/science_abstraction.png "Abstraction in science")

## Data Abstraction With Classes


In [5]:
from datetime import date

class Person(object): # python convention: Capitalise Class names
    #        ^ : "We extend the superclass 'object' "
        
    def __init__(self, f_name, l_name):             # Initialise a new instance of the class
        'Creates a person using first and last names.'
        self.first_name = f_name
        self.last_name = l_name
        self.birthdate = None # Not required from the beginning
    
    def get_name(self):
        '''Gets self\'s full name.'''
        return self.first_name + ' ' + self.last_name
    
    def get_age(self):
        '''Gets self\'s age in years.'''
        return date.today().year - self.birthdate.year
    
    def set_birthdate(self, dob):
        '''Assumes dob is of type date.
        Sets self\'s birthdate to dob.'''
        self.birthdate = dob
    
    #def __str__(self):
    #    '''Returns self's full name.'''
    #    return self.first_name + ' ' + self.last_name
    
    def __str__(self):
        '''Returns self's full name.'''
        return self.get_name() # also possible
    
p1 = Person('Malala', 'Yousafzai')
p1.set_birthdate(date(1997, 7, 12))
print(p1, p1.get_age())

Malala Yousafzai 21


In [10]:
p2 = Person('Albert', 'Einstein')
p2.get_age()

AttributeError: 'NoneType' object has no attribute 'year'
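
The error above occurs because `birthdate` is still `None` when `get_age()` is called. One possible way to make the failure easier to diagnose is to check for a missing birthdate and raise a descriptive error. This is a minimal sketch, not part of the lecture's class: the `ValueError` check and the optional `today` argument (used only to make the result reproducible) are assumptions added for illustration.

```python
from datetime import date


class Person(object):
    '''Minimal stand-in for the lecture's Person class.'''

    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name
        self.birthdate = None

    def set_birthdate(self, dob):
        '''Assumes dob is of type date.'''
        self.birthdate = dob

    def get_age(self, today=None):
        '''Gets self's age as a difference in calendar years, as in the lecture.
        `today` is a hypothetical optional argument for reproducibility.'''
        if self.birthdate is None:
            raise ValueError('Birthdate is not set; call set_birthdate() first.')
        if today is None:
            today = date.today()
        return today.year - self.birthdate.year


p2 = Person('Albert', 'Einstein')
try:
    p2.get_age()
except ValueError as err:
    print(err)  # A readable message instead of an AttributeError

p2.set_birthdate(date(1879, 3, 14))
print(p2.get_age(today=date(1921, 1, 1)))  # 42
```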

## Classes in Python

* Data attributes — `first_name`, `last_name`, `birthdate`
* Methods
  * `get_name()`, `get_age()`, `set_birthdate()`
  * `__init__()` — called when a class is instantiated
  * `__str__()` — called by `print()` and `str()`
  
---

* Operations
  * Instantiation: `p1 = Person('Malala', 'Yousafzai')` calls method `__init__()`
  * Attribute/method reference: `p1.get_age()`

## Classes vs. Objects

* `Person` is a class
* `p1` is an instance of the class `Person`; it is an object of type `Person`
* Similarly, `str` is a class and `'Malala Yousafzai'` is an object of type `str`

![Class vs. object](figs/person_malala.png "Class vs. object")

Images from: Adrien Coquet and Simon Davis/DFID



## `self`

```
def set_birthdate(self, dob):
    self.birthdate = dob
```

* Variable that references the current instance of the class
* The name is a convention
* It's a *strong* convention — **Do not use any other variable name!**

## Class Operations: Method Reference

```
def get_age(self):
    return date.today().year - self.birthdate.year
```

* Methods are functions that are associated with a class
* Two ways to call methods:
  * `p1.get_age()`  **— Use this one!**
  * `Person.get_age(p1)`
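
The two call styles can be compared directly. The sketch below uses a cut-down version of `Person` with only `get_name()`, to show that calling through the instance simply passes that instance as `self`.

```python
class Person(object):
    '''Cut-down version of the lecture's Person class.'''

    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name

    def get_name(self):
        return self.first_name + ' ' + self.last_name


p1 = Person('Malala', 'Yousafzai')

# Calling through the instance: Python passes p1 as `self` automatically
via_instance = p1.get_name()

# Calling through the class: the instance must be passed explicitly
via_class = Person.get_name(p1)

print(via_instance == via_class)  # True
```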

## Special Methods

* `__init__()` — called when a class is instantiated
* `__str__()` — called by `print()` and `str()`

#### Can be redefined for the purpose of your specific class

* `__lt__()` — overloads the `<` operator
* `__le__()` — overloads the `<=` operator
* `__eq__()` — overloads the `==` operator
* `__ne__()` — overloads the `!=` operator (defaults to opposite of `__eq__()`)
* `__gt__()` — overloads the `>` operator
* `__ge__()` — overloads the `>=` operator

### Overloading these operators lets other functions that rely on them work with your class

* E.g. `sorted()` and `list.sort()` compare items with `<`, so they work once `__lt__()` is defined

In [2]:
class Person(object):
        
    def __init__(self, f_name, l_name):
        'Creates a person using first and last names.'
        self.first_name = f_name
        self.last_name = l_name
        self.birthdate = None
    
    def get_name(self):
        '''Gets self\'s full name.'''
        return self.first_name + ' ' + self.last_name
    
    def get_age(self):
        '''Gets self\'s age in years.'''
        return date.today().year - self.birthdate.year
    
    def set_birthdate(self, dob):
        '''Assumes dob is of type date.
        Sets self\'s birthdate to dob.'''
        self.birthdate = dob
    
    def __str__(self):
        '''Returns self's full name.'''
        return self.first_name + ' ' + self.last_name
    
    def __lt__(self, other):
        '''Returns True if self\'s last name precedes other\'s last name
        in alphabetical order. If they are equal, compares first names.'''
        if self.last_name==other.last_name:
            return self.first_name < other.first_name
        return self.last_name < other.last_name
    
p1 = Person('Malala', 'Yousafzai')
p2 = Person('Robert', 'Webb')
print(p1 < p2)

lst = sorted([p1, p2])
print([str(i) for i in lst])

False
['Robert Webb', 'Malala Yousafzai']
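
Sorting only needs `<`, which is why defining `__lt__()` was enough above. If the other comparison operators are needed too, one option from the standard library is `functools.total_ordering`, which derives the missing ordering methods from `__eq__()` plus one of `__lt__()`, `__le__()`, `__gt__()` or `__ge__()`. This goes beyond what the lecture covers, so treat it as a sketch; the helper `_key()` is a hypothetical name introduced here.

```python
from functools import total_ordering


@total_ordering
class Person(object):
    '''Cut-down Person that supports all six comparison operators.'''

    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name

    def _key(self):
        # Hypothetical helper: compare by last name, then by first name
        return (self.last_name, self.first_name)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()


p1 = Person('Malala', 'Yousafzai')
p2 = Person('Robert', 'Webb')

print(p1 > p2)    # True: __gt__ is supplied by total_ordering
print(p1 >= p2)   # True
print(p1 == Person('Malala', 'Yousafzai'))  # True
```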


## Class Operations: Attribute Reference

```
def __init__(self, f_name, l_name):
    self.first_name = f_name
    self.last_name = l_name
    self.birthdate = None
```

### In Python, you have direct access to instance attributes but you shouldn't use it

  * `p1.first_name`  **<— DO NOT EVER USE THIS ONE!** 
  * `p1.get_name()`  **<— Use this one instead**

In [3]:
p1.get_name().split()[0]  # Or write a new method get_first_name()

'Malala'

Using methods to get instance attributes is essential for encapsulation and information hiding — two important goals of object-oriented programming.
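
The `get_first_name()` method suggested in the comment above could look like the sketch below. A dedicated getter is also more robust than splitting the full name, which gives the wrong answer when a first name contains a space. The second example name is purely illustrative.

```python
class Person(object):
    '''Cut-down Person with a dedicated getter for the first name.'''

    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name

    def get_name(self):
        '''Gets self's full name.'''
        return self.first_name + ' ' + self.last_name

    def get_first_name(self):
        '''Gets self's first name.'''
        return self.first_name


p1 = Person('Malala', 'Yousafzai')
print(p1.get_first_name())         # Malala

p3 = Person('Mary Ann', 'Evans')   # Illustrative two-word first name
print(p3.get_name().split()[0])    # Mary
print(p3.get_first_name())         # Mary Ann
```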

## Inheritance

> **Q:** *What's the object-oriented way to become wealthy?*

> **A:** *Inheritance.*

## Inheritance

* Allows you to build hierarchies of related abstractions
* **Subclasses** inherit data attributes and methods from their **superclasses** (classes that are higher in the hierarchy)
* At the top of the hierarchy is the class `object`
* Subclasses can:
  * Add new data attributes and methods
  * Override data attributes and methods of the superclass

## Subclasses 

In [4]:
class LSEPerson(Person):
    # LSEPerson is a subclass of Person which itself is a subclass of object
    
    # This is a class variable
    next_id_num = 1 # unique identification number
        
    def __init__(self, f_name, l_name):
        'Creates an LSE person using first and last names.'
        Person.__init__(self, f_name, l_name)
        self.id_num = LSEPerson.next_id_num
        LSEPerson.next_id_num += 1
    
    def get_id_num(self):
        '''Gets self\'s unique LSE number.'''
        return self.id_num
    
    def __lt__(self, other):
        '''Returns True if self\'s id number is smaller than other\'s id number.'''
        return self.id_num < other.id_num

staff1 = LSEPerson('Milena', 'Tsvetkova')
print(staff1, staff1.get_id_num())

staff2 = LSEPerson('Pablo', 'Barbera')
print(staff2, staff2.get_id_num())

print(staff1 < staff2)
print(p1 < staff1)
# print(staff1 < p1) # Raises AttributeError: Python calls LSEPerson.__lt__() on staff1, and p1 has no id_num

Milena Tsvetkova 1
Pablo Barbera 2
True
False


In [5]:
LSEPerson.mro()

[__main__.LSEPerson, __main__.Person, object]

## Polymorphism

* An expression can do different things depending on the objects it applies to
* Enabled by overriding inherited methods
* Helps reduce code
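
As a small illustration of polymorphism, the sketch below applies the same expression, `str(p)`, to objects of two classes; each object responds with its own version of `__str__()`. The `Student` subclass and its label are simplified assumptions for this example, not the lecture's later class hierarchy.

```python
class Person(object):
    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name

    def __str__(self):
        return self.first_name + ' ' + self.last_name


class Student(Person):
    def __str__(self):
        # Override: reuse the inherited text and add a label
        return Person.__str__(self) + ' (student)'


people = [Person('Malala', 'Yousafzai'), Student('Robert', 'Webb')]

# The same call does different things depending on the object's class
labels = [str(p) for p in people]
print(labels)  # ['Malala Yousafzai', 'Robert Webb (student)']
```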

## Inheritance Hierarchies 

In [7]:
class Staff(LSEPerson):
    pass

class Admin(Staff):
    pass

class Acad(Staff):
    pass

class Student(LSEPerson):
    pass

class Undergrad(Student):
    pass

class Grad(Student):
    pass
    
prof1 = Acad('Angelina', 'Jolie')

print(type(prof1))
print(isinstance(prof1, Acad))
print(isinstance(prof1, Staff))
print(isinstance(prof1, Person))
print(isinstance(prof1, Student))

prof1.set_birthdate  # Inherited from Person; referenced here without being called

<class '__main__.Acad'>
True
True
True
False


## Encapsulation and Information Hiding

![Encapsulation and information hiding](figs/encapsulation.png "Encapsulation and information hiding")

### Encapsulation

* The bundling of data attributes and the methods for operating on them

### Information hiding

* Allows changing the class definition without affecting its external behavior

### Encapsulation and information hiding keep class attributes and methods safe from outside interference and misuse.


## Information Hiding in Python

* Use naming conventions to make data attributes and methods invisible outside the class
* Convention: Begin name with `__` but do not end with it

In [36]:
class InfoHiding(object):
    def __init__(self):
        self.visible = 'Look at me'
        self.__visible__ = 'Look at me too'
        self.__invisible = 'Do not look at me directly'
        
    def print_visible(self):
        print(self.visible)
    
    def print_invisible(self):
        print(self.__invisible)
        
    def __invisible_print_invisible(self):
        print(self.__invisible)
        
    def __visible_print_invisible__(self):
        print(self.__invisible)

test = InfoHiding()

In [37]:
print(test.visible)
print(test.__visible__)
print(test.__invisible)

Look at me
Look at me too


AttributeError: 'InfoHiding' object has no attribute '__invisible'

In [16]:
test.print_visible()
test.print_invisible()
test.__visible_print_invisible__()
test.__invisible_print_invisible()


Look at me
Do not look at me directly
Do not look at me directly


AttributeError: 'InfoHiding' object has no attribute '__invisible_print_invisible'

## Information Hiding and Subclasses

In [17]:
class SubClass(InfoHiding):
    def __init__(self):
        InfoHiding.__init__(self)
        print(self.__invisible)

sub_test = SubClass()

AttributeError: 'SubClass' object has no attribute '_SubClass__invisible'
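
The error message hints at what Python does with such names: inside a class body, `__name` is rewritten to `_ClassName__name` (name mangling). The attribute therefore still exists on the object, just under the parent's mangled name. The sketch below makes this visible; the `peek()` method is hypothetical, and reaching for the mangled name like this is shown only to explain the mechanism, not as recommended practice.

```python
class InfoHiding(object):
    def __init__(self):
        # Stored on the instance as _InfoHiding__invisible
        self.__invisible = 'Do not look at me directly'


class SubClass(InfoHiding):
    def peek(self):
        # Hypothetical method. Writing self.__invisible here would be
        # rewritten to self._SubClass__invisible, which does not exist.
        # The parent's mangled name has to be spelled out instead.
        return self._InfoHiding__invisible


sub_test = SubClass()
print(sorted(vars(sub_test)))  # ['_InfoHiding__invisible']
print(sub_test.peek())         # Do not look at me directly
```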

## In Practice, Information Hiding Convention in Python Is Rarely Used 

* Without it users may rely on attributes that are not necessarily part of the specification of the class
* Without it users may also change these attributes in undesirable ways

In [18]:
class Course(object):
    
    def __init__(self, student_list):
        self.students = student_list
        self.grades = {}
    
    def get_students(self):
        return self.students[:]
    
course1 = Course([1, 2, 3])
course2 = Course([4, 5, 6])

all_students = course1.students
all_students.extend(course2.students)

print(course1.students) # course1 changed: all_students is an alias for course1.students, not a copy


[1, 2, 3, 4, 5, 6]


## In Practice, Information Hiding in Python Requires Discipline!

* Do not directly access data attributes from outside the class in which they are defined
* Return copies of mutable objects rather than the objects themselves (e.g. lists or dictionaries)

#### Copying the data on every call can cause speed issues with large collections
--> `yield` can help!

In [19]:
class Course(object):
    
    def __init__(self, student_list):
        self.students = student_list
        self.grades = {}
    
    def get_students(self):
        return self.students[:] # Creates a copy of a list already in memory
    
    def add_grade(self, student, grade):
        self.grades[student] = grade
    
course1 = Course([i for i in range(1, 11)])
for i in course1.get_students():
    course1.add_grade(i, 100)
    
print(course1.get_students()) 

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
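
Returning a copy protects the list on the way out, but the class above still stores the caller's own list on the way in. A possible extension, sketched here as an assumption rather than as the lecture's code, is to copy in `__init__()` as well. The variable `roster` is a hypothetical name.

```python
class Course(object):

    def __init__(self, student_list):
        # Copy on the way in, so later changes to the caller's list do not leak in
        self.students = list(student_list)
        self.grades = {}

    def get_students(self):
        # Copy on the way out, so callers cannot modify the internal list
        return self.students[:]


roster = [1, 2, 3]
course1 = Course(roster)

roster.append(99)                  # Changes only the caller's list
course1.get_students().append(42)  # Changes only a temporary copy

print(course1.get_students())      # [1, 2, 3]
```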


## Generators with `yield`

Iterating over a generator avoids copying the whole data set: items are produced one at a time, only when requested.

In [20]:
class Course(object):
    
    def __init__(self, student_list):
        self.students = student_list
        self.grades = {}
    
    def get_students(self):
        for i in self.students:
            yield i
    
    def add_grade(self, student, grade):
        self.grades[student] = grade
    
course1 = Course([i for i in range(1, 11)])
for i in course1.get_students():
    course1.add_grade(i, 100)
    
print(course1.get_students())

<generator object Course.get_students at 0x103e08830>
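
Printing a generator shows the generator object itself rather than its items. To see the items, iterate over it or wrap it in `list()`. A generator can also be consumed only once, as the short sketch below shows with a cut-down `Course` class.

```python
class Course(object):

    def __init__(self, student_list):
        self.students = student_list

    def get_students(self):
        for i in self.students:
            yield i


course1 = Course([1, 2, 3])

gen = course1.get_students()
first_pass = list(gen)    # Consumes the generator
second_pass = list(gen)   # Nothing left to yield

print(first_pass)                     # [1, 2, 3]
print(second_pass)                    # []
print(list(course1.get_students()))   # A fresh call gives a fresh generator
```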


## When to Use Classes

* Methods require attribute look-up, so calling them is a bit slower than calling plain functions

In [21]:
Person.__dict__

mappingproxy({'__dict__': <attribute '__dict__' of 'Person' objects>,
              '__doc__': None,
              '__init__': <function __main__.Person.__init__>,
              '__lt__': <function __main__.Person.__lt__>,
              '__module__': '__main__',
              '__str__': <function __main__.Person.__str__>,
              '__weakref__': <attribute '__weakref__' of 'Person' objects>,
              'get_age': <function __main__.Person.get_age>,
              'get_name': <function __main__.Person.get_name>,
              'set_birthdate': <function __main__.Person.set_birthdate>})
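
The look-up can be made visible with `vars()`: data attributes live on each instance, while methods are stored once on the class and found there when called. The sketch below uses a cut-down `Person`; it illustrates where names are stored and makes no claim about how large the speed difference is.

```python
class Person(object):
    def __init__(self, f_name, l_name):
        self.first_name = f_name
        self.last_name = l_name

    def get_name(self):
        return self.first_name + ' ' + self.last_name


p1 = Person('Malala', 'Yousafzai')

# Data attributes are stored on the instance...
print(sorted(vars(p1)))            # ['first_name', 'last_name']

# ...while methods are stored once, on the class
print('get_name' in vars(p1))      # False
print('get_name' in vars(Person))  # True

# p1.get_name() finds the function on the class and passes p1 as self
print(Person.__dict__['get_name'](p1))
```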

In [43]:
x = 1
del x
x

NameError: name 'x' is not defined

* Designing classes properly can be quite time-consuming

* In general, data scientists are less likely to implement classes
* However, most modules and packages that data scientists use make heavy use of object-oriented programming

* **If you are building reusable and extensible code to share with others and/or release publicly, use classes!**

## Classes in Python

* Reusable abstractions
* Reduce development time for large projects
* Allow you to maintain and update programs without disrupting users
* Help produce more reliable programs
* Essential for developing user applications

-------

* **Lab**: Collaborative programming
* **Next week**: No lecture or class! I am not available for regular office hours but e-mail me to arrange time to meet.

## Collaborative Programming

* Most programming is collaborative
* Functions and classes allow us to:
  1. Design programs
  2. Divide work
  3. Write code simultaneously
  4. Merge contributions
* The next two weekly assignments will be done in groups of two
* Pairs will be formed randomly

## Work on Assignment 5 in Pairs

1. Wait for an e-mail with your team name
2. Go to the assignment link in the e-mail
3. If you see your team, join it
4. If you don't see your team, create it

## Let's say you drew the following team name:

# team-0

## If you see your team in the list of current teams, click "Join"

![Choose team for assignment](figs/choose_team.png "Choose team for assignment")

## If you don't see your team in the list of current teams, enter its name and click "Create team"

![If team does not exist yet, create it](figs/create_team.png "If team does not exist yet, create it")

## If successful, you and your partner are given access to the same remote repository **assignment-5-team-0**

![Team repository](figs/team_repo.png "Team repository")

## What Happens Next

* Each of you should clone the team repository locally
* Coordinate how to divide the labor
* Work separately but use GitHub to open issues and pull requests
* Merge your contributions

### We will discuss how to use GitHub for collaboration in class

---

### MY470 Computer Programming
# Useful Python Library: `networkx`
### Week 5 Extra

## Background

![NetworkX](figs/networkx.png "NetworkX")

* Library for studying networks
* The major data structures are of the type "dictionary of dictionaries"
* Capabilities:
    * Estimate common network measures
    * Construct random networks
    * Visualize networks
    * Convert networks to and from different formats
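
To give a feel for what "dictionary of dictionaries" means, here is a plain-Python sketch of an undirected network stored that way. It only illustrates the idea and is not `networkx`'s actual implementation; the names `adj` and `add_edge` are hypothetical.

```python
# Outer keys are nodes; inner keys are neighbours; inner values hold edge attributes
adj = {}


def add_edge(adj, u, v, **attrs):
    '''Hypothetical helper: stores an undirected edge in both directions.'''
    adj.setdefault(u, {})[v] = attrs
    adj.setdefault(v, {})[u] = attrs


add_edge(adj, 1, 2, weight=1.7)
add_edge(adj, 2, 3, weight=1.5)
add_edge(adj, 1, 3)

print(adj[1][2]['weight'])              # 1.7 (attribute look-up by two keys)
print(sorted(adj[1]))                   # [2, 3] (neighbours of node 1)
print({n: len(adj[n]) for n in adj})    # {1: 2, 2: 2, 3: 2} (node degrees)
```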

## Creating Networks

In [23]:
import networkx as nx

# Create an empty network
G = nx.Graph()  
# Add nodes to network
G.add_node(1)
G.add_nodes_from([2,3])
print(G.nodes())
G.nodes()

[1, 2, 3]


NodeView((1, 2, 3))

In [24]:
# Add edges to network (nodes are automatically added if they don't exist already)
G.add_edge(1, 2)
G.add_edges_from([(2, 3), (1, 3), (1, 4)])
print(G.edges())
print(G.nodes())

[(1, 2), (1, 3), (1, 4), (2, 3)]
[1, 2, 3, 4]


## Node and Edge Attributes

In [25]:
# Add/modify node attributes
G.nodes[1]['name'] = 'Anna'
G.add_node(5, name = 'Elliot')
G.add_nodes_from([6, 7], name = 'Fathima')
G.nodes.data()

NodeDataView({1: {'name': 'Anna'}, 2: {}, 3: {}, 4: {}, 5: {'name': 'Elliot'}, 6: {'name': 'Fathima'}, 7: {'name': 'Fathima'}})

In [26]:
# Add/modify edge attributes
G.add_edge(5, 6, weight=2 )
G.add_edges_from([(1, 2), (1, 3)], weight=1)
G[1][2]['weight'] = 1.7
G.edges[2, 3]['weight'] = 1.5
G.edges.data()

EdgeDataView([(1, 2, {'weight': 1.7}), (1, 3, {'weight': 1}), (1, 4, {}), (2, 3, {'weight': 1.5}), (5, 6, {'weight': 2})])

## Analyzing Networks

In [27]:
# Estimate node degrees -- returns a DegreeView object capable of iterating (node, degree) pairs
G.degree()

DegreeView({1: 3, 2: 2, 3: 2, 4: 1, 5: 1, 6: 1, 7: 0})

In [28]:
# Estimate node clustering
# This function returns a dictionary, not a view -- a good example of why consistency matters!
nx.clustering(G)

{1: 0.3333333333333333, 2: 1.0, 3: 1.0, 4: 0, 5: 0, 6: 0, 7: 0}
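
These values can be checked by hand. For an unweighted network, a node's local clustering is the share of pairs of its neighbours that are themselves connected. The plain-Python sketch below recomputes it for the edges in the example network; `local_clustering` is a hypothetical helper, not a `networkx` function.

```python
from itertools import combinations

# Edges and nodes of the example network built above
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (5, 6)]
nodes = [1, 2, 3, 4, 5, 6, 7]

# Build a set of neighbours for every node
neighbours = {n: set() for n in nodes}
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)


def local_clustering(node):
    '''Hypothetical helper: share of neighbour pairs that are linked to each other.'''
    pairs = list(combinations(sorted(neighbours[node]), 2))
    if not pairs:
        return 0
    linked = sum(1 for a, b in pairs if b in neighbours[a])
    return linked / len(pairs)


print({n: local_clustering(n) for n in nodes})
```

Node 1 has neighbours 2, 3 and 4, which form three possible pairs; only the pair (2, 3) is connected, giving 1/3.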

In [29]:
# Identify the connected subcomponents in the network -- returns a generator!
list(nx.connected_components(G))

[{1, 2, 3, 4}, {5, 6}, {7}]
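
Returning a generator means components are computed one at a time, only when requested. The sketch below shows how such a generator could be written in plain Python with `yield` and a breadth-first search. It is an illustration under that assumption, not the `networkx` source; the function name `components` is hypothetical.

```python
from collections import deque

# Edges and nodes of the example network built above
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (5, 6)]
nodes = [1, 2, 3, 4, 5, 6, 7]

neighbours = {n: set() for n in nodes}
for u, v in edges:
    neighbours[u].add(v)
    neighbours[v].add(u)


def components(neighbours):
    '''Hypothetical generator: yields one set of nodes per connected component.'''
    seen = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbours[node]:
                if nb not in component:
                    component.add(nb)
                    queue.append(nb)
        seen |= component
        yield component


gen = components(neighbours)
first = next(gen)   # Only the first component has been computed so far
rest = list(gen)    # The remaining components

print(first)        # {1, 2, 3, 4}
print(rest)         # [{5, 6}, {7}]
```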

* Despite these inconsistencies, `networkx` is a powerful library for network analysis (although not so much for visualization)
* When in doubt or confused, simply consult the documentation!

## Resources

* Get started: [NetworkX tutorial](https://networkx.github.io/documentation/stable/tutorial.html)
* Get inspired: [NetworkX examples](https://networkx.github.io/documentation/stable/auto_examples/index.html)
* Get it done: [NetworkX reference](https://networkx.github.io/documentation/stable/reference/index.html)