<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Agenda" data-toc-modified-id="Agenda-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Agenda</a></span></li><li><span><a href="#Why-a-data-scientist-should-learn-about-OOP" data-toc-modified-id="Why-a-data-scientist-should-learn-about-OOP-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Why a data scientist should learn about OOP</a></span></li><li><span><a href="#&quot;Everything-in-Python-is-an-object&quot;" data-toc-modified-id="&quot;Everything-in-Python-is-an-object&quot;-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>"Everything in Python is an object"</a></span><ul class="toc-item"><li><span><a href="#Side-Note-about-Variables" data-toc-modified-id="Side-Note-about-Variables-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Side Note about Variables</a></span></li></ul></li><li><span><a href="#Define-attributes,-methods,-and-dot-notation" data-toc-modified-id="Define-attributes,-methods,-and-dot-notation-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Define attributes, methods, and dot notation</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes" data-toc-modified-id="Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Describe the relationship of classes to objects, and learn to code classes</a></span><ul class="toc-item"><li><span><a href="#Classes" data-toc-modified-id="Classes-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Classes</a></span></li><li><span><a href="#Methods" data-toc-modified-id="Methods-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Methods</a></span></li></ul></li></ul></div>

![fvo](https://cdn.educba.com/academy/wp-content/uploads/2018/07/Functional-Programming-vs-OOP-1.png)

# Object-Oriented Programming

In [None]:
import pandas as pd
import inspect

## Agenda

Students will be able to (SWBAT):

1. explain the meaning and relevance of object orientation;
2. explain the idea that "everything in Python is an object";
3. define the notions of attribute, method, and dot notation;
4. describe the relationship between classes and objects, and code classes;
5. explain the notion of inheritance;
6. describe how the object structure is used in `sklearn` tools like `StandardScaler` and `OneHotEncoder`.

## Why a data scientist should learn about OOP

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of when the learning in machine learning occurs in the code, and after that learning occurs, how to access the information gained.
  - You will become more comfortable reading other people's code, which will improve your own code.
  - You will develop knowledge of the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.

Let's begin by taking a look at the source code for `sklearn`'s [StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to peruse the source code on your own. What do you notice?

## "Everything in Python is an object"

Python is an object-oriented programming language. You'll hear people say that "everything is an object" in Python. What does this mean?

Go back to the idea of a function for a moment. A function is a kind of abstraction whereby an algorithm is made repeatable. So instead of coding:

In [None]:
print(3**2 + 10)
print(4**2 + 10)
print(5**2 + 10)

or even:

In [None]:
for x in range(3, 6):
    print(x**2 + 10)

I can write:

In [None]:
def square_and_add_ten(x):
    return x**2 + 10

Now imagine a further abstraction: Before, creating a function was about making a certain algorithm available to different inputs. Now I want to make that function available to different **objects**.

Even Python integers are objects. Consider:

In [None]:
x = 3

We can see what type of object a variable refers to with the built-in `type()` function:

In [None]:
type(x)

By setting x equal to an integer, I'm imbuing x with the methods of the integer class.

In [None]:
x.bit_length()

In [None]:
y = 4
y.bit_length()

In [None]:
x.__float__()

Python is dynamically typed, meaning you don't have to instruct it as to what type of object your variable is.  
A variable is a pointer to where an object is stored in memory.
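
As a minimal sketch of what dynamic typing means in practice (the name `value` is hypothetical, introduced only for illustration), the same name can be rebound to objects of different types, and `type()` reports the type of whichever object the name currently refers to:

```python
value = 3                      # the name refers to an int object
first_type = type(value)

value = 'three'                # the same name now refers to a str object
second_type = type(value)

print(first_type, second_type)
```

No type declaration was needed at any point: the type belongs to the object, not to the name.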

### Side Note about Variables

In [None]:
id(x)

In [None]:
hex(id(x))

In [None]:
y = 3

In [None]:
hex(id(y))

In [None]:
x is y

In [None]:
# Two names can refer to the same list, so a change made through one name is visible through the other

x_list = [1,2,3,4]
y_list = x_list

x_list.pop()
print(x_list)
print(y_list)

In [None]:
# when you use copy(), you create a shallow copy of the object

z_list = y_list.copy()

In [None]:
id(z_list)

In [None]:
id(y_list)

In [None]:
y_list.pop()
print(y_list)
print(z_list)

In [None]:
a_list = [[1,2,3], [4,5,6]]
b_list = a_list.copy()
a_list[0][0] ='z'
b_list

In [None]:
import copy

# deepcopy is needed when an object contains other mutable objects (e.g. nested lists)

a_list = [[1,2,3], [4,5,6]]
b_list = copy.deepcopy(a_list)
a_list[0][0] ='z'
b_list

For more details on this general feature of Python, see [here](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html).
For more on shallow and deep copying, go [here](https://docs.python.org/3/library/copy.html#copy.deepcopy).

## Define attributes, methods, and dot notation

Dot notation is used to access both attributes and methods.

Take for example our familiar friend, the [`Pandas` DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [None]:
# Dataframes are another type of object.

df = pd.DataFrame({'price': [50, 40, 30],'sqft': [1000, 950, 500]})

In [None]:
df

In [None]:
type(df)

Instance attributes are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [None]:
df.shape

What are some other DataFrame attributes we know?

In [None]:
# Other df attributes



A **method** is a function attached to an object:

In [None]:
df.info()

In [None]:
type(df.info())

In [None]:
# isna() is a method that comes along with the DataFrame object

df.isna()
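
One rough way to tell attributes and methods apart is to check whether the thing after the dot is callable. The sketch below uses a built-in complex number rather than a DataFrame so that it runs without `pandas`; the name `z` is made up for illustration:

```python
z = 3 + 4j

# .real is an attribute: it stores a value and is accessed without parentheses
print(z.real)                  # 3.0
print(callable(z.real))        # False

# .conjugate is a method: it is a function attached to the object and must be called
print(z.conjugate())           # (3-4j)
print(callable(z.conjugate))   # True
```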

What other DataFrame methods do we know?

In [None]:
# Other df methods



### Exercise

Let's practice accessing the methods associated with the built-in `str` class.  
You are given a string below: 

In [None]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods.  To practice chaining methods, try to do it in one line.

Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the `inspect` module to see the names of the available methods.

We can chain methods together because the **result of applying a method to an object is another object**.
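
To illustrate chaining on a different string (the variable `raw` and its value are made up here, so as not to give away the exercise), each call below returns a new `str`, and the next method is then called on that result:

```python
raw = '  data SCIENCE  '

# Step by step: every method call returns a new string object
step_one = raw.strip()         # 'data SCIENCE'
step_two = step_one.title()    # 'Data Science'

# Chained: the same two calls written in one expression
chained = raw.strip().title()

print(chained == step_two)     # True
```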

In [None]:
inspect.getmembers(example)

In [None]:
# we can also use the built-in dir() function

dir(example)

<details>
    <summary>
        Answer here
    </summary>
<code>example.swapcase().replace('0', 'o').strip().replace('?', '!')</code>
    </details>

## Describe the relationship of classes to objects, and learn to code classes

Each object is an instance of a **class**, which defines a bundle of attributes and functions. Functions that belong to a class are called *methods*. The point is that **every object of that class automatically has those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.

![blueprint](img/blueprint.jpeg)
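
The blueprint idea can be sketched with a built-in class before we write our own. The names `numbers` and `more_numbers` are hypothetical; the point is only that any object built from `list` comes with the methods the class defines:

```python
numbers = [3, 1, 2]

# type() reports the class an object was built from
print(type(numbers))

# isinstance() checks whether an object is an instance of a given class
print(isinstance(numbers, list))         # True

# Calling the class itself creates a new, separate instance
more_numbers = list()
print(hasattr(more_numbers, 'append'))   # True
```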

### Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [None]:
class Car:
    """Automotive object"""
    pass # This is called a stub.

In [None]:
# Instantiate a car object

ferrari = Car()
type(ferrari)

In [None]:
# We can give the Ferrari four wheels

ferrari.wheels = 4
ferrari.wheels

But wouldn't it be nice not to have to do that every time? We'll just include the 4-wheels specification in the blueprint!

In [None]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # This is an attribute of *every* car.

In [None]:
civic = Car()
civic.wheels

In [None]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

In [None]:
ferrari = Car()
ferrari.doors

In [None]:
ferrari.wheels

In [None]:
# Does your Ferrari have only 2 doors? 
# These attributes can be overwritten.

ferrari.doors = 2
ferrari.doors
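
It may help to see what that assignment does and does not change. In this sketch (which assumes the same `Car` class as above, repeated so the cell runs on its own), setting `ferrari.doors` creates an attribute on that one instance; the class-level value, and therefore every other car, is left alone:

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4

ferrari = Car()
civic = Car()

ferrari.doors = 2        # creates an instance attribute on ferrari only

print(ferrari.doors)     # 2
print(civic.doors)       # 4
print(Car.doors)         # 4

# vars() shows the attributes stored on the instance itself
print(vars(ferrari))     # {'doors': 2}
print(vars(civic))       # {}
```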

### Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [None]:
#  Now we can add a method
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')

In [None]:
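# Create two separate Car instances
# (ferrari = civic = Car() would bind both names to the same object)
ferrari = Car()
civic = Car()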
ferrari.honk()
civic.honk()

In [None]:
type(ferrari.wheels)

In [None]:
type(ferrari.honk())

Wait a second, what's that `self` doing? <br/> Every instance method should take `self` as its first parameter, **which refers to the individual object the method is called on, i.e. to the instance of the class**.
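
A small sketch can make `self` less mysterious. The `whoami` method below is a hypothetical helper added only for illustration; it simply hands back whatever Python passed in as `self`:

```python
class Car:
    """Automotive object"""

    def honk(self):
        print('Beep beep')

    def whoami(self):            # hypothetical helper, not part of the lesson's Car
        return self

ferrari = Car()

# Calling a method on an instance passes that instance as self automatically
ferrari.honk()

# The same call written out explicitly through the class
Car.honk(ferrari)

print(ferrari.whoami() is ferrari)   # True
```

So `ferrari.honk()` is shorthand for `Car.honk(ferrari)`, which is why the method definition needs a parameter to receive the instance.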