<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Agenda" data-toc-modified-id="Agenda-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Agenda</a></span></li><li><span><a href="#Why-a-data-scientist-should-learn-about-OOP" data-toc-modified-id="Why-a-data-scientist-should-learn-about-OOP-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Why a data scientist should learn about OOP</a></span></li><li><span><a href="#&quot;Everything-in-Python-is-an-object&quot;" data-toc-modified-id="&quot;Everything-in-Python-is-an-object&quot;-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>"Everything in Python is an object"</a></span><ul class="toc-item"><li><span><a href="#Side-Note-about-Variables" data-toc-modified-id="Side-Note-about-Variables-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Side Note about Variables</a></span></li></ul></li><li><span><a href="#Define-attributes,-methods,-and-dot-notation" data-toc-modified-id="Define-attributes,-methods,-and-dot-notation-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Define attributes, methods, and dot notation</a></span><ul class="toc-item"><li><span><a href="#Exercise" data-toc-modified-id="Exercise-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Exercise</a></span></li></ul></li><li><span><a href="#Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes" data-toc-modified-id="Describe-the-relationship-of-classes-to-objects,-and-learn-to-code-classes-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Describe the relationship of classes to objects, and learn to code classes</a></span><ul class="toc-item"><li><span><a href="#Classes" data-toc-modified-id="Classes-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Classes</a></span></li><li><span><a href="#Methods" data-toc-modified-id="Methods-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Methods</a></span></li></ul></li></ul></div>

![fvo](https://cdn.educba.com/academy/wp-content/uploads/2018/07/Functional-Programming-vs-OOP-1.png)

# Object-Oriented Programming

In [1]:
import pandas as pd
import inspect

>`inspect` is a standard-library module for examining objects, including the ones we build

## Agenda

Students will be able to (SWBAT):

1. explain the meaning and relevance of object orientation;
2. explain the idea that "everything in Python is an object";
3. define the notions of attribute, method, and dot notation;
4. describe the relationship between classes and objects, and code classes;
5. explain the notion of inheritance;
6. describe how the object structure is used in `sklearn` tools like `StandardScaler` and `OneHotEncoder`.

## Why a data scientist should learn about OOP

  - By becoming familiar with the principles of OOP, you will increase your knowledge of what's possible.  Much of what you might think you need to code by hand is already built into the objects.
  - With a knowledge of classes and how objects store information, you will develop a better sense of when the learning in machine learning occurs in the code, and after that learning occurs, how to access the information gained.
  - You become comfortable reading other people's code, which will improve your own code.
  - You will develop knowledge of the OOP family of programming languages, the strengths and weaknesses of Python, and the strengths and weaknesses of other language families.

Let's begin by taking a look at the source code for `sklearn`'s [StandardScaler](https://github.com/scikit-learn/scikit-learn/blob/fd237278e/sklearn/preprocessing/_data.py#L517).

Take a minute to peruse the source code on your own. What do you notice?

In [2]:
#Embedded functions
#'self' keyword
#hierarchical structure
#long docstring

>`self` refers to the particular object (instance) that a method is being called on

>`StandardScaler` is a class; when we call a method on an instance of it, Python automatically passes that instance in as `self`
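
To make the role of `self` concrete, here is a minimal sketch of the fit/transform pattern. `TinyScaler` is a hypothetical class invented for illustration; it is not `sklearn`'s implementation, and it assumes a plain list of numbers with nonzero spread.

```python
from statistics import mean, pstdev


class TinyScaler:
    """Hypothetical, simplified stand-in for a scaler (not sklearn's code)."""

    def fit(self, values):
        # The "learning" happens here: the statistics are stored on this
        # particular instance, which is what `self` refers to.
        self.mean_ = mean(values)
        self.scale_ = pstdev(values)
        return self  # returning self lets us chain fit(...).transform(...)

    def transform(self, values):
        # Uses whatever this instance learned during fit().
        return [(v - self.mean_) / self.scale_ for v in values]


scaler = TinyScaler()
scaled = scaler.fit([1, 2, 3]).transform([1, 2, 3])

print(scaler.mean_)   # the learned information lives on the object
print(scaled)
```

After fitting, the learned values are ordinary attributes of the object, which is where we would go to read them back.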


## "Everything in Python is an object"

Python is an object-oriented programming language. You'll hear people say that "everything is an object" in Python. What does this mean?

Go back to the idea of a function for a moment. A function is a kind of abstraction whereby an algorithm is made repeatable. So instead of coding:

In [3]:
print(3**2 + 10)
print(4**2 + 10)
print(5**2 + 10)

19
26
35


or even:

>x is an abstraction

In [4]:
for x in range(3, 6):
    print(x**2 + 10)

19
26
35


I can write:

In [5]:
def square_and_add_ten(x):
    return x**2 + 10

>Will give me an output as long as I pass in an input. It is a really useful abstraction.

Now imagine a further abstraction: Before, creating a function was about making a certain algorithm available to different inputs. Now I want to make that function available to different **objects**.

Even Python integers are objects. Consider:

In [6]:
x = 3

We can see what type of object a variable refers to with the built-in `type()` function:

In [7]:
type(x)

int

By setting `x` equal to an integer, I'm giving `x` access to the methods of the integer class.

>Whatever is available to the integer class is available to `x`, because `x` is an `int`

In [8]:
x.bit_length()

2

In [9]:
y = 4
y.bit_length()

3

In [10]:
x.__float__()

3.0

Python is dynamically typed, meaning you don't have to instruct it as to what type of object your variable is.  
A variable is a pointer to where an object is stored in memory.
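
As a small sketch of both ideas (everything is an object, and names are not tied to a type), the snippet below checks a handful of values and then rebinds one name to objects of different types. The function is the one defined earlier, repeated here so the block runs on its own.

```python
def square_and_add_ten(x):
    return x**2 + 10


# Integers, floats, strings, lists, functions, and even classes are objects.
for obj in (3, 3.0, "three", [3], square_and_add_ten, int):
    print(type(obj).__name__, isinstance(obj, object))

# Dynamic typing: the same name can point to objects of different types.
value = 3
first_type = type(value).__name__
value = "three"
second_type = type(value).__name__
print(first_type, second_type)
```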

### Side Note about Variables

In [11]:
id(x)

4378651040

>`id()` returns an integer in base 10 (in CPython, the object's memory address); `hex()` shows the same number in base 16

In [12]:
hex(id(x))

'0x104fce9a0'

In [13]:
y = 3

In [14]:
hex(id(y))

'0x104fce9a0'

In [15]:
x is y

True

>`x` and `y` point to the same location in memory. That is harmless for immutable integers, but it matters for objects you might want to change, such as lists
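
A short sketch of the difference between identity (`is`) and equality (`==`), using lists so the result does not depend on how the interpreter caches small integers:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a second list with the same contents
c = a           # a second name for the same list

print(a == b)   # True: same contents
print(a is b)   # False: two separate objects
print(a is c)   # True: one object, two names
```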

In [16]:
# this can have implications 

x_list = [1,2,3,4]
y_list = x_list

x_list.pop()
print(x_list)
print(y_list)

[1, 2, 3]
[1, 2, 3]


>`y_list` is the same object as `x_list` above, so a change made through one name shows up through the other

In [17]:
# when you use copy(), you create a shallow copy of the object

z_list = y_list.copy()

In [18]:
id(z_list)

140569916042880

In [19]:
id(y_list)

140569934357056

>Note the different ids above!

In [20]:
y_list.pop()
print(y_list)
print(z_list)

[1, 2]
[1, 2, 3]


>`copy()` gives us a separate list that we can change without changing the original

>When objects have nested structure, a shallow copy still shares the inner objects, so you have to use `deepcopy`!

In [21]:
a_list = [[1,2,3], [4,5,6]]
b_list = a_list.copy()
a_list[0][0] ='z'
b_list

[['z', 2, 3], [4, 5, 6]]

In [23]:
print(a_list)

[['z', 2, 3], [4, 5, 6]]


In [24]:
import copy

# deepcopy is needed when an object contains other mutable objects (e.g. nested lists)

a_list = [[1,2,3], [4,5,6]]
b_list = copy.deepcopy(a_list)
a_list[0][0] ='z'
b_list

[[1, 2, 3], [4, 5, 6]]

In [25]:
print(a_list)

[['z', 2, 3], [4, 5, 6]]


For more details on this general feature of Python, see [here](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html).
For more on shallow and deep copying, go [here](https://docs.python.org/3/library/copy.html#copy.deepcopy).
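
One way to see why the shallow copy behaved as it did is to compare identities at each level. This is a minimal sketch; the variable names are chosen for illustration.

```python
import copy

nested = [[1, 2, 3], [4, 5, 6]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

print(shallow is nested)         # False: the outer list is new
print(shallow[0] is nested[0])   # True: the inner lists are shared
print(deep[0] is nested[0])      # False: deepcopy copied the inner lists too
```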

## Define attributes, methods, and dot notation

Dot notation is used to access both attributes and methods.

Take for example our familiar friend, the [`Pandas` DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html).

In [26]:
# Dataframes are another type of object.

df = pd.DataFrame({'price': [50, 40, 30],'sqft': [1000, 950, 500]})

In [27]:
df

|   | price | sqft |
|---|-------|------|
| 0 | 50    | 1000 |
| 1 | 40    | 950  |
| 2 | 30    | 500  |


In [28]:
type(df)

pandas.core.frame.DataFrame

Instance attributes are associated with each unique object.
They describe characteristics of the object, and are accessed with dot notation like so:

In [29]:
df.shape

(3, 2)

What are some other DataFrame attributes we know?

In [30]:
# Other df attributes

#.dtypes
#.loc
#.values
#.columns
#.index

A **method** is a function attached to an object. **Methods are called with parentheses.**

In [31]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   price   3 non-null      int64
 1   sqft    3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


In [32]:
type(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   price   3 non-null      int64
 1   sqft    3 non-null      int64
dtypes: int64(2)
memory usage: 176.0 bytes


NoneType

In [33]:
# isna() is a method that comes along with the DataFrame object

df.isna()

|   | price | sqft  |
|---|-------|-------|
| 0 | False | False |
| 1 | False | False |
| 2 | False | False |


What other DataFrame methods do we know?

In [34]:
# Other df methods

#.head()
#.value_counts()
#.sort_index()
#.tail()
#.sort_values()
#.groupby()

### Exercise

Let's practice accessing the methods associated with the built-in `str` class.  
You are given a string below: 

In [35]:
example = '   hELL0, w0RLD?   '

Your task is to fix it so it reads `Hello, World!` using string methods.  To practice chaining methods, try to do it in one line.

Use the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods), and use the inspect library to see the names of methods.
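
Listings of an object's members include many double-underscore names. As a rough sketch, they can be filtered down to the public names, either from `dir()` or from `inspect.getmembers()` with `callable` as the predicate:

```python
import inspect

example = '   hELL0, w0RLD?   '

# Keep only names that do not start with an underscore.
public_names = [name for name in dir(example) if not name.startswith('_')]

# getmembers() returns (name, value) pairs; the predicate keeps callables only.
public_methods = [
    name
    for name, member in inspect.getmembers(example, callable)
    if not name.startswith('_')
]

print(public_names[:5])
print('strip' in public_methods, 'title' in public_methods)
```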

We can chain methods together because the **result of applying a method to an object is another object**.

In [36]:
inspect.getmembers(example)

[('__add__', <method-wrapper '__add__' of str object at 0x7fd8fd03f990>),
 ('__class__', str),
 ('__contains__',
  <method-wrapper '__contains__' of str object at 0x7fd8fd03f990>),
 ('__delattr__',
  <method-wrapper '__delattr__' of str object at 0x7fd8fd03f990>),
 ('__dir__', <function str.__dir__()>),
 ('__doc__',
  "str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."),
 ('__eq__', <method-wrapper '__eq__' of str object at 0x7fd8fd03f990>),
 ('__format__', <function str.__format__(format_spec, /)>),
 ('__ge__', <method-wrapper '__ge__' of str object at 0x7fd8fd03f990>),
 ('__getattribute__',
  <method-wrapper '

In [37]:
# we can also use the built-in dir() function

dir(example)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',


In [38]:
example

'   hELL0, w0RLD?   '

In [43]:
example.strip().lower().replace('?', '!').replace('0', 'o').title()

'Hello, World!'

<details>
    <summary>
        Answer here
    </summary>
<code>example.swapcase().replace('0', 'o').strip().replace('?', '!')</code>
    </details>

> We can chain these methods together because each string method returns a new string. Each step in the chain is therefore another method of the `str` class, called on the result of the previous step.
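
To see what each link in the chain produces, the sketch below applies the same steps one at a time and prints the intermediate result and its type. The loop with `getattr` is just one way to walk through the steps; it is not required for chaining itself.

```python
example = '   hELL0, w0RLD?   '

steps = [
    ('strip', ()),
    ('lower', ()),
    ('replace', ('?', '!')),
    ('replace', ('0', 'o')),
    ('title', ()),
]

result = example
for name, args in steps:
    # Look up the method by name on the current string and call it.
    result = getattr(result, name)(*args)
    print(f"{name:8} -> {result!r} ({type(result).__name__})")

# A chain only continues while the returned object has the next method:
words = 'hello world'.split()        # split() returns a list, not a str
print(type(words).__name__, hasattr(words, 'upper'))
```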

## Describe the relationship of classes to objects, and learn to code classes

Each object is an instance of a **class**. A class defines a bundle of attributes and functions; because those functions belong to the class, they are called *methods*. The point is that **every object of that class automatically has those attributes and methods**.

A class is like a blueprint that describes how to create a specific type of object.

![blueprint](img/blueprint.jpeg)

### Classes

We can define **new** classes of objects altogether by using the keyword `class`:

In [46]:
class Car:
    """Automotive object"""
    pass # This is called a stub.

In [47]:
# Instantiate a car object

ferrari = Car()
type(ferrari)

__main__.Car

In [48]:
# We can give the Ferrari four wheels

ferrari.wheels = 4
ferrari.wheels

4

> The `wheels` attribute above applies only to the `ferrari` object. If you define it in the class (see below), every instance of the class will have it.
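
A quick sketch to confirm this: an attribute set on one instance does not appear on another instance of the same (empty) class.

```python
class Car:
    """Automotive object"""
    pass


ferrari = Car()
civic = Car()

ferrari.wheels = 4   # set on this one object only

print(hasattr(ferrari, 'wheels'))   # True
print(hasattr(civic, 'wheels'))     # False
```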

But wouldn't it be nice not to have to do that every time? We'll just include the 4-wheels specification in the blueprint!

In [50]:
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.

In [51]:
civic = Car()
civic.wheels

4

In [52]:
#  Then we can add more attributes
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

In [53]:
ferrari = Car()
ferrari.doors

4

In [54]:
ferrari.wheels

4

In [55]:
# Does your Ferrari have only 2 doors? 
# Class attributes can be overridden on an individual object.

ferrari.doors = 2
ferrari.doors

2
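
Overriding the value on one object does not change the blueprint. The sketch below shows that the class and other instances keep the original value, and that the override is stored on the instance itself:

```python
class Car:
    """Automotive object"""

    wheels = 4
    doors = 4


ferrari = Car()
civic = Car()

ferrari.doors = 2        # stored on the ferrari object only

print(ferrari.doors)     # 2
print(civic.doors)       # 4: still read from the class
print(Car.doors)         # 4: the class itself is unchanged
print(ferrari.__dict__)  # {'doors': 2}
print(civic.__dict__)    # {}
```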

### Methods

We can also write functions that are associated with each class.  
As said above, a function associated with a class is called a method.

In [56]:
# Now we add a method to the class
class Car:
    """Automotive object"""
    
    wheels = 4                      # These are attributes of *every* car.
    doors = 4

    def honk(self):                   # These are methods we can call on *any* car.
        print('Beep beep')

In [57]:
ferrari = Car()   # two separate Car objects,
civic = Car()     # rather than two names for one object
ferrari.honk()
civic.honk()

Beep beep
Beep beep


In [58]:
type(ferrari.wheels)

int

In [59]:
type(ferrari.honk())

Beep beep


NoneType

The parentheses `()` call the method. Because `honk` has no `return` statement, the call returns `None`, so the type of its result is `NoneType`.
Below, we leave off the parentheses to ask what `honk` itself is without calling it.

In [60]:
type(ferrari.honk)

method

>`self` should be the first parameter of a method. You don't pass it yourself when calling the method on an object; Python passes the object in automatically. But it has to be in the definition.

> Normally a method returns an object that can then be used elsewhere in the code
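
A minimal sketch of both notes. The `describe` method is a hypothetical addition for illustration; it returns a value instead of printing, and the explicit call `Car.honk(ferrari)` shows what Python does for us when we write `ferrari.honk()`.

```python
class Car:
    """Automotive object"""

    wheels = 4

    def honk(self):
        print('Beep beep')

    def describe(self):  # hypothetical method that returns a value
        return f'A car with {self.wheels} wheels'


ferrari = Car()

ferrari.honk()       # Python passes ferrari in as self
Car.honk(ferrari)    # the same call with self passed explicitly

description = ferrari.describe()
print(description)               # the returned string can be reused
print(ferrari.honk() is None)    # honk() prints, then returns None
```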