# Object-Oriented Programming (OOP) in Python

## Class and Instance

**Object-Oriented Programming (OOP)** is a way of organizing code by grouping related data and actions together into *objects*.
- Class: a blueprint for creating objects (the cookie cutter)
- Instance: a specific object created from a class (the cookie)

In [1]:
# Naming convention: class names use CapWords (e.g., Dog, GoldenRetriever).
class Dog:
    pass # this is a placeholder

In [2]:
# Naming convention: Instance name should be lowercase.
dog1 = Dog()
dog2 = Dog()

The class of an instance can be checked as follows.

In [3]:
print(type(dog1))
print(type(dog2))

<class '__main__.Dog'>
<class '__main__.Dog'>


Python classes can have **attributes** (information, data) and **methods** (actions, functions). Let's study them one by one.

## Attributes

An **attribute** is a piece of information about an object. There are two types of attributes:
- class attribute: an attribute that is shared by all instances of the class.
- instance attribute: an attribute that belongs to a specific instance.

In Python, the syntax is as follows:
```python
class CLASSNAME:
    CLASS_ATTRIBUTE = ...
    def __init__(self, ARGUMENT_1, ARGUMENT_2, ...):
        self.INSTANCE_ATTRIBUTE_1 = ARGUMENT_1
        self.INSTANCE_ATTRIBUTE_2 = ARGUMENT_2
        ...
```
Here, `self` means **this specific object**. It is how the object refers to itself inside the class. It must be the first parameter of every instance method; Python passes the object in automatically when the method is called on an instance.

In [4]:
class Dog:
    is_animal = True # class attribute
    scientific_name = "Canis lupus familiaris" # class attribute
    def __init__(self, name, age, breed): # __init__ is a special method called constructor
        self.name = name # instance attributes
        self.age = age # instance attributes
        self.breed = breed # instance attributes

In [5]:
mydog = Dog(name = "Pepper", age = 3, breed = "Poodle")
yourdog = Dog(name = "Tofu", age = 2, breed = "Chihuahua")

#### Accessing attributes
You can access an attribute with `instance.attribute_name`. Note that there are no parentheses at the end.

In [6]:
# Class attributes
print(mydog.scientific_name)
print(yourdog.scientific_name)

Canis lupus familiaris
Canis lupus familiaris


In [7]:
# Instance attributes
print(mydog.name, mydog.age, mydog.breed)
print(yourdog.name, yourdog.age, yourdog.breed)

Pepper 3 Poodle
Tofu 2 Chihuahua
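
To see what "shared by all instances" means in practice, here is a minimal sketch. It uses a stripped-down, hypothetical class called `SimpleDog` (not the `Dog` class above) so that running it does not overwrite anything defined earlier.

```python
class SimpleDog:  # hypothetical class, used only for this illustration
    scientific_name = "Canis lupus familiaris"  # class attribute

    def __init__(self, name):
        self.name = name  # instance attribute


a = SimpleDog("Pepper")
b = SimpleDog("Tofu")

# Changing the attribute on the class is visible through every instance.
SimpleDog.scientific_name = "Canis familiaris"
print(a.scientific_name, "|", b.scientific_name)

# Assigning through one instance creates an instance attribute on that
# object only; the class and the other instances are not affected.
a.scientific_name = "Mystery dog"
print(a.scientific_name, "|", b.scientific_name, "|", SimpleDog.scientific_name)
```

The second `print` shows that only `a` sees the new value, which is why shared data is usually changed on the class itself rather than through an instance.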


#### Keyword vs. Positional

In [8]:
somedog = Dog("Buddy", 5, "Beagle")
print(somedog.name, somedog.age, somedog.breed)

Buddy 5 Beagle


In [9]:
somedog = Dog(5, "Buddy", "Beagle")
print(somedog.name, somedog.age, somedog.breed)

5 Buddy Beagle


In [10]:
somedog = Dog(age = 5, name = "Buddy", breed = "Beagle")
print(somedog.name, somedog.age, somedog.breed)

Buddy 5 Beagle


In [11]:
# (Warning!) Note that positional arguments must come before keyword arguments.
#  This will raise an error:
somedog = Dog(age =5, "Buddy", breed = "Beagle")

SyntaxError: positional argument follows keyword argument (2499983003.py, line 3)

In [12]:
# This is okay:
somedog = Dog("Buddy", breed = "Beagle", age =5)
print(somedog.name, somedog.age, somedog.breed)

Buddy 5 Beagle


#### Optional instance attributes

In [13]:
class Dog:
    scientific_name = "Canis lupus familiaris" # class attribute
    def __init__(self, name, age, breed, has_owner = True): # __init__ is a special method called constructor
        self.name = name # instance attributes
        self.age = age # instance attributes
        self.breed = breed # instance attributes
        self.has_owner = has_owner # instance attributes

In [14]:
mydog = Dog("Pepper", 3, "Poodle")
print(mydog.name, mydog.age, mydog.breed, mydog.has_owner)

Pepper 3 Poodle True


In [15]:
unknowndog = Dog("Unknown", 4, "Golden Retriever", has_owner = False)
print(unknowndog.name, unknowndog.age, unknowndog.breed, unknowndog.has_owner)

Unknown 4 Golden Retriever False


Note that every parameter without a default value must be given a value when the instance is created.

In [16]:
somedog = Dog("Pepper", 3)

TypeError: Dog.__init__() missing 1 required positional argument: 'breed'

## Methods
A method is a function that is defined inside a class and is called on an object (instance) of that class.
```python
class ClassName:
    def method_name(self, other_arguments):
        # code block
        ...
```
Again, `self` must be the first parameter. Inside a method, attributes are accessed with `self.ATTRIBUTE`.

In [17]:
class Dog:
    scientific_name = "Canis lupus familiaris" # class attribute
    def __init__(self, name, age, breed, has_owner = True): # __init__ is a special method called constructor
        self.name = name # instance attributes
        self.age = age # instance attributes
        self.breed = breed # instance attributes
        self.has_owner = has_owner # instance attributes

    def bark(self, sound, times=1):
        print(f"{self.name} says: {(sound + '! ') * times}")

To execute a method, call it with parentheses `()`.

In [18]:
mydog = Dog("Pepper", 3, "Poodle")
mydog.bark("Woof")

Pepper says: Woof! 


In [19]:
yourdog = Dog("Tofu", 2, "Chihuahua")
yourdog.bark("Yip", 3)

Tofu says: Yip! Yip! Yip! 
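
The two points above (parentheses are needed to run a method, and `self` is the first parameter) can be sketched as follows. `Greeter` is a hypothetical class made up for this illustration; it returns a string instead of printing so the results are easy to compare.

```python
class Greeter:  # hypothetical class, used only for this illustration
    def __init__(self, name):
        self.name = name

    def greet(self, word):
        return f"{self.name} says: {word}!"


g = Greeter("Pepper")

# Without parentheses nothing runs: we only get the method object itself.
method = g.greet
print(callable(method))

# Calling the method on the instance...
print(g.greet("Woof"))

# ...is equivalent to calling the function on the class
# and passing the instance explicitly as `self`.
print(Greeter.greet(g, "Woof"))
```

Both calls return the same string, which is one way to see that `self` is simply the instance the method was called on.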


Methods can be used to update the attributes.

In [20]:
class Dog:
    scientific_name = "Canis lupus familiaris" # class attribute
    def __init__(self, name, age, breed, has_owner = True): # __init__ is a special method called constructor
        self.name = name # instance attributes
        self.age = age # instance attributes
        self.breed = breed # instance attributes
        self.has_owner = has_owner # instance attributes

    def bark(self, sound, times=1):
        print(f"{self.name} says: {(sound + '! ') * times}")
    
    def abandon(self):  # <- method with no arguments other than self
        print(f"{self.name} has been abandoned...")
        self.name = "Unknown"
        self.has_owner = False

In [21]:
yourdog = Dog("Tofu", 2, "Chihuahua")
yourdog.abandon()

Tofu has been abandoned...


In [22]:
print(yourdog.name, yourdog.has_owner)

Unknown False


## Subclass

- A **subclass** is a class that inherits from another class, called the **parent** or **superclass**.
- It reuses the parent’s code but can also add new attributes or methods.
- Subclasses help organize code by creating specialized versions of more general classes.

Python syntax for creating a subclass:
```python
class SubClassName(ParentClassName):
    SUBCLASS_CLASS_ATTRIBUTE = ...

    def __init__(self, ...):
        super().__init__(...) # call the parent class's constructor
        self.SUBCLASS_INSTANCE_ATTRIBUTE = ... # subclass-specific initialization

    ... # additional methods
```

In [23]:
class Poodle(Dog):
    available_styles = [
        "Continental Clip",
        "English Saddle Clip",
        "Puppy Clip",
        "Lamb Clip",
        "Sporting Clip"
    ] # class attribute
    def __init__(self, name, age, style="Puppy Clip", has_owner=True):
        super().__init__(name, age, breed="Poodle", has_owner=has_owner)
        self.style = style  # instance attribute

    def change_style(self, new_style):
        if new_style in Poodle.available_styles:
            print(f"{self.name}'s haircut is changed to '{new_style}' style.")
            self.style = new_style
        else:
            print(f"Style '{new_style}' is not available.")

- `super()` is used to call a method from the parent class. 
- In this case, `super().__init__()` lets the `Poodle` class reuse the `__init__()` method from `Dog`, so we don’t have to rewrite code for `name`, `age`, and `has_owner`.
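
`super()` is not limited to `__init__()`. As a minimal sketch, a subclass can also redefine an ordinary method and still reuse the parent's version inside it. The classes `Animal` and `Cat` below are hypothetical and exist only for this example.

```python
class Animal:  # hypothetical parent class for this sketch
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal"


class Cat(Animal):  # hypothetical subclass; it inherits __init__ unchanged
    def describe(self):
        # Reuse the parent's result, then extend it.
        base = super().describe()
        return base + " (specifically, a cat)"


print(Animal("Generic").describe())
print(Cat("Mochi").describe())
print(issubclass(Cat, Animal))
```

Because `Cat` does not define its own `__init__()`, creating `Cat("Mochi")` runs the one inherited from `Animal`.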

In [24]:
mydog = Poodle("Pepper", 3, style="Continental Clip")
print(mydog.breed)

Poodle


In [25]:
print(type(mydog))
print(isinstance(mydog, Poodle))
print(isinstance(mydog, Dog)) # isinstance() works with inheritance

<class '__main__.Poodle'>
True
True


In [26]:
mydog.change_style("Lamb Clip")
print(mydog.style)

Pepper's haircut is changed to 'Lamb Clip' style.
Lamb Clip


In [27]:
mydog.change_style("Poodle Noodle")
print(mydog.style)

Style 'Poodle Noodle' is not available.
Lamb Clip


In [28]:
mydog.bark("Woof", 2) # we can still use parent class methods

Pepper says: Woof! Woof! 


In [29]:
yourdog = Dog("Tofu", 2, "Chihuahua")
yourdog.change_style("Lamb Clip") # this will raise an error since Dog class has no method change_style()

AttributeError: 'Dog' object has no attribute 'change_style'

## More...
Some topics about classes that are not covered here:
- Private attributes
- Special methods (`__str__`, `__repr__`)
- Decorators
- ...
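
For a first taste of these topics, here is a very small sketch that touches three of them at once. The class `Account` and its attribute names are hypothetical, and this is only a starting point, not a full treatment.

```python
class Account:  # hypothetical class, used only for this illustration
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance  # leading underscore: "private" by convention only

    @property  # decorator: lets `balance` be read like an attribute, without ()
    def balance(self):
        return self._balance

    def __repr__(self):  # special method: controls how the object is displayed
        return f"Account(owner={self.owner!r}, balance={self._balance})"


acct = Account("Pepper", 10)
print(acct.balance)
print(acct)
```

Note that Python does not enforce privacy: `acct._balance` is still reachable, and the underscore is only a signal to other programmers.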

---

# Real Examples (`pandas`)

Most Python libraries define one or more **core** classes that represent their main functionality, along with utility functions that support or extend those classes.

Let’s take a closer look at the `pandas` library from this point of view.

**Core classes**
- `Index`: Class that labels rows or columns
- `Series`: 1D labeled array
- `DataFrame`: 2D labeled array

More specifically, the class hierarchy in pandas can be summarized as follows:

```text
NDFrame: collection of data
 ├── Series: 1D array
 └── DataFrame: 2D array (two Index objects (row/column) and multiple Series objects)

 Index: labels used to identify rows or columns
 ├── RangeIndex
 ├── DatetimeIndex
 ├── MultiIndex
 └── ...
 ```
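
This hierarchy can be checked from Python itself. The sketch below uses the standard `__mro__` attribute (the list of classes that a class inherits from) and `issubclass()`. The exact list of intermediate classes depends on the installed pandas version, so treat the printed names as illustrative.

```python
import pandas as pd

# Names of every class that DataFrame and Series inherit from.
print([cls.__name__ for cls in pd.DataFrame.__mro__])
print([cls.__name__ for cls in pd.Series.__mro__])

# The specialized index types are all subclasses of Index.
for index_cls in (pd.RangeIndex, pd.DatetimeIndex, pd.MultiIndex):
    print(index_cls.__name__, issubclass(index_cls, pd.Index))
```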

In [30]:
import pandas as pd

### `Series` and `DataFrame`

In [31]:
s = pd.Series(data = [1,2,3], index = ['a','b','c'])

In [32]:
print(type(s))

<class 'pandas.core.series.Series'>


In [33]:
print(s.index)
print(type(s.index))

Index(['a', 'b', 'c'], dtype='object')
<class 'pandas.core.indexes.base.Index'>


In [34]:
df = pd.DataFrame(data = {"A": [1,2,3], "B": [4,5,6]})

In [35]:
print(type(df))

<class 'pandas.core.frame.DataFrame'>


In [36]:
print("## Rows")
print(df.index)
print(type(df.index))

print("-"*50)

print("## Columns")
print(df.columns)
print(type(df.columns))

## Rows
RangeIndex(start=0, stop=3, step=1)
<class 'pandas.core.indexes.range.RangeIndex'>
--------------------------------------------------
## Columns
Index(['A', 'B'], dtype='object')
<class 'pandas.core.indexes.base.Index'>


#### Attributes and methods of `DataFrame`

Class attributes (the same for every `DataFrame`)

In [37]:
df.ndim # number of dimensions

2

Instance attributes

In [38]:
df.shape # shape of the DataFrame

(3, 2)

In [39]:
df.size # number of elements in the DataFrame

6

In [40]:
df.dtypes # data types of each column

A    int64
B    int64
dtype: object

In [41]:
df.values # values in the DataFrame as a numpy array

array([[1, 4],
       [2, 5],
       [3, 6]])

Methods

In [42]:
df.head() # first 5 rows

   A  B
0  1  4
1  2  5
2  3  6


In [43]:
df.tail(1) # last 1 rows

   A  B
2  3  6


In [44]:
df.describe() # summary statistics

         A    B
count  3.0  3.0
mean   2.0  5.0
std    1.0  1.0
min    1.0  4.0
25%    1.5  4.5
50%    2.0  5.0
75%    2.5  5.5
max    3.0  6.0


In [45]:
df.mean() # mean of each column

A    2.0
B    5.0
dtype: float64

The square-bracket syntax below uses a *special* method (`__getitem__`).

In [46]:
df["A"] # column A as a Series

0    1
1    2
2    3
Name: A, dtype: int64
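
The square-bracket syntax is handled by the special method `__getitem__`: writing `obj[key]` makes Python call `obj.__getitem__(key)`. Here is a minimal sketch with a toy class. `Table` is hypothetical and is not how pandas is actually implemented.

```python
class Table:  # hypothetical toy class, not the pandas implementation
    def __init__(self, columns):
        self._columns = columns  # dict mapping column name -> list of values

    def __getitem__(self, key):
        # Python calls this method whenever it sees table[key].
        return self._columns[key]


table = Table({"A": [1, 2, 3], "B": [4, 5, 6]})
print(table["A"])
print(table.__getitem__("A"))  # same call, written out explicitly
```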

#### Utility functions
Python libraries also provide utility functions. These are defined at the module level rather than as methods of a class.

- If we import only the library, a utility function is called as `LIBRARY_NAME.UTIL_FUNCTION_NAME(ARGUMENTS)`.
- Alternatively, we can import the function directly with `from LIBRARY_NAME import UTIL_FUNCTION_NAME`. In this case, it is called simply as `UTIL_FUNCTION_NAME(ARGUMENTS)`.

In `pandas`, we have `pd.to_datetime()`, `pd.to_numeric()`, `pd.isnull()`, etc.

You can also view I/O functions such as `pd.read_csv()` as utilities. Writing works differently: `to_csv()` is a method of `DataFrame`, so it is called as `df.to_csv()`, not `pd.to_csv()`.

In [47]:
import pandas as pd

In [48]:
pd.to_datetime("2023-01-01")

Timestamp('2023-01-01 00:00:00')

In [49]:
df = pd.DataFrame({"A": ["1", "2", "3"], 
                   "B": ["4.0", "5.5", "6.1"]})
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       3 non-null      object
 1   B       3 non-null      object
dtypes: object(2)
memory usage: 180.0+ bytes


In [50]:
from pandas import to_numeric
df["A"] = to_numeric(df["A"])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   A       3 non-null      int64 
 1   B       3 non-null      object
dtypes: int64(1), object(1)
memory usage: 180.0+ bytes


In [51]:
# You can still use pd.to_numeric
df["B"] = pd.to_numeric(df["B"])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       3 non-null      int64  
 1   B       3 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 180.0 bytes


In [52]:
# You get an error for the following:
df = pd.DataFrame({"A": ["1", "2", "3"], 
                   "B": ["4.0", "5.5", "6.1"],
                   "C": ["2", "3", "4"]})
df = to_numeric(df)
# Why?

TypeError: arg must be a list, tuple, 1-d array, or Series
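
The error message answers the question: `to_numeric` accepts a list, tuple, 1-d array, or `Series`, but a `DataFrame` is 2D. One possible fix, sketched below, is to convert column by column with `DataFrame.apply`, which passes each column to the function as a `Series`.

```python
import pandas as pd

df = pd.DataFrame({"A": ["1", "2", "3"],
                   "B": ["4.0", "5.5", "6.1"],
                   "C": ["2", "3", "4"]})

# to_numeric works on one column (a Series) at a time,
# so apply it to each column of the DataFrame in turn.
df = df.apply(pd.to_numeric)
print(df.dtypes)
```

After this, columns `A` and `C` hold integers and column `B` holds floats.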

You don’t need to memorize everything (I mean you **CAN'T**)! What truly matters is learning:
- how to search for the commands you need
- how to understand the documentation
- how to handle errors.
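
These three habits can be practiced with nothing but built-in tools. The sketch below uses a hypothetical class `Pet`; the same calls (`dir()`, `help()`, `try`/`except`) work on objects from any library.

```python
class Pet:  # hypothetical class, used only for this illustration
    """A pet with a name."""

    def __init__(self, name):
        self.name = name

    def speak(self, sound):
        """Return a sentence with the pet making `sound`."""
        return f"{self.name} says: {sound}!"


# 1. Search: list the public attributes and methods an object offers.
public_names = [n for n in dir(Pet("Pepper")) if not n.startswith("_")]
print(public_names)

# 2. Documentation: help() prints the signature and the docstring.
help(Pet.speak)

# 3. Errors: read the exception type and message instead of guessing.
try:
    Pet()  # missing the required `name` argument
except TypeError as err:
    print("TypeError:", err)
```

In a notebook, `Pet.speak?` or pressing Tab after `Pet.` gives similar information interactively.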