---
# Pandas Data Structures
Pandas has two main data structures for handling labeled and tabular data: `Series` and `DataFrame`.

---

In [10]:
# Import pandas and numpy under the aliases pd and np.
# These shorthand names are common practice.
import pandas as pd
import numpy as np

# display() function works like print, but it shows the
# output in its rich display format if possible
from IPython.display import display

In [36]:
# Function for printing a horizontal line, for display purposes.
def printhr(n: int = 50):
    """Print a horizontal rule of the character "=" of length n.

    Args:
        n (int, optional): Number of characters. Defaults to 50.
    """

    print("=" * n)

---
## Series

A Series is a one-dimensional labeled array. It is homogeneous: it can hold any data type, but all of its values share a single type. The axis labels are collectively referred to as the **index**.
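
A minimal sketch of what "one type per Series" means in practice. The variable names are illustrative only; the point is that pandas picks a single dtype that can represent every value passed in.

```python
import pandas as pd

# Integers only: pandas keeps an integer dtype.
ints = pd.Series([1, 2, 3])

# Mixing ints and floats: every value is upcast to float64.
mixed_numbers = pd.Series([1, 2.5, 3])

# Mixing numbers and strings: pandas falls back to the generic object dtype.
mixed_objects = pd.Series([1, "two", 3.0])

print(ints.dtype, mixed_numbers.dtype, mixed_objects.dtype)
```

A Series with the object dtype is still a single-dtype Series; each element is simply stored as a generic Python object.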

---

---
### Creating a Series

You can use the Series class to create a Series.  

`new_series = pd.Series(data, index=index)`

Here **data** is the data to be converted into a Series, and **index** is an iterable that contains the labels. If **index** is not set, a default integer index is used (incrementing from 0).

**data** can be of different structures:

- a Python dict
- a 1D NumPy ndarray
- a scalar value (like 38)
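
The scalar case is the least obvious of the three, so here is a small sketch of it. The labels `"a"`, `"b"`, `"c"` are arbitrary choices for illustration.

```python
import pandas as pd

# A scalar is repeated once for every label in the index.
s = pd.Series(38, index=["a", "b", "c"])
print(s)

# Without an index, the result is a single-element Series
# with the default integer index.
s_single = pd.Series(38)
print(s_single)
```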

---


---
#### Python dict as Data

When a Python dict is passed, its keys become the labels. If an index is also passed, the index determines both the length and the order of the Series: each index label is looked up among the dict keys, and labels that are missing from the dict get the value NaN. Dict keys that do not appear in the index are dropped.

---


In [53]:
# Creating a Series by passing a dict
data = {"a": 2, "b": 4, "c": 6}
display(data)
printhr()

s = pd.Series(data)
display(s)

# > When an index label does not exist in the provided dict, its
#   value is represented with NaN.
# > The length of the Series is the length of the index.
# > The order of the index is followed.
s2 = pd.Series(data, index=["c", "a", "b", "x"])
display(s2)

{'a': 2, 'b': 4, 'c': 6}



a    2
b    4
c    6
dtype: int64

c    6.0
a    2.0
b    4.0
x    NaN
dtype: float64

In [40]:
# Series using a numpy ndarray
data = np.random.randn(5)
display(data)
printhr()

# Index in descending order
s = pd.Series(data, index=pd.RangeIndex(4, -1, -1))
display(s)

# If no index is passed, a default integer index starting at 0 is assigned.
s2 = pd.Series(data)
display(s2)

array([ 1.12998975, -1.07812134,  0.26063663, -0.82503337, -0.53188494])



4    1.129990
3   -1.078121
2    0.260637
1   -0.825033
0   -0.531885
dtype: float64

0    1.129990
1   -1.078121
2    0.260637
3   -0.825033
4   -0.531885
dtype: float64

---
## DataFrame

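A DataFrame is a two-dimensional labeled data structure with rows and columns. Unlike a Series, it can be heterogeneous: each column holds a single data type, but different columns may hold different types. The row labels are referred to as the **index**, and the column labels as the **columns**.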
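
As a rough illustration of how a DataFrame differs from a Series, the sketch below builds a small table whose columns have different dtypes. The column names `name`, `count`, and `ratio` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],   # strings
        "count": [1, 2, 3],        # integers
        "ratio": [0.5, 1.5, 2.5],  # floats
    }
)

# Each column reports its own dtype.
print(df.dtypes)

# Selecting one column returns a Series that shares the DataFrame's index.
col = df["count"]
print(type(col).__name__, col.index.equals(df.index))
```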

---

---
### Creating a DataFrame

Much like creating a Series, the DataFrame class can be used to create a DataFrame.  

`new_df = pd.DataFrame(data, index=index)`

**data** can be of different structures:

- a Python dict of any of the following:
  - 1D ndarrays
  - lists
  - dicts
  - Series
- a 2D NumPy ndarray
- a Series
- structured or record ndarray
- another DataFrame
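
A short sketch of a few of these input forms, assuming small made-up values. Besides `index`, it uses the `columns` parameter of `pd.DataFrame` to name the columns of an array, which the signature above does not show.

```python
import numpy as np
import pandas as pd

# 2D ndarray: rows and columns get default integer labels unless
# index and columns are supplied.
arr = np.arange(6).reshape(3, 2)
df_arr = pd.DataFrame(arr, index=["r0", "r1", "r2"], columns=["Foo", "Bar"])

# Dict of Series: rows are aligned on the union of the Series indexes,
# and entries missing from a Series become NaN.
df_series = pd.DataFrame(
    {
        "Foo": pd.Series([2, 5], index=["a", "b"]),
        "Bar": pd.Series([9, 2, 7], index=["a", "b", "c"]),
    }
)

# A single named Series becomes a one-column DataFrame.
df_single = pd.DataFrame(pd.Series([1, 2, 3], name="Baz"))

print(df_arr)
print(df_series)
print(df_single)
```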


---


In [None]:
# Passing in a dict of lists
data = {"Foo": [2, 5, 5], "Bar": [9, 2, 7]}
display(data)

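# As a Series, each key becomes a label and each list is kept as a single
# object value. As a DataFrame, each key becomes a column.
s = pd.Series(data)
df = pd.DataFrame(data, index=range(3))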

display(s, df)

In [None]:
# Passing in a dict that mixes a list and a scalar value.
# In the DataFrame, the scalar is repeated for every row.
data = {"Foo": [2, 5, 5], "Bar": 9}
display(data)

s = pd.Series(data)
df = pd.DataFrame(data)

display(s, df)