# Pandas Revisited

## 郭耀仁

## Deconstructing the Pandas DataFrame

- A `pandas` `DataFrame` is made up of `Series`, one per column
- Underneath, the data is held in `NumPy` arrays

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.csv" # a copy of the CSV file is hosted in cloud storage
iris_df = pd.read_csv(url)
print(type(iris_df))
print(type(iris_df.Species))
print(type(iris_df.values))
```

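## Deconstructing the Pandas DataFrame (2)

- `.index` shows the index of the observations (rows)
- `.columns` shows the variable (column) names
- `.values` returns the underlying data as an **ndarray**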

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.csv" # a copy of the CSV file is hosted in cloud storage
iris_df = pd.read_csv(url)
print(iris_df.index)
print(iris_df.columns)
print(iris_df.values)
```

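## Deconstructing the Pandas DataFrame (3)

- `.sort_index()` sorts by the observation index (`axis = 0`) or by the variable names (`axis = 1`)
- `.sort_values()` sorts by the values of a variable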

```python
import pandas as pd

url = "https://storage.googleapis.com/py_ds_basic/iris.csv" # a copy of the CSV file is hosted in cloud storage
iris_df = pd.read_csv(url)
print(iris_df.sort_index(axis = 0, ascending = False))
print(iris_df.sort_index(axis = 1))
print(iris_df.sort_values(by = 'Sepal.Length'))
```

## Deconstructing the Pandas DataFrame (4)

- Exercise: deconstruct `straw_hat_df`
- Sort `straw_hat_df` by `age` in descending order

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
```
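One possible approach to the exercise, as a sketch: `.sort_values()` with `ascending=False` orders the rows from the largest `age` to the smallest. The variable name `sorted_df` is made up for this illustration.

```python
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
        "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_df = pd.DataFrame({"name": name, "age": age, "is_male": is_male})

# Deconstruct the DataFrame into its parts
print(straw_hat_df.index)    # row labels
print(straw_hat_df.columns)  # column names
print(straw_hat_df.values)   # underlying ndarray

# Sort by age, largest first; sort_values returns a new DataFrame
# and leaves straw_hat_df unchanged
sorted_df = straw_hat_df.sort_values(by="age", ascending=False)
print(sorted_df)
```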

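## Index-Based Selection Revisited

- `.iloc[]`: selects by integer position
- `.loc[]`: selects by index label
- `.ix[]`: on a mixed-type index, behaves like `.iloc[]` for integer keys and like `.loc[]` for string keys; on an all-integer index it is label-based like `.loc[]`. It was deprecated in pandas 0.20 and removed in pandas 1.0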

## Index-Based Selection Revisited (2)

- An example makes this easier to follow!

```python
import pandas as pd
import numpy as np

s = pd.Series(np.zeros(10), index = [49,48,47,46,45, 1, 2, 3, 4, 5])
print(s)
print(s.iloc[:3])
print(s.loc[:3])
# print(s.ix[:3]) # pandas < 1.0 only (.ix was removed in 1.0); on an integer index it matches s.loc[:3]
```

## Index-Based Selection Revisited (3)

```python
import pandas as pd
import numpy as np

s = pd.Series(np.zeros(10), index = [49,48,47,46,45, 1, 2, 3, 4, 5])
print(s)
print(s.iloc[:6])
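print(s.loc[:6]) # KeyError: 6, because the label 6 is not in the index
# print(s.ix[:6]) # pandas < 1.0 only; also KeyError: 6, since .ix is label-based on an integer index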
```

## Index-Based Selection Revisited (4)

```python
import pandas as pd
import numpy as np

s = pd.Series(np.zeros(10), index = ["a", "b", "c", "d", "e", 1, 2, 3, 4, 5])
print(s)
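# .ix was removed in pandas 1.0. On pandas < 1.0, with this mixed index:
#   s.ix[:6]   is positional, like s.iloc[:6]
#   s.ix[:"d"] is label-based, like s.loc[:"d"]
print(s.iloc[:6])
print(s.loc[:"d"])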
```

## Index-Based Selection Revisited (5)

- Try out the differences among `.iloc[]`, `.loc[]`, and `.ix[]`
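A small sketch that makes the contrast visible on current pandas. The Series below is hypothetical and deliberately uses integer labels that run opposite to the positions; `.ix[]` is left out because it no longer exists in pandas 1.0 and later.

```python
import pandas as pd

# Labels 3, 2, 1, 0 sit at positions 0, 1, 2, 3
s = pd.Series(["a", "b", "c", "d"], index=[3, 2, 1, 0])

# .iloc slices by position and excludes the end position
by_position = s.iloc[:1]

# .loc slices by label and includes the end label
by_label = s.loc[:1]

print(by_position)         # one row: the element at position 0
print(by_label)            # three rows: labels 3, 2 and 1
print(s.iloc[0], s.loc[0]) # position 0 versus label 0
```

The same key gives different answers: `s.iloc[0]` is the first element, while `s.loc[0]` is the element whose label is `0`, which here is the last one.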

## Handling Missing Values

- Use `numpy.nan` to create missing values

```python
import pandas as pd # import the package under the alias pd
import numpy as np

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook", np.nan]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True, np.nan]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
straw_hat_df
```

## Handling Missing Values (2)

- Exercise: use `.dropna()` to remove missing values
- Try setting the `how` parameter to `"all"` or `"any"`
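A minimal sketch of the two `how` settings, rebuilding the DataFrame with missing values from the previous slide. The names `dropped_any` and `dropped_all` are made up for this illustration.

```python
import numpy as np
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
        "Tony Tony Chopper", "Nico Robin", "Franky", "Brook", np.nan]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True, np.nan]

straw_hat_df = pd.DataFrame({"name": name, "age": age, "is_male": is_male})

# how="any": drop a row if at least one of its values is missing
dropped_any = straw_hat_df.dropna(how="any")

# how="all": drop a row only if every value in it is missing
dropped_all = straw_hat_df.dropna(how="all")

print(len(straw_hat_df), len(dropped_any), len(dropped_all))  # 10 7 9
```

With `how="any"` the rows for Tony Tony Chopper, Brook, and the empty last row are removed; with `how="all"` only the empty last row is removed.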

## Handling Missing Values (3)

- Exercise: use `.fillna()` to fill in missing values

```python
import pandas as pd # import the package under the alias pd
import numpy as np

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
```
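One way the exercise could go, as a sketch: passing a dict to `.fillna()` gives each column its own replacement value. The replacements `90` and `True` are an assumption taken from the complete data used on the other slides, and `filled_df` is a name made up for this illustration.

```python
import numpy as np
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
        "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True]

straw_hat_df = pd.DataFrame({"name": name, "age": age, "is_male": is_male})

# Count the missing values per column before filling
print(straw_hat_df.isna().sum())

# Fill each column with its own value; fillna returns a new DataFrame
filled_df = straw_hat_df.fillna({"age": 90, "is_male": True})
print(filled_df)
```

Some pandas 2.x releases may print a `FutureWarning` about downcasting the `is_male` column here; the filled values are the same either way.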

## Counting Distinct Values

- `.value_counts()` counts how many times each distinct value occurs

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
straw_hat_df.is_male.value_counts()
```

## Merging

- The `pandas.concat()` function
- Exercise: combine `straw_hat_sub_df_1` and `straw_hat_sub_df_2` horizontally

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)

# .loc replaces .ix, which was removed in pandas 1.0
straw_hat_sub_df_1 = straw_hat_df.loc[:, "age":"is_male"]
straw_hat_sub_df_2 = straw_hat_df.loc[:, "name"]
pd.concat([___, ___], axis = 1)
```
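A sketch of one way to complete the horizontal case, assuming the two pieces are passed in the order they were created. With `axis=1`, `pd.concat()` lines the pieces up on the row index and places their columns side by side; `combined_df` is a name made up for this illustration.

```python
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
        "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_df = pd.DataFrame({"name": name, "age": age, "is_male": is_male})

# A DataFrame with the age and is_male columns, and a Series with the names
straw_hat_sub_df_1 = straw_hat_df.loc[:, "age":"is_male"]
straw_hat_sub_df_2 = straw_hat_df.loc[:, "name"]

# axis=1 joins the pieces column-wise, matching rows by index label
combined_df = pd.concat([straw_hat_sub_df_1, straw_hat_sub_df_2], axis=1)
print(combined_df)
```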

## Merging (2)

- The `pandas.concat()` function
- Exercise: combine `straw_hat_sub_df_1` and `straw_hat_sub_df_2` vertically

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

straw_hat_df = pd.DataFrame(straw_hat_dict)

# .loc replaces .ix, which was removed in pandas 1.0; label slices include the end label
straw_hat_sub_df_1 = straw_hat_df.loc[:4, :]
straw_hat_sub_df_2 = straw_hat_df.loc[5:, :]
pd.concat([___, ___])
```
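A sketch of one way to complete the vertical case. With the default `axis=0`, `pd.concat()` stacks the pieces on top of each other and keeps their original row labels; `stacked_df` is a name made up for this illustration.

```python
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
        "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
is_male = [True, True, False, True, True, True, False, True, True]

straw_hat_df = pd.DataFrame({"name": name, "age": age, "is_male": is_male})

# Rows labelled 0 through 4 (the end label is included) and rows 5 onward
straw_hat_sub_df_1 = straw_hat_df.loc[:4, :]
straw_hat_sub_df_2 = straw_hat_df.loc[5:, :]

# The default axis=0 stacks the pieces row-wise
stacked_df = pd.concat([straw_hat_sub_df_1, straw_hat_sub_df_2])
print(stacked_df)
```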

## Merging (3)

- Inner join: keeps only the names found in both DataFrames (the default for `pd.merge()`)

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
devil_fruit_df = pd.DataFrame(devil_fruit_dict)
inner_joined = pd.merge(straw_hat_df, devil_fruit_df)
inner_joined
```

## Merging (4)

- Left outer join: keeps every row of the left DataFrame

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
devil_fruit_df = pd.DataFrame(devil_fruit_dict)
left_joined = pd.merge(straw_hat_df, devil_fruit_df, how = "left")
left_joined
```

## Merging (5)

- Right outer join: keeps every row of the right DataFrame

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
devil_fruit_df = pd.DataFrame(devil_fruit_dict)
right_joined = pd.merge(straw_hat_df, devil_fruit_df, how = "right")
right_joined
```

## Merging (6)

- Full outer join: keeps every row of both DataFrames

```python
import pandas as pd # import the package under the alias pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

straw_hat_df = pd.DataFrame(straw_hat_dict)
devil_fruit_df = pd.DataFrame(devil_fruit_dict)
full_joined = pd.merge(straw_hat_df, devil_fruit_df, how = "outer")
full_joined
```
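To compare the four join types side by side, the sketch below counts the rows each one returns. It names the key column explicitly with `on="name"` and uses `indicator=True`, neither of which appears in the slides above; `row_counts` and `outer_joined` are names made up for this illustration.

```python
import pandas as pd

straw_hat_df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
             "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
    "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
})
devil_fruit_df = pd.DataFrame({
    "name": ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook",
             "Trafalgar D. Water Law"],
    "devil_fruit": ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit",
                    "Revive-Revive Fruit", "Op-Op Fruit"],
})

# Number of rows returned by each join type
row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    joined = pd.merge(straw_hat_df, devil_fruit_df, on="name", how=how)
    row_counts[how] = len(joined)
print(row_counts)

# indicator=True adds a _merge column with the values
# "both", "left_only" or "right_only" for each row
outer_joined = pd.merge(straw_hat_df, devil_fruit_df, on="name",
                        how="outer", indicator=True)
print(outer_joined[["name", "_merge"]])
```

Four names appear in both DataFrames, so the inner join has 4 rows, the left join 9, the right join 5, and the full outer join 10.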