# Pandas

## 郭耀仁

## Course Outline

- Data structures provided by Pandas
    - Series
    - DataFrame
    - Panel
- Working with DataFrames
- Loading data

## Documentation

https://pandas.pydata.org/pandas-docs/stable/index.html

## Inspired by R

> Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more.

Source: <https://github.com/pandas-dev/pandas>

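## Data Structures in Pandas

|Name|Description|
|---|----|
|Series|A one-dimensional array with an index|
|DataFrame|A two-dimensional dataset with a row index and column labels|
|Panel|A three-dimensional dataset with an item (data frame) index, a row index, and column labels|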

# Series

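## Creating a Series

- Create a Series with `pd.Series()`
- `data` can be:
    - an ndarray or sequence
    - a dict
    - a single scalar value

```python
import pandas as pd

data = [19, 21, 20]              # an ndarray/sequence, a dict, or a scalar
idx = ["Luffy", "Zoro", "Nami"]  # optional labels; defaults to 0, 1, 2, ...
ser = pd.Series(data, index=idx)
```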

## Creating a Series (2)

- When `data` is an ndarray or sequence:

```python
import numpy as np
import pandas as pd

arr = np.array(("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"))
ser = pd.Series(arr)
print(type(ser))
print("\n")
print(ser)
```

```
<class 'pandas.core.series.Series'>


0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
dtype: object
```


```python
# Use a custom index
arr = np.array(("Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"))
crew_idx = ["crew " + str(i + 1) for i in range(9)]
ser = pd.Series(arr, index=crew_idx)
print(ser)
```

```
crew 1      Monkey D. Luffy
crew 2         Roronoa Zoro
crew 3                 Nami
crew 4                Usopp
crew 5       Vinsmoke Sanji
crew 6    Tony Tony Chopper
crew 7           Nico Robin
crew 8               Franky
crew 9                Brook
dtype: object
```


## Creating a Series (3)

- When `data` is a dict, its keys become the index labels

```python
crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

# pandas >= 0.23 on Python >= 3.6 keeps the dict's insertion order;
# the output below, recorded with an older version, has the keys sorted
ser = pd.Series(crew_dict)
print(ser)
```

```
archaeologist           Nico Robin
captain            Monkey D. Luffy
chef                Vinsmoke Sanji
doctor           Tony Tony Chopper
musician                     Brook
navigator                     Nami
shipwright                  Franky
sniper                       Usopp
swordsman             Roronoa Zoro
dtype: object
```


## Creating a Series (4)

- When `data` is a single scalar value, it is repeated for every label in the index:

```python
luffy = "Monkey D. Luffy"
ser = pd.Series(luffy, index=range(5))
print(ser)
```

```
0    Monkey D. Luffy
1    Monkey D. Luffy
2    Monkey D. Luffy
3    Monkey D. Luffy
4    Monkey D. Luffy
dtype: object
```


## Creating a Series (5)

- Practice: create a Series from the following list:

```python
starrings = ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow", "Matt LeBlanc", "Matthew Perry", "David Schwimmer"]
```

- Inspect its `.index` and `.values` attributes
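
One possible approach to the exercise, as a sketch; the name `friends_ser` is made up for illustration. Without an explicit `index`, pandas labels the elements 0 to n-1.

```python
import pandas as pd

starrings = ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow",
             "Matt LeBlanc", "Matthew Perry", "David Schwimmer"]
friends_ser = pd.Series(starrings)

# .index holds the labels; here the default 0..5
print(friends_ser.index)
# .values returns the underlying data as an array
print(friends_ser.values)
```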

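## Working with a Series

- Select elements by integer position or by **label**

```python
crew_dict = {
    "captain": "Monkey D. Luffy",
    "swordsman": "Roronoa Zoro",
    "navigator": "Nami",
    "sniper": "Usopp",
    "chef": "Vinsmoke Sanji",
    "doctor": "Tony Tony Chopper",
    "archaeologist": "Nico Robin",
    "shipwright": "Franky",
    "musician": "Brook"
}

# Sort by label so the order is alphabetical in every pandas version
# (older versions sorted dict keys automatically)
ser = pd.Series(crew_dict).sort_index()
# .iloc selects by position; plain ser[0] falls back to positions here,
# but that fallback is deprecated in pandas 2.x
print(ser.iloc[0])
print("\n")
print(ser['archaeologist'])  # by label
```

```
Nico Robin


Nico Robin
```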


```python
print(ser.iloc[[0, 3, 6]])  # by position
print("\n")
print(ser[['archaeologist', 'doctor', 'shipwright']])  # by label
```

```
archaeologist           Nico Robin
doctor           Tony Tony Chopper
shipwright                  Franky
dtype: object


archaeologist           Nico Robin
doctor           Tony Tony Chopper
shipwright                  Franky
dtype: object
```
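
Plain `[]` has to guess whether a key is a position or a label; `.iloc` (position) and `.loc` (label) make the choice explicit. The guess matters most when the index itself holds integers, as in this small hypothetical Series whose labels are not 0 to n-1:

```python
import pandas as pd

# Hypothetical Series with integer labels in a non-default order
ser = pd.Series(["Nami", "Usopp", "Brook"], index=[2, 0, 1])

by_label = ser.loc[0]      # element whose label is 0 -> "Usopp"
by_position = ser.iloc[0]  # first element -> "Nami"
plain = ser[0]             # with an integer index, [] looks up labels -> "Usopp"

print(by_label, by_position, plain)
```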


## Working with a Series (2)

- Slice quickly with `:`

```python
print(ser.iloc[:3])            # by position
print("\n")
print(ser.loc['shipwright':])  # by label
```

```
archaeologist         Nico Robin
captain          Monkey D. Luffy
chef              Vinsmoke Sanji
dtype: object


shipwright          Franky
sniper               Usopp
swordsman     Roronoa Zoro
dtype: object
```
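
The two kinds of slice treat their end point differently: a positional slice excludes the stop position, as Python lists do, while a label slice includes the stop label. A minimal sketch with made-up labels:

```python
import pandas as pd

ser = pd.Series([19, 21, 20, 19], index=["a", "b", "c", "d"])

by_position = ser.iloc[0:2]  # positions 0 and 1; stop position 2 is excluded
by_label = ser.loc["a":"c"]  # labels a, b and c; stop label "c" is included

print(by_position.index.tolist(), by_label.index.tolist())
```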


## Working with a Series (3)

- You can also filter with a Boolean condition

```python
name_filter = ser.isin(["Nami", "Nico Robin"])
print(name_filter)
print("\n")
print(ser[name_filter])
```

```
archaeologist     True
captain          False
chef             False
doctor           False
musician         False
navigator         True
shipwright       False
sniper           False
swordsman        False
dtype: bool


archaeologist    Nico Robin
navigator              Nami
dtype: object
```
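
Boolean masks can be combined element-wise with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses, because in Python `&` and `|` bind more tightly than `>=` or `<`. A short sketch using a few ages from the crew:

```python
import pandas as pd

ages = pd.Series([19, 20, 30, 90],
                 index=["Monkey D. Luffy", "Nami", "Nico Robin", "Brook"])

# Parenthesize each comparison before combining the masks
in_twenties = ages[(ages >= 20) & (ages < 30)]
young_or_old = ages[(ages < 20) | (ages > 60)]
not_nami = ages[~ages.index.isin(["Nami"])]

print(in_twenties, young_or_old, not_nami, sep="\n\n")
```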


## Working with a Series (4)

- NumPy functions work on a Series as well

```python
crew_age = {
    "Monkey D. Luffy": 19,
    "Roronoa Zoro": 21,
    "Nami": 20,
    "Usopp": 19,
    "Vinsmoke Sanji": 21,
    "Tony Tony Chopper": 17,
    "Nico Robin": 30,
    "Franky": 36,
    "Brook": 90
}

ser = pd.Series(crew_age)
print(f"Mean age of the Straw Hat Pirates: {np.mean(ser):.2f}")
print(f"Standard deviation of their ages: {np.std(ser):.2f}")
```

```
Mean age of the Straw Hat Pirates: 30.33
Standard deviation of their ages: 21.88
```
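
`np.std(ser)` and `ser.std()` do not return the same number: NumPy defaults to the population standard deviation (`ddof=0`, divide by n), while the pandas method defaults to the sample standard deviation (`ddof=1`, divide by n - 1). Passing `ddof` explicitly makes the choice visible:

```python
import pandas as pd

ages = pd.Series([19, 21, 20, 19, 21, 17, 30, 36, 90])

population_sd = ages.std(ddof=0)  # divides by n, matching np.std's default
sample_sd = ages.std()            # pandas default ddof=1: divides by n - 1

print(f"{population_sd:.2f} {sample_sd:.2f}")
```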


## Working with a Series (5)

- Element-wise arithmetic works too

```python
print(ser - 2)
```

```
Brook                88
Franky               34
Monkey D. Luffy      17
Nami                 18
Nico Robin           28
Roronoa Zoro         19
Tony Tony Chopper    15
Usopp                17
Vinsmoke Sanji       19
dtype: int64
```
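
Arithmetic between two Series is also element-wise, but pandas pairs the elements by index label, not by position; a label present in only one operand yields `NaN`. A sketch reusing ages from the Panel example later in these notes:

```python
import pandas as pd

age_now = pd.Series({"Nami": 20, "Usopp": 19, "Brook": 90})
age_2_yr_ago = pd.Series({"Brook": 88, "Nami": 18, "Franky": 34})

# Values are matched by label; the result covers the union of both indexes,
# and labels missing from either side become NaN
diff = age_now - age_2_yr_ago
print(diff)
```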


# DataFrame

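## Creating a DataFrame

- Create a DataFrame with `pd.DataFrame()`
- `data` can be:
    - a dict (\*)
    - an ndarray

```python
import pandas as pd

data = {"name": ["Nami", "Brook"], "age": [20, 90]}  # a dict of columns or a 2-D ndarray
df = pd.DataFrame(data)
```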

## Creating a DataFrame (2)

- When `data` is a dict (\*), each key becomes a column:

```python
straw_hat_dict = {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
                  "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
                  "is_male": [True, True, False, True, True, True, False, True, True]
}

# Columns follow the dict's insertion order in pandas >= 0.23 (Python >= 3.6);
# the outputs in these notes, recorded with an older version, show them sorted
df = pd.DataFrame(straw_hat_dict)
print(type(df))
df
```

```
<class 'pandas.core.frame.DataFrame'>
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|5|17|True|Tony Tony Chopper|
|6|30|False|Nico Robin|
|7|36|True|Franky|
|8|90|True|Brook|


## Creating a DataFrame (3)

- When `data` is an ndarray:

```python
arr = np.array([
    ["Monkey D. Luffy", 19, True],
    ["Roronoa Zoro", 21, True],
    ["Nami", 20, False],
    ["Usopp", 19, True],
    ["Vinsmoke Sanji", 21, True],
    ["Tony Tony Chopper", 17, True],
    ["Nico Robin", 30, False],
    ["Franky", 36, True],
    ["Brook", 90, True]
])
df = pd.DataFrame(arr, columns=["name", "age", "is_male"])
df
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19|True|
|1|Roronoa Zoro|21|True|
|2|Nami|20|False|
|3|Usopp|19|True|
|4|Vinsmoke Sanji|21|True|
|5|Tony Tony Chopper|17|True|
|6|Nico Robin|30|False|
|7|Franky|36|True|
|8|Brook|90|True|


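```python
# np.array turned every value into a string, so each column starts out as text
print(df.dtypes)
df['age'] = df['age'].astype(int)
# Pitfall: astype(bool) makes every non-empty string True, including "False"
df['is_male'] = df['is_male'].astype(bool)
print("\n")
print(df.dtypes)
```

```
name       object
age        object
is_male    object
dtype: object


name       object
age         int64
is_male      bool
dtype: object
```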


```python
df
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19|True|
|1|Roronoa Zoro|21|True|
|2|Nami|20|True|
|3|Usopp|19|True|
|4|Vinsmoke Sanji|21|True|
|5|Tony Tony Chopper|17|True|
|6|Nico Robin|30|True|
|7|Franky|36|True|
|8|Brook|90|True|
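
The all-`True` `is_male` column above is the pitfall: `np.array` stored each Boolean as the text `"True"` or `"False"`, and converting a non-empty string to `bool` gives `True`. Comparing against the text is one way to recover the intended values; this sketch trims the data to two rows:

```python
import numpy as np
import pandas as pd

# np.array stores every value as a string here: "Nami", "20", "False", ...
arr = np.array([["Nami", 20, False], ["Brook", 90, True]])
df = pd.DataFrame(arr, columns=["name", "age", "is_male"])

df["age"] = df["age"].astype(int)
# Compare the text instead of calling astype(bool)
df["is_male"] = df["is_male"] == "True"

print(df.dtypes)
```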


## Creating a DataFrame (4)

- Practice: create a DataFrame from the following dict:

```python
friends_dict = {
    "starrings": ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow", "Matt LeBlanc", "Matthew Perry", "David Schwimmer"],
    "characters": ["Rachel Green", "Monica Geller", "Phoebe Buffay", "Joey Tribbiani", "Chandler Bing", "Ross Geller"]
}
```

- Select the `characters` column and inspect its type and its `.values` attribute
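
One possible answer, as a sketch; `friends_df` and `characters` are illustrative names. Selecting a single column with `[]` returns a Series, whose `.values` holds the column's data:

```python
import pandas as pd

friends_dict = {
    "starrings": ["Jennifer Aniston", "Courteney Cox", "Lisa Kudrow",
                  "Matt LeBlanc", "Matthew Perry", "David Schwimmer"],
    "characters": ["Rachel Green", "Monica Geller", "Phoebe Buffay",
                   "Joey Tribbiani", "Chandler Bing", "Ross Geller"],
}
friends_df = pd.DataFrame(friends_dict)

characters = friends_df["characters"]  # one column comes back as a Series
print(type(characters))
print(characters.values)
```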

# Panel

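## Creating a Panel

- Far less common than Series and DataFrame; deprecated in pandas 0.20 and removed in 0.25, so the Panel examples below need an older version
- Has three dimensions:
    - items (which data frame)
    - major_axis (row index of each data frame)
    - minor_axis (column labels of each data frame)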

## Creating a Panel (2)

- Create a Panel that holds two DataFrames

```python
# Requires pandas < 0.25 (pd.Panel was removed in 0.25)
df_2_yr_ago = pd.DataFrame(
    {
        "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
        "age": [17, 19, 18, 17, 19, 15, 28, 34, 88],
        "mastered_haki": [False, False, False, False, False, False, False, False, False]
    }
)
df_now = pd.DataFrame(
    {
        "name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
        "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
        "mastered_haki": [True, True, False, True, True, False, False, False, False]
    }
)
panel_data = pd.Panel({
    '2 years ago': df_2_yr_ago,
    'now': df_now
})
```

```python
panel_data
```

```
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 9 (major_axis) x 3 (minor_axis)
Items axis: 2 years ago to now
Major_axis axis: 0 to 8
Minor_axis axis: age to name
```

```python
panel_data['now']
```

| |age|mastered_haki|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|5|17|False|Tony Tony Chopper|
|6|30|False|Nico Robin|
|7|36|False|Franky|
|8|90|False|Brook|


```python
panel_data['2 years ago']
```

| |age|mastered_haki|name|
|---|---|---|---|
|0|17|False|Monkey D. Luffy|
|1|19|False|Roronoa Zoro|
|2|18|False|Nami|
|3|17|False|Usopp|
|4|19|False|Vinsmoke Sanji|
|5|15|False|Tony Tony Chopper|
|6|28|False|Nico Robin|
|7|34|False|Franky|
|8|88|False|Brook|
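
On pandas 0.25 or later, where `pd.Panel` no longer exists, the pandas documentation points to a DataFrame with a row `MultiIndex` (or the separate xarray package) for this kind of 3-D data. A minimal sketch of the `MultiIndex` route with `pd.concat`, trimmed to two crew members for brevity:

```python
import pandas as pd

df_2_yr_ago = pd.DataFrame({"name": ["Monkey D. Luffy", "Nami"],
                            "age": [17, 18],
                            "mastered_haki": [False, False]})
df_now = pd.DataFrame({"name": ["Monkey D. Luffy", "Nami"],
                       "age": [19, 20],
                       "mastered_haki": [True, False]})

# The dict keys become the outer row level (the Panel "items");
# the inner level is each frame's own row index (the Panel "major_axis")
stacked = pd.concat({"2 years ago": df_2_yr_ago, "now": df_now})

now = stacked.loc["now"]  # plays the role of panel_data['now']
print(stacked)
```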


## Selecting Elements of a DataFrame

- Select a column with square brackets `[]`
- or access it as an attribute with `.`

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)

print(df['name'])
print("\n")
print(df.name)
```

```
0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
Name: name, dtype: object


0      Monkey D. Luffy
1         Roronoa Zoro
2                 Nami
3                Usopp
4       Vinsmoke Sanji
5    Tony Tony Chopper
6           Nico Robin
7               Franky
8                Brook
Name: name, dtype: object
```
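
Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method. In this hypothetical frame, the column `count` is shadowed by the `DataFrame.count()` method, so only bracket access returns the column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Nami", "Brook"], "count": [1, 2]})

col = df["count"]  # the column
method = df.count  # the DataFrame.count method, not the column

print(col)
print(method)
```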


## Selecting Elements of a DataFrame (2)

- Pass a list of column names to select several columns

```python
df[['name', 'is_male']]
```

| |name|is_male|
|---|---|---|
|0|Monkey D. Luffy|True|
|1|Roronoa Zoro|True|
|2|Nami|False|
|3|Usopp|True|
|4|Vinsmoke Sanji|True|
|5|Tony Tony Chopper|True|
|6|Nico Robin|False|
|7|Franky|True|
|8|Brook|True|


## Selecting Elements of a DataFrame (3)

- Range slicing with `[:]` selects rows

```python
df[:5]
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|


```python
df[0:7:2]
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|2|20|False|Nami|
|4|21|True|Vinsmoke Sanji|
|6|30|False|Nico Robin|


## Selecting Elements of a DataFrame (4)

- Two different selection methods:
    - `.loc` (by row and column labels)
    - `.iloc` (by integer row and column positions)

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    },
    # Fix the column order so the .iloc positions below refer to the same
    # columns in every pandas version (older versions sorted dict keys)
    columns=["age", "is_male", "name"],
    index=list(range(5)) + list(range(10, 14))
)

df
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|10|17|True|Tony Tony Chopper|
|11|30|False|Nico Robin|
|12|36|True|Franky|
|13|90|True|Brook|


```python
# Label-based: every row labeled up to and including 7, i.e. 0 to 4
df.loc[:7, ['name', 'age']]
```

| |name|age|
|---|---|---|
|0|Monkey D. Luffy|19|
|1|Roronoa Zoro|21|
|2|Nami|20|
|3|Usopp|19|
|4|Vinsmoke Sanji|21|


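```python
# Position-based: the first 7 rows (positions 0 to 6), columns at positions 2 and 0
df.iloc[:7, [2, 0]]
```

| |name|age|
|---|---|---|
|0|Monkey D. Luffy|19|
|1|Roronoa Zoro|21|
|2|Nami|20|
|3|Usopp|19|
|4|Vinsmoke Sanji|21|
|10|Tony Tony Chopper|17|
|11|Nico Robin|30|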


## Selecting Elements of a DataFrame (5)

- Filter rows with a Boolean mask

```python
age_filter = df.age < 30
df[age_filter]
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|10|17|True|Tony Tony Chopper|


## Selecting Elements of a DataFrame (6)

- Practice: use Boolean filtering to select the mature men of the Straw Hat Pirates:
    - `age` >= 30
    - `is_male` == True

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
```
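
One possible solution: combine the two conditions with `&`, wrapping the comparison in parentheses. Python's `and` would raise a `ValueError` here, because a whole Series has no single truth value.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji",
              "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]}
)

# is_male is already Boolean, so "== True" can be dropped
mature_men = df[(df["age"] >= 30) & df["is_male"]]
print(mature_men)
```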

## Getting an Overview of a DataFrame

- `.shape`
- `.index`
- `.columns`
- `.info()`
- `.count()`
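
A quick sketch of what each of these returns, on a small frame with one deliberately missing age:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Nami", "Brook"],
     "age": [19, 20, np.nan]}
)

print(df.shape)    # (number of rows, number of columns)
print(df.index)    # row labels
print(df.columns)  # column labels
df.info()          # dtypes, non-null counts, memory usage; prints and returns None
print(df.count())  # number of non-missing values per column
```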

## Sorting a DataFrame

- `.sort_index()`
- `.sort_values()`

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }
)
# Sort by the row index
df.sort_index(axis=0)
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|5|17|True|Tony Tony Chopper|
|6|30|False|Nico Robin|
|7|36|True|Franky|
|8|90|True|Brook|


```python
# Sort by the row index in descending order
df.sort_index(axis=0, ascending=False)
```

| |age|is_male|name|
|---|---|---|---|
|8|90|True|Brook|
|7|36|True|Franky|
|6|30|False|Nico Robin|
|5|17|True|Tony Tony Chopper|
|4|21|True|Vinsmoke Sanji|
|3|19|True|Usopp|
|2|20|False|Nami|
|1|21|True|Roronoa Zoro|
|0|19|True|Monkey D. Luffy|


```python
# Sort by the column labels
df.sort_index(axis=1)
```

| |age|is_male|name|
|---|---|---|---|
|0|19|True|Monkey D. Luffy|
|1|21|True|Roronoa Zoro|
|2|20|False|Nami|
|3|19|True|Usopp|
|4|21|True|Vinsmoke Sanji|
|5|17|True|Tony Tony Chopper|
|6|30|False|Nico Robin|
|7|36|True|Franky|
|8|90|True|Brook|


```python
# Sort by the column labels in descending order
df.sort_index(axis=1, ascending=False)
```

| |name|is_male|age|
|---|---|---|---|
|0|Monkey D. Luffy|True|19|
|1|Roronoa Zoro|True|21|
|2|Nami|False|20|
|3|Usopp|True|19|
|4|Vinsmoke Sanji|True|21|
|5|Tony Tony Chopper|True|17|
|6|Nico Robin|False|30|
|7|Franky|True|36|
|8|Brook|True|90|


```python
# Sort by age
df.sort_values(by='age')
```

| |age|is_male|name|
|---|---|---|---|
|5|17|True|Tony Tony Chopper|
|0|19|True|Monkey D. Luffy|
|3|19|True|Usopp|
|2|20|False|Nami|
|1|21|True|Roronoa Zoro|
|4|21|True|Vinsmoke Sanji|
|6|30|False|Nico Robin|
|7|36|True|Franky|
|8|90|True|Brook|


```python
# Sort by age in descending order
df.sort_values(by='age', ascending=False)
```

| |age|is_male|name|
|---|---|---|---|
|8|90|True|Brook|
|7|36|True|Franky|
|6|30|False|Nico Robin|
|1|21|True|Roronoa Zoro|
|4|21|True|Vinsmoke Sanji|
|2|20|False|Nami|
|0|19|True|Monkey D. Luffy|
|3|19|True|Usopp|
|5|17|True|Tony Tony Chopper|


```python
# Sort by sex (is_male), then by age
df.sort_values(by=['is_male', 'age'])
```

| |age|is_male|name|
|---|---|---|---|
|2|20|False|Nami|
|6|30|False|Nico Robin|
|5|17|True|Tony Tony Chopper|
|0|19|True|Monkey D. Luffy|
|3|19|True|Usopp|
|1|21|True|Roronoa Zoro|
|4|21|True|Vinsmoke Sanji|
|7|36|True|Franky|
|8|90|True|Brook|
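
`sort_values` also accepts one `ascending` flag per sort key. For example, to list the women first and, within each group, the oldest crew member first (a four-row sketch):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Nami", "Nico Robin", "Brook"],
     "age": [19, 20, 30, 90],
     "is_male": [True, False, False, True]}
)

# is_male ascending (False before True), then age descending within each group
result = df.sort_values(by=["is_male", "age"], ascending=[True, False])
print(result)
```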


## Handling Missing Values

- `.dropna()`
- `.fillna()`

```python
# np.nan marks a missing value (the np.NaN alias was removed in NumPy 2.0)
name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook", np.nan]
age = [19, 21, 20, 19, 21, 17, 30, 36, np.nan, np.nan]
is_male = [True, True, False, True, True, np.nan, False, True, True, np.nan]

straw_hat_dict = {"name": name,
                  "age": age,
                  "is_male": is_male
}

df = pd.DataFrame(straw_hat_dict, columns=["name", "age", "is_male"])
df
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19.0|True|
|1|Roronoa Zoro|21.0|True|
|2|Nami|20.0|False|
|3|Usopp|19.0|True|
|4|Vinsmoke Sanji|21.0|True|
|5|Tony Tony Chopper|17.0|NaN|
|6|Nico Robin|30.0|False|
|7|Franky|36.0|True|
|8|Brook|NaN|True|
|9|NaN|NaN|NaN|


```python
# Drop a row only if all of its values are missing
df.dropna(how='all')
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19.0|True|
|1|Roronoa Zoro|21.0|True|
|2|Nami|20.0|False|
|3|Usopp|19.0|True|
|4|Vinsmoke Sanji|21.0|True|
|5|Tony Tony Chopper|17.0|NaN|
|6|Nico Robin|30.0|False|
|7|Franky|36.0|True|
|8|Brook|NaN|True|


```python
# Drop a row if any of its values is missing
df.dropna(how='any')
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19.0|True|
|1|Roronoa Zoro|21.0|True|
|2|Nami|20.0|False|
|3|Usopp|19.0|True|
|4|Vinsmoke Sanji|21.0|True|
|6|Nico Robin|30.0|False|
|7|Franky|36.0|True|


```python
df = df.dropna(how="all")
# Fill the remaining gaps with a value chosen per column
df["is_male"] = df["is_male"].fillna(True)
df["age"] = df["age"].fillna(90)
df
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19.0|True|
|1|Roronoa Zoro|21.0|True|
|2|Nami|20.0|False|
|3|Usopp|19.0|True|
|4|Vinsmoke Sanji|21.0|True|
|5|Tony Tony Chopper|17.0|True|
|6|Nico Robin|30.0|False|
|7|Franky|36.0|True|
|8|Brook|90.0|True|
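
Before choosing fill values it can help to count the gaps per column with `.isna().sum()`. `fillna` also accepts a dict mapping column names to fill values, so each column gets its own default in a single call; this sketch reuses the fill values from the cell above on a three-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"name": ["Tony Tony Chopper", "Brook", np.nan],
     "age": [17, np.nan, np.nan],
     "is_male": [np.nan, True, np.nan]}
)

print(df.isna().sum())  # number of missing values per column

# Drop the all-missing row, then fill each column with its own value
filled = df.dropna(how="all").fillna({"age": 90, "is_male": True})
print(filled)
```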


## Combining DataFrames

- `pandas.concat()`
- Vertical concatenation (`axis=0`)

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }, columns=['name', 'age', 'is_male']
)

# .loc slices by label, so :4 includes row 4
upper_df = df.loc[:4, :]
lower_df = df.loc[5:, :]
pd.concat([upper_df, lower_df], axis=0)
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19|True|
|1|Roronoa Zoro|21|True|
|2|Nami|20|False|
|3|Usopp|19|True|
|4|Vinsmoke Sanji|21|True|
|5|Tony Tony Chopper|17|True|
|6|Nico Robin|30|False|
|7|Franky|36|True|
|8|Brook|90|True|
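
The row labels above line up because both halves came from one frame. When the pieces carry overlapping labels, `pd.concat` keeps the duplicates unless `ignore_index=True` asks it to renumber the rows; a small sketch:

```python
import pandas as pd

first = pd.DataFrame({"name": ["Nami", "Usopp"], "age": [20, 19]})
second = pd.DataFrame({"name": ["Franky", "Brook"], "age": [36, 90]})

kept = pd.concat([first, second], axis=0)                           # labels 0, 1, 0, 1
renumbered = pd.concat([first, second], axis=0, ignore_index=True)  # labels 0, 1, 2, 3

print(kept, renumbered, sep="\n\n")
```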


## Combining DataFrames (2)

- `pandas.concat()`
- Horizontal concatenation (`axis=1`)

```python
df = pd.DataFrame(
    {"name": ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"],
     "age": [19, 21, 20, 19, 21, 17, 30, 36, 90],
     "is_male": [True, True, False, True, True, True, False, True, True]
    }, columns=['name', 'age', 'is_male']
)

left_df = df.loc[:, ["name", "age"]]
right_df = df.loc[:, "is_male"]  # a single label returns a Series
pd.concat([left_df, right_df], axis=1)
```

| |name|age|is_male|
|---|---|---|---|
|0|Monkey D. Luffy|19|True|
|1|Roronoa Zoro|21|True|
|2|Nami|20|False|
|3|Usopp|19|True|
|4|Vinsmoke Sanji|21|True|
|5|Tony Tony Chopper|17|True|
|6|Nico Robin|30|False|
|7|Franky|36|True|
|8|Brook|90|True|


## Combining DataFrames (3)

- `pd.merge()`
- Inner join (the default)

```python
import pandas as pd

name = ["Monkey D. Luffy", "Roronoa Zoro", "Nami", "Usopp", "Vinsmoke Sanji", "Tony Tony Chopper", "Nico Robin", "Franky", "Brook"]
age = [19, 21, 20, 19, 21, 17, 30, 36, 90]
straw_hat_dict = {
    "name": name,
    "age": age
}

name = ["Monkey D. Luffy", "Tony Tony Chopper", "Nico Robin", "Brook", "Trafalgar D. Water Law"]
devil_fruit = ["Gum-Gum Fruit", "Human-Human Fruit", "Hana-Hana Fruit", "Revive-Revive Fruit", "Op-Op Fruit"]
devil_fruit_dict = {
    "name": name,
    "devil_fruit": devil_fruit
}

left_df = pd.DataFrame(straw_hat_dict)
right_df = pd.DataFrame(devil_fruit_dict)
inner_joined = pd.merge(left_df, right_df)
inner_joined
```

| |age|name|devil_fruit|
|---|---|---|---|
|0|19|Monkey D. Luffy|Gum-Gum Fruit|
|1|17|Tony Tony Chopper|Human-Human Fruit|
|2|30|Nico Robin|Hana-Hana Fruit|
|3|90|Brook|Revive-Revive Fruit|


## Combining DataFrames (4)

- `pd.merge()`
- Left join (`how="left"`)

```python
left_joined = pd.merge(left_df, right_df, how="left")
left_joined
```

| |age|name|devil_fruit|
|---|---|---|---|
|0|19|Monkey D. Luffy|Gum-Gum Fruit|
|1|21|Roronoa Zoro|NaN|
|2|20|Nami|NaN|
|3|19|Usopp|NaN|
|4|21|Vinsmoke Sanji|NaN|
|5|17|Tony Tony Chopper|Human-Human Fruit|
|6|30|Nico Robin|Hana-Hana Fruit|
|7|36|Franky|NaN|
|8|90|Brook|Revive-Revive Fruit|


## Combining DataFrames (5)

- `pd.merge()`
- Right join (`how="right"`)

```python
right_joined = pd.merge(left_df, right_df, how="right")
right_joined
```

| |age|name|devil_fruit|
|---|---|---|---|
|0|19.0|Monkey D. Luffy|Gum-Gum Fruit|
|1|17.0|Tony Tony Chopper|Human-Human Fruit|
|2|30.0|Nico Robin|Hana-Hana Fruit|
|3|90.0|Brook|Revive-Revive Fruit|
|4|NaN|Trafalgar D. Water Law|Op-Op Fruit|


## Combining DataFrames (6)

- `pd.merge()`
- Full outer join (`how="outer"`)

```python
full_joined = pd.merge(left_df, right_df, how="outer")
full_joined
```

| |age|name|devil_fruit|
|---|---|---|---|
|0|19.0|Monkey D. Luffy|Gum-Gum Fruit|
|1|21.0|Roronoa Zoro|NaN|
|2|20.0|Nami|NaN|
|3|19.0|Usopp|NaN|
|4|21.0|Vinsmoke Sanji|NaN|
|5|17.0|Tony Tony Chopper|Human-Human Fruit|
|6|30.0|Nico Robin|Hana-Hana Fruit|
|7|36.0|Franky|NaN|
|8|90.0|Brook|Revive-Revive Fruit|
|9|NaN|Trafalgar D. Water Law|Op-Op Fruit|
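
When `on` is omitted, `pd.merge` joins on every column the two frames share, here only `name`. Naming the key explicitly guards against accidental matches on other shared columns, and `indicator=True` adds a `_merge` column recording whether each row came from both frames, the left only, or the right only. A trimmed sketch:

```python
import pandas as pd

left_df = pd.DataFrame({"name": ["Monkey D. Luffy", "Nami"], "age": [19, 20]})
right_df = pd.DataFrame({"name": ["Monkey D. Luffy", "Trafalgar D. Water Law"],
                         "devil_fruit": ["Gum-Gum Fruit", "Op-Op Fruit"]})

full = pd.merge(left_df, right_df, on="name", how="outer", indicator=True)
print(full)
```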
