---
# Add/Remove Rows and Columns
Adding and removing rows and columns in pandas DataFrames

---

In [139]:
import pandas as pd
import numpy as np
from IPython.display import display

In [140]:
people = {
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
}
df = pd.DataFrame(people)
display(df)

,first,last,email
0,Lorem,Ipsum,lorem@yahoo.com
1,John,Doe,john@gmail.com
2,Jane,Doe,jane@outlook.com


---
## Adding New Columns
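You can add a new column to an existing DataFrame by indexing the DataFrame with the new column's name and assigning a Series to that key. The Series is aligned on the index, so each value lands in the row with the matching label.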
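A minimal sketch of other common ways to add a column, assuming a DataFrame like the one above: assigning a scalar broadcasts it to every row, assigning a list requires exactly one value per row, and `DataFrame.assign()` returns a new DataFrame instead of modifying the original. The `active`, `id`, and `domain` columns and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical copy of the example DataFrame from above
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

# A scalar is broadcast to every row
df["active"] = True

# A list must contain exactly one value per row (3 here); the ids are made up
df["id"] = [101, 102, 103]

# assign() returns a NEW DataFrame with the extra column; df itself is unchanged
df2 = df.assign(domain=df["email"].str.split("@").str[1])
print(df2)
```

`assign()` is handy when chaining several operations, because it never mutates the DataFrame it is called on.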

---

In [141]:
# Add a new column that combines the first and last name

display(df)
full_name_series = df["first"] + " " + df["last"]
display(full_name_series)

# Create a new column named "full_name"
df["full_name"] = full_name_series
display(df)

,first,last,email
0,Lorem,Ipsum,lorem@yahoo.com
1,John,Doe,john@gmail.com
2,Jane,Doe,jane@outlook.com


0    Lorem Ipsum
1       John Doe
2       Jane Doe
dtype: object

,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


---
## Removing Columns
Columns are removed with the `drop()` method.  
`drop()` removes rows or columns (selected by the **axis** parameter) whose label(s) you pass to the **labels** parameter.  

**axis** can be set to 0, "rows", or "index" to remove rows, or to 1 or "columns" to remove columns. **axis** defaults to 0 (rows).  

Columns can also be dropped directly by passing a column label, or a list of labels, to the **columns** parameter, which removes the need to specify **axis**. The **index** parameter does the same for rows.  

**Note:** `drop()` returns a new DataFrame and leaves the original unchanged. To modify the original instead, set the **inplace** parameter to True; `drop()` then returns `None`.  
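A short sketch, using a hypothetical two-column DataFrame, that checks the equivalent ways of dropping a column described above, the **index** shortcut for rows, and the `None` return value of an in-place drop.

```python
import pandas as pd

# Hypothetical small DataFrame for the comparison
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
})

# Three equivalent ways to drop the "last" column (each returns a new DataFrame)
a = df.drop(labels="last", axis=1)
b = df.drop(labels="last", axis="columns")
c = df.drop(columns="last")  # a single label works; a list is not required

# Rows have a matching shortcut: the index parameter
d = df.drop(index=[0, 1])

# With inplace=True, drop() modifies df itself and returns None
result = df.drop(columns=["first"], inplace=True)
print(df)
```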

---

In [142]:
# Removing last name using labels and axis parameters.
# Note that we are not applying this change (inplace=False)

display(df)
a = df.drop(labels="last", axis=1)
display(a)


,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


,first,email,full_name
0,Lorem,lorem@yahoo.com,Lorem Ipsum
1,John,john@gmail.com,John Doe
2,Jane,jane@outlook.com,Jane Doe


In [143]:
# Removing index 0 and 1 (Lorem and John) using labels
# Note that we are not applying this change (inplace=False)

display(df)
a = df.drop(labels=[0, 1])
display(a)

,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


,first,last,email,full_name
2,Jane,Doe,jane@outlook.com,Jane Doe


In [144]:
# Removing first and last names using columns parameter
# We are applying this change (inplace=True)

display(df)
df.drop(columns=["first", "last"], inplace=True)
display(df)

,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


---
**Adding the first and last name columns back**  

We will restore the first and last name columns by splitting the full_name column with the `str.split()` method.  

If the **expand** parameter is True, `str.split()` returns a DataFrame with each piece of the split in its own column. If **expand** is False (the default), it returns a Series in which each value is a list of the pieces.
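A minimal sketch comparing the two **expand** modes, plus the **n** parameter, which limits the number of splits. The second name ("Mary Ann Smith") is hypothetical and is included only to show what happens when a value contains more than one space.

```python
import pandas as pd

names = pd.Series(["Lorem Ipsum", "Mary Ann Smith"])  # hypothetical data

# expand=False (default): each value becomes a list of pieces
as_lists = names.str.split(" ")

# expand=True: each piece gets its own column; shorter rows are padded
# with missing values
as_frame = names.str.split(" ", expand=True)

# n=1 splits at most once, so everything after the first space stays together
first_rest = names.str.split(" ", n=1, expand=True)
print(first_rest)
```

With `n=1`, the result always has at most two columns, which is a safer fit for assigning to a fixed pair such as `df[["first", "last"]]`.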

---

In [145]:
# Split the full_name column on spaces to get first and last names

# With expand=False (default)

display(df)
split_df = df["full_name"].str.split(" ")
display(split_df)

# We do not want this behavior. We want a DataFrame.

,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


0    [Lorem, Ipsum]
1       [John, Doe]
2       [Jane, Doe]
Name: full_name, dtype: object

In [146]:
display(df)

# With expand=True
split_df = df["full_name"].str.split(" ", expand=True)
display(split_df)

# Assign the resulting df to the original df
df[["first", "last"]] = split_df
display(df)

,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


,0,1
0,Lorem,Ipsum
1,John,Doe
2,Jane,Doe


,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe


---
## Adding Rows
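Rows are added by concatenating the existing DataFrame with another DataFrame using `pd.concat()`. The **axis** parameter controls the direction: 0 or "index" (the default) stacks the objects vertically as new rows, while 1 or "columns" places them side by side as new columns.

**Note:** `pd.concat()` has no **inplace** parameter. It always returns a new DataFrame, so assign the result back (for example, `df = pd.concat([df, new_rows])`) to keep the change.  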
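A minimal sketch of two common ways to append rows, assuming a DataFrame with a default 0..n-1 index. The rows ("Tony Stark", "Bruce Wayne") are hypothetical. Older tutorials use `DataFrame.append()`, which was deprecated in pandas 1.4 and removed in pandas 2.0; `pd.concat()` is its replacement.

```python
import pandas as pd

df = pd.DataFrame({"first": ["John", "Jane"], "last": ["Doe", "Doe"]})

# 1) concat: build the new row(s) as a DataFrame and stack them below df.
#    concat returns a new object, so assign the result back.
new_row = pd.DataFrame([{"first": "Tony", "last": "Stark"}])  # hypothetical row
df = pd.concat([df, new_row], ignore_index=True)  # renumber index 0..n-1

# 2) loc enlargement: assigning to a label that does not exist yet appends a
#    row in place. len(df) is the next free label only for a 0..n-1 index.
df.loc[len(df)] = ["Bruce", "Wayne"]  # hypothetical row

print(df)
```

`pd.concat()` is usually preferable when adding many rows at once (collect them first, then concatenate once), while `loc` enlargement is convenient for a single row.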

---

In [147]:
display(df)
# axis="columns" (or axis=1) concatenates side by side, so this adds a column, not a row
x = df["email"]
display(x)
a = pd.concat([df, x], axis="columns")
display(a)

,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe


0     lorem@yahoo.com
1      john@gmail.com
2    jane@outlook.com
Name: email, dtype: object

,email,full_name,first,last,email
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum,lorem@yahoo.com
1,john@gmail.com,John Doe,John,Doe,john@gmail.com
2,jane@outlook.com,Jane Doe,Jane,Doe,jane@outlook.com


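In [160]:
display(df)
# Build the new rows as a DataFrame from a list of dicts (one dict per row)
new_rows = pd.DataFrame.from_records([{"first": "Tony", "last": "Stark"}, {"last": "Adams"}])
display(new_rows)
# Stack the rows below df; columns missing from new_rows are filled with NaN,
# and ignore_index=True renumbers the index instead of repeating 0 and 1
df = pd.concat([df, new_rows], ignore_index=True)
display(df)

,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe


,first,last
0,Tony,Stark
1,,Adams


,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe
3,,,Tony,Stark
4,,,,Adams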

In [159]:
# Each tuple in the list becomes one row; column labels default to 0, 1, 2
pd.DataFrame([("Cat", "Dog", "Bananas")])

,0,1,2
0,Cat,Dog,Bananas


In [163]:
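# from_records() and the DataFrame constructor give the same result for a list of dicts;
# a key missing from a record (col_2 in the first one) becomes NaN
data = [{'col_1': 3},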
        {'col_1': 2, 'col_2': 'b'},
        {'col_1': 1, 'col_2': 'c'},
        {'col_1': 0, 'col_2': 'd'}]
a = pd.DataFrame.from_records(data)
b = pd.DataFrame(data)
display(a)
display(b)

,col_1,col_2
0,3,
1,2,b
2,1,c
3,0,d


,col_1,col_2
0,3,
1,2,b
2,1,c
3,0,d
