---
# Add/Remove Rows and Columns
Adding and removing rows and columns from DataFrames

---

In [168]:
import pandas as pd
import numpy as np
from IPython.display import display

In [169]:
# Function for printing a horizontal line. For display purporse
def printhr(s: str = None, n: int = 40):
    """Print a horizontal rule of the character "=" of length n.

    Args:
        s (str, optional): Header message. Defaults to None.
        n (int, optional): Number of characters. Defaults to 50.
    """

    if s:
        print("=" * int(n / 2), s, "=" * int(n / 2))
    else:
        print("=" * n)

In [170]:
people = {
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
}
df = pd.DataFrame(people)
display(df)

Unnamed: 0,first,last,email
0,Lorem,Ipsum,lorem@yahoo.com
1,John,Doe,john@gmail.com
2,Jane,Doe,jane@outlook.com


---
## Adding New Columns
You can add a new column to an existing DataFrame by indexing to it (the name it will have) and assigning data to it (in the following example, the data is a Series).

---

In [171]:
# Add a new column that combines the first and last name

display(df)
full_name_series = df["first"] + " " + df["last"]
display(full_name_series)

# Create a new column named "full_name"
df["full_name"] = full_name_series
display(df)

Unnamed: 0,first,last,email
0,Lorem,Ipsum,lorem@yahoo.com
1,John,Doe,john@gmail.com
2,Jane,Doe,jane@outlook.com


0    Lorem Ipsum
1       John Doe
2       Jane Doe
dtype: object

Unnamed: 0,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


---
## Removing Columns
Removing a column can be done by the `drop()` method.  
`drop()` can remove a Series (row or column, determined by the **axis** parameter) by passing in the label(s) of the Series to the **labels** parameter.  

**axis** can be set to 0, "rows", or "index" to remove a row Series, and set to 1, or "columns" to remove a column Series. **axis** defaults to 0 (rows Series).  

Columns can also be dropped directly by passing the column label(s) in an iterable to the **columns** parameter. This bypasses the need of specifying the **axis** parameter.  

**Note:** this method does not modify the original df. Modify the original by setting the inplace parameter to True.  

---

In [172]:
# Removing last name using labels and axis parameters.
# Note that we are not applying this change (inplace=False)

display(df)
a = df.drop(labels="last", axis=1)
display(a)


Unnamed: 0,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


Unnamed: 0,first,email,full_name
0,Lorem,lorem@yahoo.com,Lorem Ipsum
1,John,john@gmail.com,John Doe
2,Jane,jane@outlook.com,Jane Doe


In [173]:
# Removing index 0 and 2 (Lorem and John) using labels
# Note that we are not applying this change (inplace=False)

display(df)
a = df.drop(labels=[0, 1])
display(a)

Unnamed: 0,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


Unnamed: 0,first,last,email,full_name
2,Jane,Doe,jane@outlook.com,Jane Doe


In [174]:
# Removing first and last names using columns parameter
# We are applying this change (inplace=True)

display(df)
df.drop(columns=["first", "last"], inplace=True)
display(df)

Unnamed: 0,first,last,email,full_name
0,Lorem,Ipsum,lorem@yahoo.com,Lorem Ipsum
1,John,Doe,john@gmail.com,John Doe
2,Jane,Doe,jane@outlook.com,Jane Doe


Unnamed: 0,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


---
**Adding the first and last name columns back**  

We are going to return the first and last name columns back to the original df by using the `split()` method.  

If the **expand** parameter is set to True, `split()` assigns the split values in to their own Series in a DataFrame. If **expand** is False, the split values are stored in a list inside a Series.

---

In [175]:
## Create a DataFrame of 2 columns with the full_name column

# With expand=False (default)

display(df)
split_df = df["full_name"].str.split(" ")
display(split_df)

# We do not want this behavior. We want a DataFrame.

Unnamed: 0,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


0    [Lorem, Ipsum]
1       [John, Doe]
2       [Jane, Doe]
Name: full_name, dtype: object

In [176]:
display(df)

# With expand=True
split_df = df["full_name"].str.split(" ", expand=True)
display(split_df)

# Assign the resulting df to the original df
df[["first", "last"]] = split_df
display(df)

Unnamed: 0,email,full_name
0,lorem@yahoo.com,Lorem Ipsum
1,john@gmail.com,John Doe
2,jane@outlook.com,Jane Doe


Unnamed: 0,0,1
0,Lorem,Ipsum
1,John,Doe
2,Jane,Doe


Unnamed: 0,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe


---
## Adding Rows
To add rows to Series and DataFrames, we can use the `concat()` function. `concat()` #TODO

**Note:** this method does not modify the original df. Modify the original by setting the inplace parameter to True.  

---

In [177]:
# Create row to be added

display(df)
new_row = pd.DataFrame(
    {
        "first": "Ken",
        "last": "Adams",
        "email": "adams@gmail.com",
    }, index=[0]
)
display(new_row)

df = pd.concat([df, new_row], ignore_index=True)
display(df)

Unnamed: 0,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe


Unnamed: 0,first,last,email
0,Ken,Adams,adams@gmail.com


Unnamed: 0,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe
3,adams@gmail.com,,Ken,Adams


In [178]:
# Working example (why not make an append() substitute? This is convoluted)

# Original df
df2 = pd.DataFrame({"a": [1], "b": [2]})
display(df2)

printhr()
# df to be added as row
new_row = pd.Series({"a": 3, "b": 4})

display(new_row)
# Convert Series to df and transpose (swap rows with columns)
new_row = new_row.to_frame().T
display(new_row)

printhr()
# Concat new row to existing df
df2 = pd.concat([df2, new_row], ignore_index=True)
display(df2)

Unnamed: 0,a,b
0,1,2




a    3
b    4
dtype: int64

Unnamed: 0,a,b
0,3,4




Unnamed: 0,a,b
0,1,2
1,3,4


---
## Removing Rows
Removing a row can be done the same as removing columns using `drop()`.

**Note:** this method does not modify the original df. Modify the original by setting the inplace parameter to True.  

---

In [179]:
# Say we want to remove Ken Adams
display(df)

# Remove row
df.drop(3, inplace=True)
display(df)

Unnamed: 0,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe
3,adams@gmail.com,,Ken,Adams


Unnamed: 0,email,full_name,first,last
0,lorem@yahoo.com,Lorem Ipsum,Lorem,Ipsum
1,john@gmail.com,John Doe,John,Doe
2,jane@outlook.com,Jane Doe,Jane,Doe
