# Modifying Data in DataFrames
Updating rows, columns, and cells.

In [2]:
import pandas as pd
import numpy as np
from IPython.display import display


In [3]:
people = {
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
}
df = pd.DataFrame(people)
df


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


---
## Assigning Columns
We can assign or rename columns in two ways:  
- Using the `.columns` attribute
- Using the `.rename()` method

---

---
### Assigning columns using the `.columns` attribute
We can assign or rename columns by assigning an iterable, whose length equals the number of columns, to the DataFrame's `columns` attribute.  
**Note:** this modifies the original df in-place.

---
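The length requirement can be checked directly. This is a minimal sketch reusing the notebook's sample data; since the wording of the error message may vary between pandas versions, only the exception type (`ValueError`) is relied on here. The flag `mismatch_rejected` is a hypothetical name used for illustration.

```python
import pandas as pd

# Same toy data as the notebook's `people` dict.
df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

# One label per column: all labels are replaced at once.
df.columns = ["first_name", "last_name", "email"]
print(list(df.columns))  # ['first_name', 'last_name', 'email']

# Two labels for three columns: pandas raises ValueError and keeps the old labels.
mismatch_rejected = False  # hypothetical flag, only for illustration
try:
    df.columns = ["first_name", "last_name"]
except ValueError as err:
    mismatch_rejected = True
    print(f"ValueError: {err}")

print(list(df.columns))  # ['first_name', 'last_name', 'email']
```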

In [4]:
# Renaming all columns
print(f"n of columns: {df.shape[1]}")
display(df)
df.columns = ["first_name", "last_name", "email"]
df


n of columns: 3


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


In [5]:
# Title case for columns using list comprehension.
display(df)
df.columns = [c.title() for c in df.columns]
df


|   | first_name | last_name | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | First_Name | Last_Name | Email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |
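As an alternative to the list comprehension, the column `Index` exposes vectorized string methods through its `.str` accessor. A minimal sketch, assuming the columns already hold the snake_case names from the previous cell:

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Lorem", "John", "Jane"],
    "last_name": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

# Index.str.title() applies str.title to every label, producing the same
# result as [c.title() for c in df.columns].
df.columns = df.columns.str.title()
print(list(df.columns))  # ['First_Name', 'Last_Name', 'Email']
```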


---
### Assigning columns using the `.rename()` method
We can rename columns by passing a dict to the `columns` parameter of the `.rename()` method. The dict can include any number of columns: each key is the name of a column to be replaced, and its value is the new column name, i.e.  
`{"old_colname": "new_colname", "old2": "new2"}`  

**Note:** by default this does not modify the original df; it returns a new DataFrame. Pass `inplace=True` to modify the original df instead.

---
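The difference between the default behaviour and `inplace=True` can be seen by keeping both objects around. A minimal sketch on a trimmed copy of the sample data; it also shows that `columns=` accepts a function (here the built-in `str.lower`), which is applied to every column label.

```python
import pandas as pd

df = pd.DataFrame({
    "First_Name": ["Lorem", "John"],
    "Last_Name": ["Ipsum", "Doe"],
    "Email": ["lorem@yahoo.com", "john@gmail.com"],
})

# Without inplace=True, rename returns a new DataFrame and leaves df untouched.
renamed = df.rename(columns={"First_Name": "first", "Last_Name": "last"})
print(list(renamed.columns))  # ['first', 'last', 'Email']
print(list(df.columns))       # ['First_Name', 'Last_Name', 'Email']

# A function can be passed instead of a dict; it is called on every label.
lowered = df.rename(columns=str.lower)
print(list(lowered.columns))  # ['first_name', 'last_name', 'email']
```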

In [6]:
# Changing column names back to original
display(df)

# Note that we can change any number of columns.
replacement = {
    "First_Name": "first",
    "Last_Name": "last",
}
df.rename(columns=replacement, inplace=True)
display(df)

# Change Email column back to email for consistency
df.rename(columns={"Email": "email"}, inplace=True)
df


|   | First_Name | Last_Name | Email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | Email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


---
## Updating Multiple Values in Rows
We can modify data in a row by selecting the cells and assigning values to them. There are a few ways to do this:  
- Assigning values
- method2
- method3 

**Note:** this is for updating the data in rows, not the "row names". Row names in pandas are the index labels; refer to `pandas_03` for working with indices.

---

---
### Updating data by assignment
Generally, we can update values by selecting part of a DataFrame or Series and assigning new values to it.

The value assigned to the selected cells can be:  
- `a.` An iterable whose length equals the number of selected cells: each item is assigned to the corresponding cell.  
- `b.` An iterable of length 1: its single item is assigned to every selected cell.  
- `c.` A single object (e.g. a string or a number): it is assigned to every selected cell.  

The same applies to a Series: the iterable must contain as many items as there are cells to update, unless a single value is assigned, in which case that value is applied to every cell.

**Note:** this modifies the original df in-place.

---
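The same rules can be seen on a whole column, which is a Series. A minimal sketch that swaps in column assignment with `df[...]` instead of the row-wise `.loc` used in the cells below; the data is a trimmed copy of the notebook's sample, and `mismatch_rejected` is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
})

# a. One item per cell: the list length must equal the number of rows.
df["last"] = ["Ipsum", "Smith", "Smith"]

# c. A single scalar is applied to every cell of the column.
df["first"] = "Anon"
print(df)

# A list whose length differs from the number of rows is rejected,
# and the column keeps its previous values.
mismatch_rejected = False  # hypothetical flag, only for illustration
try:
    df["last"] = ["Doe", "Doe"]
except ValueError as err:
    mismatch_rejected = True
    print(f"ValueError: {err}")
```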

In [7]:
# a. Assigning row cells using an iterable
display(df)
df.loc[1] = ["Foo", "Bar", "foobar@email.com"]
df

|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | John | Doe | john@gmail.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Foo | Bar | foobar@email.com |
| 2 | Jane | Doe | jane@outlook.com |


In [8]:
# b. Assigning row cells using a list of length 1
display(df)
df.loc[1] = ["Cat"]
df

|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Foo | Bar | foobar@email.com |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Cat | Cat | Cat |
| 2 | Jane | Doe | jane@outlook.com |


In [9]:
# c. Assigning row cells using a single object (a scalar).
# This will behave exactly like the previous example.
display(df)
df.loc[1] = "Dog"
df

|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Cat | Cat | Cat |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Dog | Dog | Dog |
| 2 | Jane | Doe | jane@outlook.com |


In [10]:
# Assigning a list to selected columns of one row (one value per selected column)
display(df)
df.loc[1, ["last", "email"]] = ["Doe", "dog@woof.com"]
df

|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Dog | Dog | Dog |
| 2 | Jane | Doe | jane@outlook.com |


|   | first | last | email |
|---|---|---|---|
| 0 | Lorem | Ipsum | lorem@yahoo.com |
| 1 | Dog | Doe | dog@woof.com |
| 2 | Jane | Doe | jane@outlook.com |
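The same assignment rules also work when the rows are picked by a condition instead of a label, which updates several rows in one step. A minimal sketch using a boolean mask with `.loc`; the replacement names and `@smith.com` addresses are made-up sample values.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

# Boolean mask: True for every row whose last name is "Doe" (rows 1 and 2).
is_doe = df["last"] == "Doe"

# A scalar is applied to every selected cell...
df.loc[is_doe, "last"] = "Smith"

# ...while a list must supply one item per selected row.
df.loc[is_doe, "email"] = ["john@smith.com", "jane@smith.com"]
print(df)
```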


---
### Updating single value
Functionally, both `at` and `loc` can update a single cell, but `at` is faster than `loc`. `at` can only select a single cell, so both a row indexer and a column indexer are needed, while `loc` can select a single cell or multiple cells.

**Note:** this modifies the original df in-place.

---
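Side by side, the two accessors look like this when writing a single cell. A minimal sketch on the notebook's sample data; the new email addresses are made-up values.

```python
import pandas as pd

df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

# .at needs exactly one row label and one column label.
df.at[1, "email"] = "john.doe@gmail.com"
print(df.at[1, "email"])  # john.doe@gmail.com

# .loc can update the same kind of single cell...
df.loc[2, "email"] = "jane.doe@outlook.com"

# ...and can also select several cells at once, which .at cannot.
print(df.loc[[1, 2], "email"].tolist())
# ['john.doe@gmail.com', 'jane.doe@outlook.com']
```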

In [44]:
# Uncomment these lines to run the performance comparison

# %timeit df.loc[1, "email"]
# %timeit df.at[1, "email"]

3.81 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.69 µs ± 229 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
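`%timeit` is an IPython magic, so it only works inside a notebook or IPython shell. In a plain Python script, a comparable measurement can be sketched with the standard-library `timeit` module. The iteration count `n` is an arbitrary choice, and the numbers will differ from the ones above depending on the machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({
    "first": ["Lorem", "John", "Jane"],
    "last": ["Ipsum", "Doe", "Doe"],
    "email": ["lorem@yahoo.com", "john@gmail.com", "jane@outlook.com"],
})

n = 10_000  # arbitrary number of calls per measurement
# timeit.timeit calls the function n times and returns the total time in seconds.
loc_total = timeit.timeit(lambda: df.loc[1, "email"], number=n)
at_total = timeit.timeit(lambda: df.at[1, "email"], number=n)

# Convert the totals to microseconds per call, the unit %timeit reports.
print(f".loc: {loc_total / n * 1e6:.2f} µs per call")
print(f".at:  {at_total / n * 1e6:.2f} µs per call")
```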
