# Assigning Subsets of Data

In previous chapters, we learned how to select subsets of data and how to create new columns with the assignment statement. In this chapter, we will assign new values to a subset of the data. Let's begin by reading in a simple DataFrame.

In [None]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

## Setting new data with `loc`

The `loc` indexer simultaneously selects rows and columns from a DataFrame using labels. We covered this in great detail in previous chapters. Let's review this by selecting the age of Niko, Aaron, and Dean.

In [None]:
rows = ['Niko', 'Aaron', 'Dean']
df.loc[rows, 'age']

We can replace these three values with a list or an array of the same length or a single scalar value. Let's use the assignment statement to assign new values.

In [None]:
df.loc[rows, 'age'] = [4, 13, 34]

Let's verify that the assignment happened correctly.

In [None]:
df
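
The scalar case and the same-length requirement mentioned above can be sketched on a small stand-in DataFrame. The `people` frame and its values are hypothetical and are not taken from `sample_data.csv`.

```python
import pandas as pd

# Hypothetical stand-in for the sample data (values are made up).
people = pd.DataFrame(
    {'age': [17, 21, 54, 32], 'food': ['steak', 'lamb', 'cheese', 'melon']},
    index=['Niko', 'Aaron', 'Dean', 'Jane'],
)
rows = ['Niko', 'Aaron', 'Dean']

# A single scalar is broadcast to every selected cell.
people.loc[rows, 'age'] = 40

# A list whose length differs from the selection raises a ValueError.
try:
    people.loc[rows, 'age'] = [1, 2]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(people)
print(mismatch_raised)
```

Only the three selected rows receive the scalar; the row labeled 'Jane' keeps its original age.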

You can even use one of the augmented assignment operators (`+=`, `-=`, etc.) to operate on the selection itself. Here, we increase these three ages by 2.

In [None]:
df.loc[rows, 'age'] += 2
df
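
As a rough illustration, the augmented form behaves like selecting the values, computing the result, and assigning it back to the same selection. The frames `a` and `b` below are hypothetical.

```python
import pandas as pd

# Hypothetical three-row frame; b is an independent copy of a.
a = pd.DataFrame({'age': [4, 13, 34]}, index=['Niko', 'Aaron', 'Dean'])
b = a.copy()
rows = ['Niko', 'Dean']

a.loc[rows, 'age'] += 2                         # augmented assignment
b.loc[rows, 'age'] = b.loc[rows, 'age'] + 2     # explicit select, add, assign back

print(a.equals(b))
```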

It's also possible to modify values from a single row with `loc`. Here, we change the food and height column values for the row labeled with 'Niko'.

In [None]:
cols = ['food', 'height']
df.loc['Niko', cols] = ['PIZZA', 82]
df

## Setting new data with `iloc`
Setting new data with the `iloc` indexer works analogously. We begin by setting a single cell of data.

In [None]:
df.iloc[0, -1] = 99.9
df

The `iloc` indexer can take a single integer, a list of integers, or a slice. Below, we slice the rows and use a single integer for the columns.

In [None]:
df.iloc[3:, 4] = [155, 205, 195, 165]
df
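
The three kinds of `iloc` input can be seen side by side on a small frame. This is a minimal sketch; the `grid` frame is hypothetical and unrelated to the sample data.

```python
import pandas as pd

# Hypothetical 4x3 frame filled with zeros.
grid = pd.DataFrame(0, index=list('abcd'), columns=['x', 'y', 'z'])

grid.iloc[1, 0] = 5              # two single integers: one cell
grid.iloc[[0, 3], 1] = [7, 8]    # list of row positions, one column
grid.iloc[2:, 2] = 9             # row slice with a scalar broadcast
grid.iloc[0, -1] = 1             # negative positions count from the end

print(grid)
```

Note that `iloc` works purely by integer position, so the row labels `'a'` through `'d'` play no role in any of these assignments.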

## Boolean Selection Assignment
Typically, you will not be manually setting rows and columns as shown above. A more common task is to select a portion of the DataFrame with boolean selection and assign that selection new values. Let's see some examples with the employee dataset.

In [None]:
emp = pd.read_csv('../data/employee.csv')
emp.head()

Let's say we wanted to raise the minimum salary for all police department employees to 60000. Before making the assignment, let's count how many police department employees currently make less than this.

In [None]:
filt1 = emp['salary'] < 60000
filt2 = emp['dept'] == 'Houston Police Department-HPD'
filt = filt1 & filt2
filt.sum()
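
Because `True` counts as 1 and `False` as 0, summing a boolean Series gives a count. Dividing by the number of police rows turns that count into a share. The sketch below uses a hypothetical miniature `staff` frame with made-up department names and salaries.

```python
import pandas as pd

# Hypothetical miniature of the employee data.
staff = pd.DataFrame({
    'dept': ['Police', 'Police', 'Fire', 'Police'],
    'salary': [45000, 72000, 50000, 58000],
})
low = staff['salary'] < 60000
police = staff['dept'] == 'Police'
both = low & police

count = both.sum()                   # number of police rows below 60000
share = both.sum() / police.sum()    # fraction of police rows below 60000

print(count, share)
```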

Use the `loc` indexer to select just the employees that meet the conditions for the above filter and reassign their salary.

In [None]:
emp.loc[filt, 'salary'] = 60000

Let's use our same filter to select those employees and verify that their salary has now changed.

In [None]:
emp[filt].head()
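
Reusing the stored filter matters here. It is a fixed boolean Series computed before the assignment, so it still marks the same rows afterwards, whereas re-evaluating the condition on the updated data would match nothing. A small sketch with a hypothetical `staff` frame:

```python
import pandas as pd

# Hypothetical miniature of the employee data.
staff = pd.DataFrame({
    'dept': ['Police', 'Police', 'Fire', 'Police'],
    'salary': [45000, 72000, 50000, 58000],
})
filt = (staff['salary'] < 60000) & (staff['dept'] == 'Police')
staff.loc[filt, 'salary'] = 60000

# The stored mask still selects the rows that were changed.
changed = staff[filt]

# Recomputing the condition on the updated data selects no rows.
recomputed = (staff['salary'] < 60000) & (staff['dept'] == 'Police')

print(changed)
print(recomputed.sum())
```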

## Improper Assignment
The above assignment is often done improperly, in a way that has no effect. Let's read the dataset in again under a new variable name, `emp2`, and recreate our filters.

In [None]:
emp2 = pd.read_csv('../data/employee.csv')
filt1 = emp2['salary'] < 60000
filt2 = emp2['dept'] == 'Houston Police Department-HPD'
filt = filt1 & filt2
emp2[filt].head()

The last expression above returns a DataFrame, so we can use the brackets again to select its `salary` column.

In [None]:
emp2[filt]['salary'].head()

If we try to use an assignment statement with the above syntax, no change will take place and a `SettingWithCopyWarning` will be emitted. Let's attempt the assignment and trigger the warning. Note that this is a warning and not an error, so the statement still runs to completion.

In [None]:
emp2[filt]['salary'] = 60000

Selecting the employees whose salaries we were hoping to change exposes the improper assignment.

In [None]:
emp2[filt].head()

### What went wrong?
Executing `emp2[filt]['salary']` is referred to as **chained indexing** in the pandas documentation and as **chained selections** in this book. There were two consecutive selections. The first was boolean selection with `[filt]`, followed immediately by single-column selection with `['salary']`.

The issue is that the first selection, `emp2[filt]`, creates a completely new DataFrame with its own copy of the data in memory that has nothing to do with the original DataFrame. From this new DataFrame, we select the `salary` column and attempt to reassign each value. What we have actually done is set the salary on this copy of the data. pandas is nice enough to warn us that we might not have accomplished what we thought we did. In this example, the warning proved to be correct and our original DataFrame was not modified. To properly assign to a subset of data using boolean selection along a column, you need to use `loc`, which is a single selection (one set of brackets) that doesn't involve making a copy of the data.
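
The difference between the two approaches can be checked directly on a small frame. This is a minimal sketch with a hypothetical `staff` frame; the `warnings` filter is used here only to keep the output quiet and is not part of the technique.

```python
import warnings
import pandas as pd

# Hypothetical miniature of the employee data.
staff = pd.DataFrame({
    'dept': ['Police', 'Police', 'Fire'],
    'salary': [45000, 72000, 50000],
})
filt = (staff['salary'] < 60000) & (staff['dept'] == 'Police')

with warnings.catch_warnings():
    warnings.simplefilter('ignore')    # silence the chained-assignment warning
    staff[filt]['salary'] = 60000      # chained: sets values on a temporary copy
after_chained = staff['salary'].tolist()

staff.loc[filt, 'salary'] = 60000      # single selection on the original
after_loc = staff['salary'].tolist()

print(after_chained)
print(after_loc)
```

The first list shows the salaries unchanged after the chained attempt; only the `loc` assignment updates the original frame.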

Fully understanding the `SettingWithCopyWarning` requires a deeper discussion, which will be presented in a later chapter.

## Exercises

Use the bikes dataset for all of the following exercises.

In [None]:
bikes = pd.read_csv('../data/bikes.csv')

### Exercise 1
<span  style="color:green; font-size:16px">Change the values of `events` to 'HEAT WAVE' for all rides where `temperature` is above 95. Verify this by outputting just the `events` and `temperature` columns that meet the condition.</span>

### Exercise 2
<span  style="color:green; font-size:16px">Increase the trip duration by 50% for all the rides that took place with a wind speed above 40. Output just the trip duration and wind speed before and after the assignment.</span>

### Exercise 3
<span  style="color:green; font-size:16px">Change the trip duration for the first two rows to 0.</span>