# 3. Handling the SettingWithCopyWarning

## Assigning subsets of data - The dreaded `SettingWithCopyWarning`
Assigning to a subset of data means changing some of the values of a DataFrame. Doing so often results in a `SettingWithCopyWarning`. This warning means one of three things:

1. You made the assignment you wanted, but with unintended side effects
1. You did not make the assignment you wanted
1. You made the assignment you wanted, with no side effects

Let's see examples of each:

## 1. Correct Assignment with Side Effects
Let's begin with our sample DataFrame from above. Suppose we want to select the age column as a Series and then change Aaron's age to 13, without changing the values of the original DataFrame.

Let's re-read the data, then select the age column and output it.

In [None]:
import pandas as pd
df = pd.read_csv('data/sample_data.csv', index_col=0)
df

In [None]:
age = df['age']
age

Change Aaron's age to 13, which should trigger the warning.

In [None]:
age.loc['Aaron'] = 13

Let's verify that the age Series has correctly updated its value for Aaron.

In [None]:
age

The side effect here is that the original DataFrame was also changed. Let's take a look at it.

In [None]:
df

### Explanation
When we selected the age column with `df['age']`, Pandas did not make a copy of the underlying data. Both the `age` Series and the `df` DataFrame reference the same underlying data (a NumPy array). Thus, when we made the assignment with `age.loc['Aaron'] = 13`, that single shared array was updated. Since both `age` and `df` reference this array in memory, they both report the new age for Aaron.
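
The sharing described above can be checked on a small in-memory DataFrame. This is a minimal sketch that uses made-up data as a stand-in for `sample_data.csv`. The result depends on the installed pandas version: the shared-array behavior described here applies when Copy-on-Write is not active, while with Copy-on-Write (the default starting in pandas 3.0) the original DataFrame is left unchanged.

```python
import pandas as pd

# Made-up data standing in for data/sample_data.csv (an assumption)
df = pd.DataFrame({'age': [15, 40], 'score': [2.5, 7.1]},
                  index=['Aaron', 'Niko'])

age = df['age']          # select one column as a Series
age.loc['Aaron'] = 13    # may emit a warning, depending on the version

# The Series always reflects the change
print(age.loc['Aaron'])

# The original shows 13 if the data is shared, 15 if pandas kept it separate
print(df.loc['Aaron', 'age'])
```

If the second value printed is 13, the side effect described in this section occurred.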

### Fix side effect
Pandas never makes a new copy of data in memory when selecting a single column as a Series. If you want to modify only the Series and not the original DataFrame, use the `copy` method to force a copy of the data. Let's see this below.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
age = df['age'].copy()
age.loc['Aaron'] = 13
age

The `age` Series has been modified. Let's verify that `df` has not.

In [None]:
df

### The `SettingWithCopyWarning` is buggy
The absence of the warning does not mean that the original DataFrame was left unmodified. The `SettingWithCopyWarning` is not triggered in every case where an assignment changes the original.

**GUIDANCE** - If you select a subset of data into its own variable and want to only modify this subset, then use the `copy` method to ensure that the subset is not referencing the same data as the parent DataFrame.
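
As a rough illustration of this guidance, the sketch below uses made-up data (not the tutorial's CSV) and NumPy's `shares_memory` function to check that a copied column no longer points at the parent's values. The variable name `is_independent` is only for this example.

```python
import numpy as np
import pandas as pd

# Made-up data (an assumption, not the tutorial's CSV)
df = pd.DataFrame({'age': [15, 40, 33]},
                  index=['Aaron', 'Niko', 'Penelope'])

age = df['age'].copy()   # independent copy of the column
age.loc['Aaron'] = 13    # modifies only the copy

# True when the copy's values live in a separate block of memory
is_independent = not np.shares_memory(age.to_numpy(), df['age'].to_numpy())

print(is_independent)
print(df.loc['Aaron', 'age'])  # the original value is still in place
```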

## 2. No assignment takes place
Let's say you want to set the score to 10 for everyone in the original dataset older than 30. You do not want to make a new DataFrame or Series here. Using **chained indexing** will not perform the assignment; it will look as though nothing happened.

We re-read the data, create a filter, and attempt to assign the new values to the score column.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
filt = df['age'] > 30
df[filt]['score'] = 10

### What is chained indexing?
Chained indexing is when there are two consecutive indexing operations (subset selections). In the above subset assignment, the first subset selection `[filt]` is followed by the second, `['score']`. This is chained indexing: one subset selection follows another.

We were attempting to make an assignment, but no assignment took place. Let's inspect the DataFrame to verify this.

In [None]:
df
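
To make the two consecutive selections explicit, here is a small sketch on made-up data (an assumption, since the tutorial's CSV is not reproduced here). It only reads values, so no warning is involved; the names `step_one`, `step_two`, and `single` are illustrative.

```python
import pandas as pd

# Made-up data standing in for the tutorial's sample file
df = pd.DataFrame({'age': [15, 40, 33], 'score': [2.5, 7.1, 4.0]},
                  index=['Aaron', 'Niko', 'Penelope'])
filt = df['age'] > 30

# Chained indexing: two selections, one after the other
chained = df[filt]['score']

# The same thing written as two explicit steps
step_one = df[filt]           # first selection returns a new DataFrame
step_two = step_one['score']  # second selection acts on that new object

# A single indexer performs the row and column selection in one call
single = df.loc[filt, 'score']

print(chained.equals(step_two))
print(chained.equals(single))
```

For reading data, all three forms return the same values. The difference only matters once an assignment is involved.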

### Use a single indexer, `loc`, to make the assignment
You should never use chained indexing to make an assignment. Instead, you should use a single indexer, `loc`, to make the assignment. Let's attempt to make our assignment once again.

In [None]:
df.loc[filt, 'score'] = 10

Let's verify that the DataFrame changed.

In [None]:
df

### More details as to why chained indexing does not change the original DataFrame
Chained indexing does not work because an intermediate DataFrame is created with new, copied data. It is this intermediate DataFrame that has its data changed. However, the intermediate DataFrame is never assigned to a variable, so there is no reference to it and the change is lost.

Let's break up the chained indexing into two steps and assign the intermediate result to a variable so that you can see that data does get changed.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
filt = df['age'] > 30
df_intermediate = df[filt]
df_intermediate['score'] = 10
df_intermediate

The intermediate DataFrame has its data changed, but the original does not, as verified below.

In [None]:
df

**GUIDANCE** - Do not use chained indexing to make assignments. Use a single call to the `loc` indexer.
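
The contrast between the two approaches can be sketched side by side on made-up data (an assumption, not the tutorial's CSV). The warning that pandas raises for the chained assignment is silenced here only to keep the output clean.

```python
import warnings
import pandas as pd

# Made-up data standing in for the tutorial's sample file
df = pd.DataFrame({'age': [15, 40, 33], 'score': [2.5, 7.1, 4.0]},
                  index=['Aaron', 'Niko', 'Penelope'])
filt = df['age'] > 30

# Chained assignment: writes to a temporary copy, so df is untouched
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning for this demo
    df[filt]['score'] = 10
scores_after_chained = df['score'].tolist()

# Single loc call: writes directly into df
df.loc[filt, 'score'] = 10
scores_after_loc = df['score'].tolist()

print(scores_after_chained)
print(scores_after_loc)
```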

## 3. Correct Assignment, No Side Effect
The last case is where the assignment happens correctly and there is no side effect, but again the `SettingWithCopyWarning` shows up. Let's say we want to select both the state and age columns as a new DataFrame and work with this new DataFrame for further analysis. We then want to change the age column of this new DataFrame without changing the original.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
df_new = df[['state', 'age']]
df_new

Now that we have our new DataFrame let's set all of the ages to 99. This will trigger the warning.

In [None]:
df_new['age'] = 99

Let's verify that this new DataFrame has its values changed.

In [None]:
df_new

Looking at the original, we see that it remains the same.

In [None]:
df

### Explanation
In this instance, Pandas makes a completely new copy of the data when it executes `df_new = df[['state', 'age']]`. The values of this new DataFrame can be set without worrying about changing the original.

### Use the `copy` method to prevent the warning from being triggered
You prevent the warning from being triggered if you explicitly tell pandas to make a copy of the data. Let's see how this is done. Notice how there is no warning triggered.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
df_new = df[['state', 'age']].copy()
df_new['age'] = 99
df_new

Verify that the original is unchanged.

In [None]:
df
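
One way to confirm that no warning fires is to record warnings with Python's standard `warnings` module while making the assignment. The sketch below uses made-up data (an assumption) and inspects the recorded category names; `warning_names` is just an illustrative variable.

```python
import warnings
import pandas as pd

# Made-up data standing in for the tutorial's sample file
df = pd.DataFrame({'state': ['TX', 'CA'], 'age': [15, 40]},
                  index=['Aaron', 'Niko'])

df_new = df[['state', 'age']].copy()   # explicit, independent copy

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')    # record every warning raised
    df_new['age'] = 99

# Names of any warning categories raised by the assignment
warning_names = [w.category.__name__ for w in caught]

print(warning_names)
print(df['age'].tolist())  # the original ages are unchanged
```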

## Why does the `SettingWithCopyWarning` appear both with and without a copy?
What's bizarre about the `SettingWithCopyWarning` is that it is triggered both when a copy of the data is created and when one is not. In example 1 from above, no new copy of data was created when selecting a single column. In example 3, a new copy of data was created. Both triggered the `SettingWithCopyWarning`.

## No need to memorize these rules - Two cases
The `SettingWithCopyWarning` is inconsistent, poorly named, and doesn't give correct guidance for each case. Instead of worrying about what is happening internally, you can follow a simple plan that avoids triggering the warning. You will nearly always find yourself in one of two cases.

1. You want to keep working with the original DataFrame and modify its values
1. You want to create a new DataFrame, modify the values for the new DataFrame and keep the values of the original unchanged


### 1. Keep working with original DataFrame
When you want to keep working with the original DataFrame, but change its values, you must use a single indexer to do so and not chained indexing. If you are subsetting both rows and columns, you will need to use `loc`. For instance, if you wanted to change the score to 99 for all those with age greater than 30, you would do the following.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
filt = df['age'] > 30
df.loc[filt, 'score'] = 99
df

### 2. Create a new DataFrame and modify its values
On the other hand, you might want to work with only a subset of the original DataFrame and change the values of this new DataFrame. In this case, you need to use the `copy` method to force Pandas to make a new copy for your new DataFrame. This is exactly what we did in example 3 from above.

In [None]:
df = pd.read_csv('data/sample_data.csv', index_col=0)
df_new = df[['state', 'age']].copy()
df_new['age'] = 99
df_new

### What the warning should really say

The `SettingWithCopyWarning` should probably read something to the effect of "You may be making an assignment on data that is or is not a copy. Pandas doesn't know for sure. Either use a single indexer (usually `loc`) to make the assignment or force a copy with the `copy` method."