https://www.kdnuggets.com/2020/04/stop-hurting-pandas.html

This post will address the issues that can arise when Pandas slicing is used improperly. If you see the warning that reads "A value is trying to be set on a copy of a slice from a DataFrame", this post is for you.

Problem statement: in the DataFrame below, we want to find all the rows where the 'x' column is larger than 3 and set the corresponding 'y' values to 50.

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({'x':[1,5,4,3,4,5],
                   'y':[.1,.5,.4,.3,.4,.5],
                   'w':[11,15,14,13,14,15]
                  })

In [3]:
# Find all the rows where the 'x' column is larger than 3

df['x']>3

0    False
1     True
2     True
3    False
4     True
5     True
Name: x, dtype: bool

In [4]:
df[df['x']>3]

   x    y   w
1  5  0.5  15
2  4  0.4  14
4  4  0.4  14
5  5  0.5  15


In [25]:
# Based on this, set all the corresponding 'y' values to 50.

# Option 1: filter into a new DataFrame, then assign to it.
#     This triggers the warning "A value is trying to be set on a copy of a slice from a DataFrame". We should use .loc instead.

In [19]:
df2 = df[df['x']>3]

In [21]:
df2

   x    y   w
1  5  0.5  15
2  4  0.4  14
4  4  0.4  14
5  5  0.5  15


In [23]:
df2['y'] = 50

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [24]:
df2

   x   y   w
1  5  50  15
2  4  50  14
4  4  50  14
5  5  50  15
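
Note that Option 1 only changes df2. Boolean indexing returned a new object, so the original df still holds its old 'y' values, which is exactly the ambiguity the warning is pointing at. If a separate, modified subset is what you actually want, one way to say so explicitly is to take a copy first. The following is a minimal sketch under that assumption; the names y_original and y_subset are illustrative and not part of the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 4, 3, 4, 5],
                   'y': [.1, .5, .4, .3, .4, .5],
                   'w': [11, 15, 14, 13, 14, 15]})

# .copy() makes it explicit that df2 is an independent DataFrame,
# so assigning to one of its columns is unambiguous.
df2 = df[df['x'] > 3].copy()
df2['y'] = 50

# Only df2 was modified; the original DataFrame keeps its old values.
y_original = df['y'].tolist()   # illustrative name
y_subset = df2['y'].tolist()    # illustrative name
print(y_original)
print(y_subset)
```

This does not solve the problem as stated, since df itself is left unchanged. It only makes the intent clear when working on a filtered copy is the goal.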


In [27]:
# Option 2: use .loc to select the rows and the column in a single step

df.loc[ df['x']>3, 'y']

1    0.5
2    0.4
4    0.4
5    0.5
Name: y, dtype: float64

In [28]:
df.loc[ df['x']>3, 'y'] = 50

In [29]:
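# Note that we didn't get a warning message, and that the original df itself was updated
# (Option 1 only changed df2, a separate filtered object)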

In [30]:
df

   x     y   w
0  1   0.1  11
1  5  50.0  15
2  4  50.0  14
3  3   0.3  13
4  4  50.0  14
5  5  50.0  15
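
For reference, the whole .loc approach can be condensed into one self-contained cell. This is only a recap sketch of the steps above; the variable names mask, updated_y and rows_changed are illustrative and do not appear in the original notebook.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 4, 3, 4, 5],
                   'y': [.1, .5, .4, .3, .4, .5],
                   'w': [11, 15, 14, 13, 14, 15]})

# Boolean Series: True for the rows where 'x' is larger than 3.
mask = df['x'] > 3

# Row selection and column selection happen in a single indexing call,
# so the assignment is applied to df itself.
df.loc[mask, 'y'] = 50

updated_y = df['y'].tolist()      # illustrative name
rows_changed = int(mask.sum())    # number of rows that matched the mask
print(updated_y)
print(rows_changed)
```

Keeping the mask in its own variable is a stylistic choice: it makes the condition reusable and lets you count the affected rows before or after the assignment.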
