# The pandas SettingWithCopyWarning

## 0. Loading the data

In [1]:
import pandas as pd

In [2]:
fpath='./datas/beijing_tianqi/beijing_tianqi_2018.csv'
df = pd.read_csv(fpath)

In [3]:
df.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
0,2018-01-01,3℃,-6℃,晴~多云,东北风,1-2级,59,良,2
1,2018-01-02,2℃,-5℃,阴~多云,东北风,1-2级,49,优,1
2,2018-01-03,2℃,-5℃,多云,北风,1-2级,28,优,1
3,2018-01-04,0℃,-8℃,阴,东北风,1-2级,28,优,1
4,2018-01-05,3℃,-6℃,多云~晴,西北风,1-2级,50,优,1


In [4]:
df['bWendu'] = df['bWendu'].str.replace("℃","").astype('int32')
df['yWendu'] = df['yWendu'].str.replace("℃","").astype('int32')

In [5]:
df.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1


## 1. Reproducing the warning

In [11]:
# Select only the March data for analysis
condition = df['ymd'].str.startswith('2018-03')

In [13]:
# Set the temperature difference (this chained assignment triggers the warning)
df[condition]['dif_tmp'] = df['bWendu']-df['yWendu']

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[condition]['dif_tmp'] = df['bWendu']-df['yWendu']


In [14]:
df[condition].head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
59,2018-03-01,8,-3,多云,西南风,1-2级,46,优,1
60,2018-03-02,9,-1,晴~多云,北风,1-2级,95,良,2
61,2018-03-03,13,3,多云~阴,北风,1-2级,214,重度污染,5
62,2018-03-04,7,-2,阴~多云,东南风,1-2级,144,轻度污染,3
63,2018-03-05,8,-3,晴,南风,1-2级,94,良,2


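## 2. Cause  
The code that triggers the warning is df[condition]['dif_tmp'] = df['bWendu']-df['yWendu']  
In pseudocode this is df.get(condition).set(dif_tmp): the set runs on whatever the get in the first step returned, not on df itself.  
**Chained assignment is really two steps, get then set. The DataFrame returned by the get may be a view or a copy, so pandas cannot guarantee that the write reaches the source and emits the warning.**

The output of df[condition].head() above confirms this: no dif_tmp column was added, so the assignment was lost.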

Official documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy  

Key rule: perform every write on the source DataFrame itself, in a single step.
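
To make the two steps concrete, here is a minimal sketch that splits the chained assignment into an explicit get and set. The three-row table is a hypothetical miniature of the weather data, and `tmp` is a name introduced only for illustration. Whether a warning is displayed at all may depend on the pandas version, so the sketch silences warnings and only checks where the write ends up.

```python
import warnings

import pandas as pd

# Hypothetical miniature of the weather table used above.
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02"],
    "bWendu": [5, 8, 9],
    "yWendu": [-4, -3, -1],
})
condition = df["ymd"].str.startswith("2018-03")

with warnings.catch_warnings():
    # Silence any chained-assignment warning so the sketch runs quietly.
    warnings.simplefilter("ignore")
    # Step 1 (get): boolean indexing builds a separate, temporary DataFrame.
    tmp = df[condition]
    # Step 2 (set): the new column is written to the temporary, not to df.
    tmp["dif_tmp"] = tmp["bWendu"] - tmp["yWendu"]

# The source DataFrame never received the column.
print("dif_tmp" in df.columns)  # False
```

In the chained form the temporary object is discarded immediately, which is why the write appears to vanish.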

## 3. Solution 1  

Replace the two-step get + set with a single set operation.

In [15]:
df.loc[condition, 'dif_tmp'] = df['bWendu']-df['yWendu']

In [18]:
df[condition]

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_tmp
59,2018-03-01,8,-3,多云,西南风,1-2级,46,优,1,11.0
60,2018-03-02,9,-1,晴~多云,北风,1-2级,95,良,2,10.0
61,2018-03-03,13,3,多云~阴,北风,1-2级,214,重度污染,5,10.0
62,2018-03-04,7,-2,阴~多云,东南风,1-2级,144,轻度污染,3,9.0
63,2018-03-05,8,-3,晴,南风,1-2级,94,良,2,11.0
64,2018-03-06,6,-3,多云~阴,东南风,3-4级,67,良,2,9.0
65,2018-03-07,6,-2,阴~多云,北风,1-2级,65,良,2,8.0
66,2018-03-08,8,-4,晴,东北风,1-2级,62,良,2,12.0
67,2018-03-09,10,-2,多云,西南风,1-2级,132,轻度污染,3,12.0
68,2018-03-10,14,-2,晴,东南风,1-2级,171,中度污染,4,16.0
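
The same fix can be tried on a small table. This is a sketch under the assumption that the three hypothetical rows below stand in for the real CSV. It also suggests why dif_tmp is displayed as 11.0 rather than 11 in the output above: rows outside the mask are left as NaN, which forces a floating-point column.

```python
import pandas as pd

# Hypothetical miniature of the weather table; the February row is outside the mask.
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02"],
    "bWendu": [5, 8, 9],
    "yWendu": [-4, -3, -1],
})
condition = df["ymd"].str.startswith("2018-03")

# One indexing call selects both rows and column, so the write hits df itself.
df.loc[condition, "dif_tmp"] = df["bWendu"] - df["yWendu"]

# The February row has no value (NaN); the March rows hold the difference.
print(df[["ymd", "dif_tmp"]])
```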


## 4. Solution 2

If you need to pre-filter the data for later processing and analysis, use copy() to create an independent copy of the DataFrame.

In [22]:
df_month3= df[condition].copy()

In [25]:
df_month3.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_tmp
59,2018-03-01,8,-3,多云,西南风,1-2级,46,优,1,
60,2018-03-02,9,-1,晴~多云,北风,1-2级,95,良,2,
61,2018-03-03,13,3,多云~阴,北风,1-2级,214,重度污染,5,
62,2018-03-04,7,-2,阴~多云,东南风,1-2级,144,轻度污染,3,
63,2018-03-05,8,-3,晴,南风,1-2级,94,良,2,


In [26]:
df_month3['dif_tmp'] = df_month3['bWendu']-df_month3['yWendu']

In [27]:
df_month3

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_tmp
59,2018-03-01,8,-3,多云,西南风,1-2级,46,优,1,11
60,2018-03-02,9,-1,晴~多云,北风,1-2级,95,良,2,10
61,2018-03-03,13,3,多云~阴,北风,1-2级,214,重度污染,5,10
62,2018-03-04,7,-2,阴~多云,东南风,1-2级,144,轻度污染,3,9
63,2018-03-05,8,-3,晴,南风,1-2级,94,良,2,11
64,2018-03-06,6,-3,多云~阴,东南风,3-4级,67,良,2,9
65,2018-03-07,6,-2,阴~多云,北风,1-2级,65,良,2,8
66,2018-03-08,8,-4,晴,东北风,1-2级,62,良,2,12
67,2018-03-09,10,-2,多云,西南风,1-2级,132,轻度污染,3,12
68,2018-03-10,14,-2,晴,东南风,1-2级,171,中度污染,4,16
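
A minimal sketch of the copy-based approach, again on a hypothetical three-row stand-in for the weather table. It shows that the copy can be modified freely while the source DataFrame stays untouched.

```python
import pandas as pd

# Hypothetical miniature of the weather table used above.
df = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02"],
    "bWendu": [5, 8, 9],
    "yWendu": [-4, -3, -1],
})
condition = df["ymd"].str.startswith("2018-03")

# Explicit copy: df_month3 owns its data and is no longer tied to df.
df_month3 = df[condition].copy()

# A plain column assignment on the copy is a single-step write.
df_month3["dif_tmp"] = df_month3["bWendu"] - df_month3["yWendu"]

print(df_month3["dif_tmp"].tolist())  # [11, 10]
print("dif_tmp" in df.columns)        # False: the source is unchanged
```

Because every row of the copy gets a value, the column stays integer, matching the output above.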


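**In short, do not filter out a sub-DataFrame and then write into it, because pandas cannot guarantee that the write reaches the source.**  
Either use .loc to modify the source DataFrame directly in a single step,  
or copy the sub-DataFrame first and then modify the copy in a single step.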