## The SettingWithCopyWarning in Pandas

### 0. Loading the data

In [1]:
import pandas as pd
import numpy as np

In [2]:
fpath = "./result.xlsx"
df = pd.read_excel(fpath)

In [3]:
df.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status
0,2017-01-01,周日,-,-,多云,无持续风向微风,372,严重
1,2017-01-02,周一,-,-,霾,无持续风向微风,361,严重
2,2017-01-03,周二,-,-,霾~雾,无持续风向微风,280,重度
3,2017-01-04,周三,9°,2°,小雨,无持续风向微风,193,中度
4,2017-01-05,周四,5°,1°,小雨,无持续风向微风,216,重度


In [4]:
# Strip the ° suffix from the temperatures; "-" marks a missing value and becomes NaN
df['min_temperature'] = df['min_temperature'].map(
    lambda x: int(x.replace('°', '')) if x != '-' else np.nan)
df['max_temperature'] = df['max_temperature'].map(
    lambda x: int(x.replace('°', '')) if x != '-' else np.nan)
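The same cleanup can also be written without a per-element lambda. The following is a minimal sketch on a made-up Series (the name `raw` and its values are illustrative, not taken from `result.xlsx`), assuming that every non-numeric entry should become NaN:

```python
import pandas as pd

# Hypothetical stand-in for a temperature column; values are made up.
raw = pd.Series(["-", "9°", "5°", "-3°"])

# Remove the degree sign as a literal string, then convert to numbers.
# errors="coerce" turns anything that cannot be parsed (here "-") into NaN.
cleaned = pd.to_numeric(raw.str.replace("°", "", regex=False), errors="coerce")

print(cleaned)
```

One design difference from the `map` version: values that are already missing (NaN) pass through unchanged instead of raising an error inside the lambda.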

In [5]:
df.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status
0,2017-01-01,周日,,,多云,无持续风向微风,372,严重
1,2017-01-02,周一,,,霾,无持续风向微风,361,严重
2,2017-01-03,周二,,,霾~雾,无持续风向微风,280,重度
3,2017-01-04,周三,9.0,2.0,小雨,无持续风向微风,193,中度
4,2017-01-05,周四,5.0,1.0,小雨,无持续风向微风,216,重度


### 1. Reproducing the warning

In [6]:
# Select only the March 2018 rows for analysis
condition = df["date"].str.startswith("2018-03")
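The string prefix test works because the dates are stored as text. If the column were parsed as datetimes instead, an equivalent mask could be built from the `.dt` accessor. This is a sketch on made-up dates (the name `dates` is hypothetical), not a change to the notebook's approach:

```python
import pandas as pd

# Hypothetical date strings standing in for the "date" column.
dates = pd.to_datetime(
    pd.Series(["2018-02-28", "2018-03-01", "2018-03-31", "2018-04-01"])
)

# Boolean mask that is True only for rows in March 2018.
march_2018 = (dates.dt.year == 2018) & (dates.dt.month == 3)

print(march_2018.tolist())
```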

In [7]:
# Try to add a temperature-difference column (this triggers the warning)
df[condition]["wen_cha"] = df["max_temperature"]-df["min_temperature"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [8]:
# Check whether the assignment took effect (it did not: there is no wen_cha column)
df[condition].head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status
424,2018-03-01,周四,11.0,2.0,阴~多云,东北风3-4级,146,轻度
425,2018-03-02,周五,15.0,3.0,多云,南风1-2级,102,轻度
426,2018-03-03,周六,20.0,10.0,阴~小雨,南风1-2级,128,轻度
427,2018-03-04,周日,10.0,1.0,小雨,东北风4-5级,131,轻度
428,2018-03-05,周一,11.0,1.0,多云,东北风1-2级,72,良


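### 2. Cause
The statement that triggers the warning:
df[condition]["wen_cha"] = df["max_temperature"]-df["min_temperature"]

Conceptually it is equivalent to df.get(condition).set(wen_cha) (pseudocode). The trouble starts at the first step, the get: the set then runs on whatever the get returned, not on df itself.

***A chained operation is really two steps, a get followed by a set. The DataFrame returned by the get may be a view or a copy, so pandas cannot guarantee that the write reaches the original and emits the warning.***

Official documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

Key rule: a write to a pandas DataFrame should be made in a single step, directly on the source DataFrame.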
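The two-step behaviour can be seen on a tiny DataFrame. This is a minimal sketch with made-up data (`df_demo`, `hi`, `lo`, and `diff` are hypothetical names); the warning is silenced only so the demo output stays clean:

```python
import warnings

import pandas as pd

# Hypothetical toy data, not the weather dataset.
df_demo = pd.DataFrame({"hi": [10, 20, 30], "lo": [1, 2, 3]})
mask = df_demo["hi"] > 15

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning for this demo only
    # Step 1 (get): df_demo[mask] builds a new, temporary DataFrame.
    # Step 2 (set): the new column is written to that temporary object.
    df_demo[mask]["diff"] = df_demo["hi"] - df_demo["lo"]

# The original never received the column, so this should print False.
print("diff" in df_demo.columns)
```

The write lands on the temporary object produced by the boolean selection, which is discarded right after the statement finishes.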

### 3. Solution 1

Turn the two-step get + set into a single set operation with .loc

In [9]:
df.loc[condition, "wen_cha"] = df["max_temperature"]-df["min_temperature"]

In [10]:
df[condition].head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status,wen_cha
424,2018-03-01,周四,11.0,2.0,阴~多云,东北风3-4级,146,轻度,9.0
425,2018-03-02,周五,15.0,3.0,多云,南风1-2级,102,轻度,12.0
426,2018-03-03,周六,20.0,10.0,阴~小雨,南风1-2级,128,轻度,10.0
427,2018-03-04,周日,10.0,1.0,小雨,东北风4-5级,131,轻度,9.0
428,2018-03-05,周一,11.0,1.0,多云,东北风1-2级,72,良,10.0
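A side effect worth knowing: rows outside the mask still get the new column, filled with NaN. The sketch below shows this on made-up data (`df_demo`, `hi`, `lo`, and `diff` are hypothetical names):

```python
import pandas as pd

# Hypothetical toy data, not the weather dataset.
df_demo = pd.DataFrame({"hi": [10, 20, 30], "lo": [1, 2, 3]})
mask = df_demo["hi"] > 15

# Single-step write on the source DataFrame: row selector and column name together.
# The right-hand Series is aligned on the index, so only masked rows receive values.
df_demo.loc[mask, "diff"] = df_demo["hi"] - df_demo["lo"]

print(df_demo)
```

Row 0 fails the mask and ends up with NaN in `diff`, while rows 1 and 2 hold the computed differences.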


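### 4. Solution 2

If you need to pre-filter the data for later processing and analysis, use copy() to make an independent copy of the sub-DataFrame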

In [11]:
df_month3 = df[condition].copy()

In [12]:
df_month3.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status,wen_cha
424,2018-03-01,周四,11.0,2.0,阴~多云,东北风3-4级,146,轻度,9.0
425,2018-03-02,周五,15.0,3.0,多云,南风1-2级,102,轻度,12.0
426,2018-03-03,周六,20.0,10.0,阴~小雨,南风1-2级,128,轻度,10.0
427,2018-03-04,周日,10.0,1.0,小雨,东北风4-5级,131,轻度,9.0
428,2018-03-05,周一,11.0,1.0,多云,东北风1-2级,72,良,10.0


In [13]:
df_month3["wen_cha"] = df["max_temperature"]-df["min_temperature"]

In [14]:
df_month3.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status,wen_cha
424,2018-03-01,周四,11.0,2.0,阴~多云,东北风3-4级,146,轻度,9.0
425,2018-03-02,周五,15.0,3.0,多云,南风1-2级,102,轻度,12.0
426,2018-03-03,周六,20.0,10.0,阴~小雨,南风1-2级,128,轻度,10.0
427,2018-03-04,周日,10.0,1.0,小雨,东北风4-5级,131,轻度,9.0
428,2018-03-05,周一,11.0,1.0,多云,东北风1-2级,72,良,10.0


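***In short, pandas warns against selecting a sub-DataFrame first and then writing to it.***  
Either use .loc to modify the source DataFrame in a single step,  
or make an explicit copy of the sub-DataFrame and then modify that copy in a single step.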
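The copy-based option can be checked on a small example. This sketch uses made-up data (`df_demo`, `df_sub`, `hi`, `lo`, and `diff` are hypothetical names) and shows that the write stays on the copy:

```python
import pandas as pd

# Hypothetical toy data, not the weather dataset.
df_demo = pd.DataFrame({"hi": [10, 20, 30], "lo": [1, 2, 3]})
mask = df_demo["hi"] > 15

# An explicit copy is an independent DataFrame, so writing to it is unambiguous.
df_sub = df_demo[mask].copy()
df_sub["diff"] = df_sub["hi"] - df_sub["lo"]

print(df_sub)
print("diff" in df_demo.columns)  # the source DataFrame is left untouched
```

Choose this route when the filtered subset is the object you intend to keep working with; choose `.loc` when the source DataFrame itself should change.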