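## How do you add new columns in Pandas?
In data analysis, you often need to create new columns based on some condition and then analyze them further. Four approaches are covered here: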

1. Direct assignment
2. The df.apply method
3. The df.assign method
4. Selecting groups by condition and assigning to each group separately

In [2]:
import numpy as np  # needed later for np.nan
import pandas as pd

### 0. Read the Excel data into a DataFrame

In [3]:
fpath = "./result.xlsx"
df = pd.read_excel(fpath)

In [4]:
df.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status
0,2017-01-01,周日,-,-,多云,无持续风向微风,372,严重
1,2017-01-02,周一,-,-,霾,无持续风向微风,361,严重
2,2017-01-03,周二,-,-,霾~雾,无持续风向微风,280,重度
3,2017-01-04,周三,9°,2°,小雨,无持续风向微风,193,中度
4,2017-01-05,周四,5°,1°,小雨,无持续风向微风,216,重度
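
For readers who do not have result.xlsx, the steps that follow can be tried on a small stand-in. The sketch below rebuilds only the five rows shown above; the name `sample` is hypothetical, and the `Unnamed: 0` column is left out because it appears to be a saved row index (if so, `pd.read_excel(fpath, index_col=0)` would read it back as the index).

```python
import pandas as pd

# Stand-in for result.xlsx: only the five rows displayed by df.head() above.
sample = pd.DataFrame({
    "date": ["2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05"],
    "week": ["周日", "周一", "周二", "周三", "周四"],
    "max_temperature": ["-", "-", "-", "9°", "5°"],
    "min_temperature": ["-", "-", "-", "2°", "1°"],
    "day_status": ["多云", "霾", "霾~雾", "小雨", "小雨"],
    "wind": ["无持续风向微风"] * 5,
    "aqi": [372, 361, 280, 193, 216],
    "aqi_status": ["严重", "严重", "重度", "中度", "重度"],
})

print(sample.shape)
```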


### 1. Direct assignment

Example: clean the temperature columns and convert them to a numeric type

In [5]:
# Strip the ° suffix from the temperatures; '-' marks a missing reading and becomes NaN
df['min_temperature'] = df['min_temperature'].map(
    lambda x: int(x.replace('°', '')) if x != '-' else np.nan)
df['max_temperature'] = df['max_temperature'].map(
    lambda x: int(x.replace('°', '')) if x != '-' else np.nan)

In [6]:
df.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status
0,2017-01-01,周日,,,多云,无持续风向微风,372,严重
1,2017-01-02,周一,,,霾,无持续风向微风,361,严重
2,2017-01-03,周二,,,霾~雾,无持续风向微风,280,重度
3,2017-01-04,周三,9.0,2.0,小雨,无持续风向微风,193,中度
4,2017-01-05,周四,5.0,1.0,小雨,无持续风向微风,216,重度
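
The same cleanup can also be written without a Python-level lambda. The following is a minimal sketch, not the notebook's own method: it assumes the raw values are strings such as "9°" or "-", and the series `raw` is hypothetical sample data.

```python
import pandas as pd

# Hypothetical sample in the same raw format as the temperature columns
raw = pd.Series(["-", "9°", "5°", "-3°"])

# Strip the degree sign, then let to_numeric turn anything unparsable ("-") into NaN
cleaned = pd.to_numeric(raw.str.replace("°", "", regex=False), errors="coerce")

print(cleaned.tolist())
```

Because the column contains NaN, the result is a float column, which matches the 9.0 and 2.0 values in the output above.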


Example: compute the temperature difference

In [7]:
# Note: df["max_temperature"] is a Series, so the subtraction below also returns a Series
# ("wencha" is pinyin for "temperature difference")
df.loc[:, "wencha"] = df["max_temperature"] - df["min_temperature"]

In [8]:
df.head()

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status,wencha
0,2017-01-01,周日,,,多云,无持续风向微风,372,严重,
1,2017-01-02,周一,,,霾,无持续风向微风,361,严重,
2,2017-01-03,周二,,,霾~雾,无持续风向微风,280,重度,
3,2017-01-04,周三,9.0,2.0,小雨,无持续风向微风,193,中度,7.0
4,2017-01-05,周四,5.0,1.0,小雨,无持续风向微风,216,重度,4.0


### 2. The df.apply method

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). 

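Example: add a temperature-type column:  
1. If the maximum temperature is above 33 degrees, the day is 高温 (hot)
2. If the minimum temperature is below -10 degrees, it is 低温 (cold)
3. Otherwise it is 常温 (normal)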

In [9]:
def get_wendu_type(x):
    if x["max_temperature"] > 33:
        return '高温'
    if x["min_temperature"] < -10:
        return '低温'
    return '常温'


# Note: pass axis=1 so that each row is passed to the function as a Series whose index is the column names
df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)

In [10]:
# Count how many days fall into each temperature type
df["wendu_type"].value_counts()

常温    945
高温    150
Name: wendu_type, dtype: int64
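
Row-wise apply calls a Python function once per row, which can be slow on large frames. As a sketch of an alternative (not the notebook's method), the same rule can be expressed with numpy.select, which checks the conditions in order just like the chain of `if` statements. The frame `demo` is hypothetical sample data. Note that in both versions a row with NaN temperatures fails every comparison and therefore ends up labelled 常温.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; NaN marks a day with no temperature reading
demo = pd.DataFrame({
    "max_temperature": [35.0, 20.0, -5.0, np.nan],
    "min_temperature": [25.0, 10.0, -15.0, np.nan],
})

conditions = [
    demo["max_temperature"] > 33,   # checked first, like the first `if`
    demo["min_temperature"] < -10,  # checked second
]
labels = ["高温", "低温"]

# Rows matching no condition, including NaN rows, fall through to the default
demo["wendu_type"] = np.select(conditions, labels, default="常温")

print(demo["wendu_type"].tolist())
```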

### 3. The df.assign method

Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. 

Example: convert the temperatures from Celsius to Fahrenheit

In [11]:
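# Several new columns can be added in one call.
# assign returns a new DataFrame; df itself is not modified here.
df.assign(
    # Celsius to Fahrenheit
    yWendu_huashi=lambda x: x["max_temperature"] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x["min_temperature"] * 9 / 5 + 32
)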

Unnamed: 0,date,week,max_temperature,min_temperature,day_status,wind,aqi,aqi_status,wencha,wendu_type,yWendu_huashi,bWendu_huashi
0,2017-01-01,周日,,,多云,无持续风向微风,372,严重,,常温,,
1,2017-01-02,周一,,,霾,无持续风向微风,361,严重,,常温,,
2,2017-01-03,周二,,,霾~雾,无持续风向微风,280,重度,,常温,,
3,2017-01-04,周三,9.0,2.0,小雨,无持续风向微风,193,中度,7.0,常温,48.2,35.6
4,2017-01-05,周四,5.0,1.0,小雨,无持续风向微风,216,重度,4.0,常温,41.0,33.8
...,...,...,...,...,...,...,...,...,...,...,...,...
1090,2019-12-27,周五,14.0,-2.0,晴~多云,西南风2级,55,良,16.0,常温,57.2,28.4
1091,2019-12-28,周六,,,阴~多云,西南风2级,80,良,,常温,,
1092,2019-12-29,周日,,,多云~晴,西南风2级,116,轻度,,常温,,
1093,2019-12-30,周一,5.0,-7.0,晴,东北风3级,131,轻度,12.0,常温,41.0,19.4
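
Since assign returns a new object, the Fahrenheit columns exist only in the returned frame unless the result is stored. A minimal sketch of that behavior, using a hypothetical two-row frame `temps` in place of df:

```python
import pandas as pd

# Hypothetical two-row frame standing in for df
temps = pd.DataFrame({"max_temperature": [9.0, 5.0], "min_temperature": [2.0, 1.0]})

# assign returns a new DataFrame; `temps` itself is left unchanged
converted = temps.assign(
    yWendu_huashi=lambda x: x["max_temperature"] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x["min_temperature"] * 9 / 5 + 32,
)

print(list(temps.columns))
print(list(converted.columns))
```

To keep the new columns on the working frame, assign the result back, for example `df = df.assign(...)`.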


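### 4. Selecting groups by condition and assigning to each group separately
First select the rows that match a condition, then assign the new column's value for just those rows.  
Example: if the difference between the maximum and minimum temperature is greater than 10 degrees, treat it as a large temperature difference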

In [12]:
# First create an empty column (this uses the first method, direct assignment)
df['wencha_type'] = ''

df.loc[df["max_temperature"]-df["min_temperature"] > 10, "wencha_type"] = "温差大"

df.loc[df["max_temperature"]-df["min_temperature"] <= 10, "wencha_type"] = "温差正常"

In [13]:
df["wencha_type"].value_counts()

温差正常    600
温差大     455
         40
Name: wencha_type, dtype: int64
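
The 40 rows with an empty label are most likely the days whose temperatures are missing: a comparison against NaN is False, so those rows match neither `> 10` nor `<= 10` and keep the initial empty string. The sketch below reproduces this on hypothetical sample data and labels such rows explicitly. The 未知 (unknown) label is an assumption added for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the last row has no temperature readings
demo = pd.DataFrame({
    "max_temperature": [20.0, 9.0, np.nan],
    "min_temperature": [5.0, 2.0, np.nan],
})

# Compute the difference once and reuse it in every condition
diff = demo["max_temperature"] - demo["min_temperature"]

demo["wencha_type"] = ""
demo.loc[diff > 10, "wencha_type"] = "温差大"
demo.loc[diff <= 10, "wencha_type"] = "温差正常"
# NaN > 10 and NaN <= 10 are both False, so the missing row is still "" here;
# the "未知" (unknown) label is an assumption, not part of the original.
demo.loc[diff.isna(), "wencha_type"] = "未知"

print(demo["wencha_type"].tolist())
```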