# How to Add New Columns in Pandas

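During data analysis you often need to derive new columns from existing ones, usually according to some condition, before analyzing further. This notebook covers four ways to do that:

1. Direct assignment
2. The `df.apply` method
3. The `df.assign` method
4. Selecting rows by condition and assigning a value to each group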

In [2]:
import pandas as pd

## 0. Read the CSV data into a DataFrame

In [3]:
fpath="./datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)

In [5]:
df.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
0,2018-01-01,3℃,-6℃,晴~多云,东北风,1-2级,59,良,2
1,2018-01-02,2℃,-5℃,阴~多云,东北风,1-2级,49,优,1
2,2018-01-03,2℃,-5℃,多云,北风,1-2级,28,优,1
3,2018-01-04,0℃,-8℃,阴,东北风,1-2级,28,优,1
4,2018-01-05,3℃,-6℃,多云~晴,西北风,1-2级,50,优,1
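
The path above points at a file that is not included here. As a minimal sketch for following along without it, the five rows shown by `df.head()` can be loaded from an in-memory string. The names `SAMPLE_CSV` and `sample_df` are made up for this illustration, and the `Unnamed: 0` column from the output above is left out.

```python
import io

import pandas as pd

# Rows copied from the df.head() output above; the full 2018 file is not reproduced.
SAMPLE_CSV = """ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel
2018-01-01,3℃,-6℃,晴~多云,东北风,1-2级,59,良,2
2018-01-02,2℃,-5℃,阴~多云,东北风,1-2级,49,优,1
2018-01-03,2℃,-5℃,多云,北风,1-2级,28,优,1
2018-01-04,0℃,-8℃,阴,东北风,1-2级,28,优,1
2018-01-05,3℃,-6℃,多云~晴,西北风,1-2级,50,优,1
"""

# read_csv accepts any file-like object, so StringIO stands in for the file on disk.
sample_df = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(sample_df.dtypes)  # the temperature columns are still text because of the ℃ suffix
```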


## 1. Direct assignment

Example: clean the temperature columns and convert them to a numeric type

In [25]:
# Strip the ℃ suffix from the temperatures, then cast the result to integers
df['bWendu'] = df['bWendu'].str.replace("℃","").astype('int32')
df['yWendu'] = df['yWendu'].str.replace("℃","").astype('int32')
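
The same cleanup can be tried on a few hand-typed values. This is only a sketch: the frame `temps` is a hypothetical stand-in for `df`, and `regex=False` is added here so that the suffix is treated as a literal string whatever the default of the installed pandas version is.

```python
import pandas as pd

# Hypothetical stand-in for df with the two raw temperature columns.
temps = pd.DataFrame({"bWendu": ["3℃", "2℃", "0℃"], "yWendu": ["-6℃", "-5℃", "-8℃"]})

for col in ["bWendu", "yWendu"]:
    # Assigning to an existing column name overwrites that column in place.
    temps[col] = temps[col].str.replace("℃", "", regex=False).astype("int32")

print(temps.dtypes)
```

Running the cleanup a second time on the same frame raises an error, because the `.str` accessor only works on string columns and these are now integers.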

In [26]:
df

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_temp
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,9
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,7
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,7
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,8
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,9
...,...,...,...,...,...,...,...,...,...,...
360,2018-12-27,-5,-12,多云~晴,西北风,3级,48,优,1,7
361,2018-12-28,-3,-11,晴,西北风,3级,40,优,1,8
362,2018-12-29,-3,-12,晴,西北风,2级,29,优,1,9
363,2018-12-30,-2,-11,晴~多云,东北风,1级,31,优,1,9


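Example: compute the daily temperature difference. (The `dif_temp` column is created here; it already shows up in the full-frame output above, presumably because the notebook cells were run more than once.)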

In [27]:
# Note: df['bWendu'] is a Series, so the subtraction below also returns a Series
df['dif_temp'] = df['bWendu']-df['yWendu']

In [28]:
df.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_temp
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,9
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,7
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,7
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,8
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,9
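
A short sketch of what direct assignment accepts, using a small hypothetical frame `temps`: a Series is matched to the frame by index label, while a scalar is broadcast to every row. The `unit` column is invented for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for df after the temperature cleanup.
temps = pd.DataFrame({"bWendu": [3, 2, 0], "yWendu": [-6, -5, -8]})

# Column arithmetic is element-wise and yields a Series with the same index.
diff = temps["bWendu"] - temps["yWendu"]
print(type(diff).__name__)

# Assigning a Series creates the column, aligned on the index.
temps["dif_temp"] = diff

# Assigning a scalar fills every row with that value.
temps["unit"] = "C"
print(temps)
```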


## 2. The df.apply method

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
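
To make the two axes concrete, here is a small sketch on a hypothetical frame `temps`. With `axis=0` the function sees one column at a time; with `axis=1` it sees one row at a time and can look values up by column name.

```python
import pandas as pd

# Hypothetical stand-in for df with numeric temperature columns.
temps = pd.DataFrame({"bWendu": [3, 2, 0], "yWendu": [-6, -5, -8]})

# axis=0: each column arrives as a Series indexed by the row labels.
col_max = temps.apply(lambda col: col.max(), axis=0)

# axis=1: each row arrives as a Series indexed by the column names.
row_range = temps.apply(lambda row: row["bWendu"] - row["yWendu"], axis=1)

print(col_max)
print(row_range)
```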

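Example: add a temperature-level column:
1. If the daily high is above 33 degrees, the level is High
2. If the daily low is below -10 degrees, the level is Low
3. Otherwise the level is Regular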

In [43]:
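def label_temperature_level(row):
    # row is a single DataFrame row, passed in as a Series
    if row['bWendu'] > 33:
        return 'High'
    elif row['yWendu'] < -10:
        return 'Low'
    else:
        return 'Regular'

# axis=1 because the labels used in the function ('bWendu', 'yWendu') are
# DataFrame columns, so each row has to be passed as a Series
df['tmp_level'] = df.apply(label_temperature_level, axis=1)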

In [45]:
df

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_temp,tmp_level
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,9,Regular
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,7,Regular
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,7,Regular
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,8,Regular
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,9,Regular
...,...,...,...,...,...,...,...,...,...,...,...
360,2018-12-27,-5,-12,多云~晴,西北风,3级,48,优,1,7,Low
361,2018-12-28,-3,-11,晴,西北风,3级,40,优,1,8,Low
362,2018-12-29,-3,-12,晴,西北风,2级,29,优,1,9,Low
363,2018-12-30,-2,-11,晴~多云,东北风,1级,31,优,1,9,Low


In [50]:
# Count how often each temperature level occurs;
# value_counts() returns the frequency of every distinct value
df['tmp_level'].value_counts()

Regular    328
High        29
Low          8
Name: tmp_level, dtype: int64
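
Row-wise `apply` calls the Python function once per row. As an alternative that the original notebook does not use, the same first-match-wins labelling can be sketched with `numpy.select`, which works on whole columns. The sample values below are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Made-up values covering each branch, including a row that matches both conditions.
temps = pd.DataFrame({"bWendu": [35, 20, -3, 34], "yWendu": [25, 10, -12, -11]})

# Conditions are checked in order and the first match wins, like if/elif.
conditions = [temps["bWendu"] > 33, temps["yWendu"] < -10]
choices = ["High", "Low"]

temps["tmp_level"] = np.select(conditions, choices, default="Regular")
print(temps)
```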

## 3. The df.assign method

Assign new columns to a DataFrame.   

Returns a new object with all original columns in addition to new ones.  

Example: convert the temperatures from Celsius to Fahrenheit

In [53]:
# Several new columns can be added in one call.
# assign returns a new DataFrame; df itself is left unchanged.
df.assign(
    yWendu_fahren=lambda x: x['yWendu'] * 9 / 5 + 32,
    bWendu_fahren=lambda x: x['bWendu'] * 9 / 5 + 32,
)

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_temp,tmp_level,yWendu_fahren,bWendu_fahren
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,9,Regular,21.2,37.4
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,7,Regular,23.0,35.6
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,7,Regular,23.0,35.6
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,8,Regular,17.6,32.0
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,9,Regular,21.2,37.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
360,2018-12-27,-5,-12,多云~晴,西北风,3级,48,优,1,7,Low,10.4,23.0
361,2018-12-28,-3,-11,晴,西北风,3级,40,优,1,8,Low,12.2,26.6
362,2018-12-29,-3,-12,晴,西北风,2级,29,优,1,9,Low,10.4,26.6
363,2018-12-30,-2,-11,晴~多云,东北风,1级,31,优,1,9,Low,12.2,28.4
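
Because `assign` returns a new object, the Fahrenheit columns only persist if the result is kept. The sketch below uses a small hypothetical frame `temps` and a made-up name `converted` for the result.

```python
import pandas as pd

# Hypothetical stand-in for df with numeric temperature columns.
temps = pd.DataFrame({"bWendu": [3, 2, 0], "yWendu": [-6, -5, -8]})

# Each lambda receives the DataFrame and returns the new column.
converted = temps.assign(
    yWendu_fahren=lambda x: x["yWendu"] * 9 / 5 + 32,
    bWendu_fahren=lambda x: x["bWendu"] * 9 / 5 + 32,
)

print(list(temps.columns))      # the original frame has no new columns
print(list(converted.columns))  # the returned frame has both
```

Writing `df = df.assign(...)` is the usual way to keep the result under the original name.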


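## 4. Select rows by condition and assign to each group

First select rows by a condition, then set the new column's value for just those rows.  
Example: if the difference between the daily high and low is greater than 10 degrees, label the temperature difference as large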

In [54]:
# First create an empty column (this is itself one way to add a new column)
df['dif_tmp_level'] =''

In [56]:
df.head()

Unnamed: 0,ymd,bWendu,yWendu,tianqi,fengxiang,fengli,aqi,aqiInfo,aqiLevel,dif_temp,tmp_level,dif_tmp_level
0,2018-01-01,3,-6,晴~多云,东北风,1-2级,59,良,2,9,Regular,
1,2018-01-02,2,-5,阴~多云,东北风,1-2级,49,优,1,7,Regular,
2,2018-01-03,2,-5,多云,北风,1-2级,28,优,1,7,Regular,
3,2018-01-04,0,-8,阴,东北风,1-2级,28,优,1,8,Regular,
4,2018-01-05,3,-6,多云~晴,西北风,1-2级,50,优,1,9,Regular,


In [71]:
# df.loc[row_mask, column] selects the given column for every row where the mask is True
df.loc[df['dif_temp']>10,'dif_tmp_level'] = "large"
df.loc[df['dif_temp']<=10,'dif_tmp_level'] = "small"

In [72]:
df['dif_tmp_level'].value_counts()

small    187
large    178
Name: dif_tmp_level, dtype: int64
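
The two-step mask assignment can be checked on a few made-up values. For a plain two-way split, `numpy.where` gives the same labels in one expression; it is shown here as an alternative, not as part of the original notebook, and the `dif_tmp_level_where` column name is invented for the comparison.

```python
import numpy as np
import pandas as pd

# Made-up temperature differences, including the boundary value 10.
temps = pd.DataFrame({"dif_temp": [9, 12, 10, 15]})

# Boolean-mask assignment with .loc, as in the cells above.
temps["dif_tmp_level"] = ""
temps.loc[temps["dif_temp"] > 10, "dif_tmp_level"] = "large"
temps.loc[temps["dif_temp"] <= 10, "dif_tmp_level"] = "small"

# Single expression: pick "large" where the condition holds, "small" elsewhere.
temps["dif_tmp_level_where"] = np.where(temps["dif_temp"] > 10, "large", "small")
print(temps)
```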