# Arithmetic operations

- add(other)

For example, add a fixed number to every value in a column.

In [1]:
# Load the test data
import pandas as pd
# Read the file
data = pd.read_csv("./stock_day.csv")

# Drop some columns to keep the data simple for the operations that follow
data = data.drop(["ma5","ma10","ma20","v_ma5","v_ma10","v_ma20"],axis=1)

In [2]:
data['open'].head()

2018-02-27    23.53
2018-02-26    22.80
2018-02-23    22.88
2018-02-22    22.25
2018-02-14    21.49
Name: open, dtype: float64

In [3]:
data['open'].add(1).head()

2018-02-27    24.53
2018-02-26    23.80
2018-02-23    23.88
2018-02-22    23.25
2018-02-14    22.49
Name: open, dtype: float64

- sub(other)
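
The source gives no example for `sub`, so here is a minimal sketch of how it mirrors `add`. The two Series below are small made-up stand-ins for `data['open']` and `data['close']`; the variable names and most of the values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical prices standing in for data['open'] and data['close']
dates = ["2018-02-27", "2018-02-26", "2018-02-23"]
open_price = pd.Series([23.53, 22.80, 22.88], index=dates, name="open")
close_price = pd.Series([24.16, 23.53, 22.82], index=dates, name="close")

# sub with a scalar subtracts it from every element, like `open_price - 1`
minus_one = open_price.sub(1)

# sub with another Series aligns the two on their index labels first
diff = close_price.sub(open_price)

print(minus_one)
print(diff)
```

Passing another Series rather than a scalar is where the method form is most useful, because the values are matched by index label and not by position.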

# Logical operations
## Logical operators

- For example, select the rows (dates) where data["open"] > 23
    - data["open"] > 23 returns a boolean result for each row

In [6]:
(data['open']>23).head()

2018-02-27     True
2018-02-26    False
2018-02-23    False
2018-02-22    False
2018-02-14    False
Name: open, dtype: bool

In [4]:
# The boolean result can be used as a filter
data[data['open']>23].head()

| | open | high | close | low | volume | price_change | p_change | turnover |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 2.39 |
| 2018-02-01 | 23.71 | 23.86 | 22.42 | 22.22 | 66414.64 | -1.3 | -5.48 | 1.66 |
| 2018-01-31 | 23.85 | 23.98 | 23.72 | 23.31 | 49155.02 | -0.11 | -0.46 | 1.23 |
| 2018-01-30 | 23.71 | 24.08 | 23.83 | 23.7 | 32420.43 | 0.05 | 0.21 | 0.81 |
| 2018-01-29 | 24.4 | 24.63 | 23.77 | 23.72 | 65469.81 | -0.73 | -2.98 | 1.64 |


In [6]:
# Combine several conditions
data[(data['open'] > 23) & (data['open'] < 24)].head()

| | open | high | close | low | volume | price_change | p_change | turnover |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 2.39 |
| 2018-02-01 | 23.71 | 23.86 | 22.42 | 22.22 | 66414.64 | -1.3 | -5.48 | 1.66 |
| 2018-01-31 | 23.85 | 23.98 | 23.72 | 23.31 | 49155.02 | -0.11 | -0.46 | 1.23 |
| 2018-01-30 | 23.71 | 24.08 | 23.83 | 23.7 | 32420.43 | 0.05 | 0.21 | 0.81 |
| 2018-01-16 | 23.4 | 24.6 | 24.4 | 23.3 | 101295.42 | 0.96 | 4.1 | 2.54 |
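
The parentheses around each condition matter: `&`, `|` and `~` bind more tightly than comparison operators such as `>`, and Python's `and`/`or` cannot be applied to a whole Series. A minimal sketch on a made-up four-row frame (the values are illustrative and do not come from `stock_day.csv`):

```python
import pandas as pd

# Hypothetical opening prices
df = pd.DataFrame({"open": [23.53, 22.80, 24.40, 21.49]})

# AND: both conditions must hold for a row
in_range = (df["open"] > 23) & (df["open"] < 24)

# OR: at least one condition must hold
out_of_range = (df["open"] <= 23) | (df["open"] >= 24)

# NOT: invert a boolean mask
negated = ~in_range

print(df[in_range])
print(negated.equals(out_of_range))
```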


## Logical functions
- query(expr)
    - expr: the query string

query makes the filtering shown above shorter and easier to read

In [2]:
data.query("open<24 & open>23").head()

| | open | high | close | low | volume | price_change | p_change | turnover |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 2.39 |
| 2018-02-01 | 23.71 | 23.86 | 22.42 | 22.22 | 66414.64 | -1.3 | -5.48 | 1.66 |
| 2018-01-31 | 23.85 | 23.98 | 23.72 | 23.31 | 49155.02 | -0.11 | -0.46 | 1.23 |
| 2018-01-30 | 23.71 | 24.08 | 23.83 | 23.7 | 32420.43 | 0.05 | 0.21 | 0.81 |
| 2018-01-16 | 23.4 | 24.6 | 24.4 | 23.3 | 101295.42 | 0.96 | 4.1 | 2.54 |
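
When the bounds live in Python variables rather than being typed into the string, `query` can refer to them with an `@` prefix. The sketch below assumes a small made-up frame and two hypothetical variables, `low` and `high`, and checks that the query matches the equivalent boolean-mask filter.

```python
import pandas as pd

# Hypothetical opening prices
df = pd.DataFrame({"open": [23.53, 22.80, 24.40, 23.40]})

# Hypothetical bounds held in local variables
low, high = 23, 24

# @name refers to a local Python variable inside the query string
result = df.query("open > @low and open < @high")

# The same filter written with boolean masks
same = df[(df["open"] > low) & (df["open"] < high)]

print(result)
```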


- isin(values)

For example, check whether 'open' equals 23.53 or 23.85

In [3]:
# Test membership in a list of values, then use the result as a filter
data[data["open"].isin([23.53,23.85])]

| | open | high | close | low | volume | price_change | p_change | turnover |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 2018-02-27 | 23.53 | 25.88 | 24.16 | 23.53 | 95578.03 | 0.63 | 2.68 | 2.39 |
| 2018-01-31 | 23.85 | 23.98 | 23.72 | 23.31 | 49155.02 | -0.11 | -0.46 | 1.23 |
| 2017-07-26 | 23.53 | 23.92 | 23.4 | 22.85 | 110276.48 | -0.3 | -1.27 | 2.76 |
| 2015-12-18 | 23.53 | 24.66 | 23.99 | 23.43 | 109230.05 | 0.65 | 2.79 | 3.74 |
| 2015-11-26 | 23.85 | 24.08 | 23.53 | 23.5 | 51446.29 | -0.31 | -1.3 | 1.76 |
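
`isin` returns a boolean mask just like a comparison does, so it can be inverted with `~` to keep everything except the listed values. A minimal sketch on made-up data follows; note that `isin` tests exact equality, so with floats it only matches values that are stored identically.

```python
import pandas as pd

# Hypothetical opening prices
df = pd.DataFrame({"open": [23.53, 22.80, 23.85, 21.49]})

# True where the value is one of the listed values
mask = df["open"].isin([23.53, 23.85])

selected = df[mask]   # rows whose 'open' is in the list
others = df[~mask]    # every other row

print(selected)
print(others)
```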


# Statistical operations
## describe

Summary analysis: describe returns many statistics in a single call, including count, mean, std, min, max and the quartiles.

In [7]:
# Compute the mean, standard deviation, maximum, minimum and more
data.describe()

| | open | high | close | low | volume | price_change | p_change | turnover |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| count | 643.0 | 643.0 | 643.0 | 643.0 | 643.0 | 643.0 | 643.0 | 643.0 |
| mean | 21.272706 | 21.900513 | 21.336267 | 20.771835 | 99905.519114 | 0.018802 | 0.19028 | 2.93619 |
| std | 3.930973 | 4.077578 | 3.942806 | 3.791968 | 73879.119354 | 0.898476 | 4.079698 | 2.079375 |
| min | 12.25 | 12.67 | 12.36 | 12.2 | 1158.12 | -3.52 | -10.03 | 0.04 |
| 25% | 19.0 | 19.5 | 19.045 | 18.525 | 48533.21 | -0.39 | -1.85 | 1.36 |
| 50% | 21.44 | 21.97 | 21.45 | 20.98 | 83175.93 | 0.05 | 0.26 | 2.5 |
| 75% | 23.4 | 24.065 | 23.415 | 22.85 | 127580.055 | 0.455 | 2.305 | 3.915 |
| max | 34.99 | 36.35 | 35.21 | 34.01 | 501915.41 | 3.03 | 10.03 | 12.56 |
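
The result of `describe` is itself a DataFrame, so individual statistics can be read from it by row label. The sketch below uses a tiny made-up column and checks that each entry agrees with the matching single-purpose method.

```python
import pandas as pd

# Hypothetical values, chosen so the statistics are easy to verify by hand
df = pd.DataFrame({"open": [1.0, 2.0, 3.0, 4.0]})

summary = df.describe()

# Read individual statistics by row label and column name
count = summary.loc["count", "open"]
mean = summary.loc["mean", "open"]
q1 = summary.loc["25%", "open"]
median = summary.loc["50%", "open"]

print(summary)
```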


## Statistical functions

These functions were covered in detail in the NumPy material. Here we demonstrate min (minimum), max (maximum), mean, median, var (variance), std (standard deviation) and mode:

| Function | Description |
|:---|:---|
| count  | Number of non-NA observations |
| sum    | Sum of values |
| mean   | Mean of values |
| median | Arithmetic median of values |
| min    | Minimum |
| max    | Maximum |
| mode   | Mode |
| abs    | Absolute value |
| prod   | Product of values |
| std    | Bessel-corrected sample standard deviation |
| var    | Unbiased variance |
| idxmax | Index label of the maximum |
| idxmin | Index label of the minimum |

When a single statistical function is called, it uses the default axis=0 and returns one result per column. To get one result per row instead, pass axis=1.

- max()、min()


In [8]:
# Statistical functions: 0 gives one result per column, 1 gives one result per row
data.max(0)

open                34.99
high                36.35
close               35.21
low                 34.01
volume          501915.41
price_change         3.03
p_change            10.03
turnover            12.56
dtype: float64

In [10]:
data.max(1).head()

2018-02-27    95578.03
2018-02-26    60985.11
2018-02-23    52914.01
2018-02-22    36105.01
2018-02-14    23331.04
dtype: float64
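
The effect of the axis argument is easier to see on a frame small enough to check by hand. This is a minimal sketch with made-up values and hypothetical labels (`a`, `b`, `r1`, `r2`):

```python
import pandas as pd

# Hypothetical 2x2 frame
df = pd.DataFrame({"a": [1, 5], "b": [3, 2]}, index=["r1", "r2"])

# axis=0 (default): one result per column
col_max = df.max(axis=0)
col_min = df.min(axis=0)

# axis=1: one result per row
row_max = df.max(axis=1)

print(col_max)
print(row_max)
```

Writing `axis=0` or `axis=1` as a keyword, as done here, tends to read more clearly than a bare positional `0` or `1`.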

- std()、var()

In [11]:
# Variance
data.var(0)

open            1.545255e+01
high            1.662665e+01
close           1.554572e+01
low             1.437902e+01
volume          5.458124e+09
price_change    8.072595e-01
p_change        1.664394e+01
turnover        4.323800e+00
dtype: float64

In [12]:
# Standard deviation
data.std(0)

open                3.930973
high                4.077578
close               3.942806
low                 3.791968
volume          73879.119354
price_change        0.898476
p_change            4.079698
turnover            2.079375
dtype: float64
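
The standard deviation is the square root of the variance, and pandas computes both with `ddof=1` by default (dividing by n - 1), whereas NumPy's `np.var` defaults to `ddof=0`. A small sketch on made-up values illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical values: mean is 4, squared deviations sum to 8
s = pd.Series([2.0, 4.0, 4.0, 6.0])

sample_var = s.var()            # ddof=1 by default: 8 / (4 - 1)
population_var = s.var(ddof=0)  # divide by n instead: 8 / 4
sample_std = s.std()            # square root of the sample variance

# NumPy's default corresponds to ddof=0
numpy_var = np.var(s.to_numpy())

print(sample_var, population_var, sample_std, numpy_var)
```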

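- median(): the median

The median is the middle value once the data are sorted in ascending order. If the number of values is even, so that there is no single middle value, the median is the mean of the two middle values.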

In [15]:
df = pd.DataFrame({'COL1':[2,3,4,5,4,2],
                  'COL2':[0,1,2,3,4,2]})
df

| | COL1 | COL2 |
|:---|:---|:---|
| 0 | 2 | 0 |
| 1 | 3 | 1 |
| 2 | 4 | 2 |
| 3 | 5 | 3 |
| 4 | 4 | 4 |
| 5 | 2 | 2 |


In [16]:
df.median(0)

COL1    3.5
COL2    2.0
dtype: float64
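
To see where the 3.5 for COL1 comes from, the definition can be applied by hand. This sketch only handles the even-length case, which is the one that occurs here:

```python
import pandas as pd

# Same values as COL1 above
col1 = pd.Series([2, 3, 4, 5, 4, 2])

# Sort ascending: [2, 2, 3, 4, 4, 5]
ordered = col1.sort_values().tolist()

# Six values, so take the mean of the two middle ones (3 and 4)
n = len(ordered)
manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(manual_median, col1.median())
```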

- idxmax()、idxmin()

In [18]:
# Find the index label where each column reaches its maximum
data.idxmax(axis=0)

open            2015-06-15
high            2015-06-10
close           2015-06-12
low             2015-06-12
volume          2017-10-26
price_change    2015-06-09
p_change        2015-08-28
turnover        2017-10-26
dtype: object

In [19]:
# Find the index label where each column reaches its minimum
data.idxmin(axis=0)

open            2015-03-02
high            2015-03-02
close           2015-09-02
low             2015-03-02
volume          2016-07-06
price_change    2015-06-15
p_change        2015-09-01
turnover        2016-07-06
dtype: object
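
Because `idxmax` and `idxmin` return index labels rather than values, their results can be passed to `.loc` to retrieve the full row. A minimal sketch, assuming a three-row frame with only two of the columns:

```python
import pandas as pd

# Small illustrative frame indexed by date
df = pd.DataFrame(
    {"open": [23.53, 22.80, 24.40], "volume": [95578.03, 60985.11, 65469.81]},
    index=["2018-02-27", "2018-02-26", "2018-01-29"],
)

# One label per column: where that column is largest / smallest
max_labels = df.idxmax(axis=0)
min_labels = df.idxmin(axis=0)

# Use a label to look up the whole row
highest_open_row = df.loc[max_labels["open"]]

print(max_labels)
print(highest_open_row)
```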