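#### Functions for descriptive statistics in pandas; the important ones are listed below
- count()	Number of non-null observations
- sum()	Sum of the values
- mean()	Mean of the values
- median()	Median of the values
- mode()	Mode of the values
- std()	Standard deviation of the values
- min()	Minimum value
- max()	Maximum value
- abs()	Absolute value
- prod()	Product of the values
- cumsum()	Cumulative sum
- cumprod()	Cumulative product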
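To see several of these functions side by side, here is a minimal sketch on a small made-up Series. The values, including one missing value, are illustrative assumptions and do not come from the notebook's data. Most of these reductions skip NaN by default.

```python
import numpy as np
import pandas as pd

# Illustrative data (assumption): four numbers plus one missing value
s = pd.Series([2, -3, 4, 4, np.nan])

print(s.count())            # 4: NaN is not counted
print(s.sum())              # 7.0
print(s.mean())             # 1.75
print(s.median())           # 3.0
print(s.mode().tolist())    # [4.0]: mode() returns a Series because ties are possible
print(s.std())              # sample standard deviation (ddof=1)
print(s.min(), s.max())     # -3.0 4.0
print(s.abs().tolist())     # [2.0, 3.0, 4.0, 4.0, nan]
print(s.prod())             # -96.0
print(s.cumsum().tolist())  # [2.0, -1.0, 3.0, 7.0, nan]
print(s.cumprod().tolist()) # [2.0, -6.0, -24.0, -96.0, nan]
```

The cumulative functions keep NaN in place instead of dropping it, so the result has the same length as the input.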

#### Many methods compute descriptive statistics and other related operations over a whole DataFrame

In [1]:
import pandas as pd
import numpy as np
# Create a dictionary of Series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Minsu','Jack',
   'Lee','David','Gasper','Betina','Andres']),
   'Age':pd.Series([25,26,25,23,30,29,23,34,40,30,51,46]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])}

# Create a DataFrame
df = pd.DataFrame(d)
print(df)

      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Minsu   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65


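##### sum method
sum(self, axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)
- Returns the sum of the values over the requested axis. By default the axis is the index (axis=0), so each column is summed
- This signature is from an older pandas release: pandas 2.0 removed level and changed the numeric_only default to False, so a row-wise sum over a mixed-type DataFrame then needs numeric_only=True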
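The axis, skipna and min_count parameters from the signature can be illustrated on made-up data. The small frame and Series below are assumptions for illustration only, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up numeric frame (assumption) to compare axis=0 and axis=1
frame = pd.DataFrame({'a': [1, 2], 'b': [10, 20]})
col_totals = frame.sum()        # axis=0 (default): one total per column
row_totals = frame.sum(axis=1)  # axis=1: one total per row
print(col_totals.tolist())      # [3, 30]
print(row_totals.tolist())      # [11, 22]

# Missing values: skipna and min_count
s = pd.Series([1.0, np.nan, 2.0])
all_nan = pd.Series([np.nan, np.nan])
print(s.sum())                   # 3.0: NaN is skipped (skipna=True by default)
print(s.sum(skipna=False))       # nan: any NaN propagates
print(all_nan.sum())             # 0.0: default min_count=0
print(all_nan.sum(min_count=1))  # nan: fewer than 1 non-NaN value
```

min_count is mainly useful when an all-missing group should be reported as missing rather than as 0.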

In [2]:
print(df.sum())

Name      TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object


In [3]:
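# Row-wise sum; numeric_only=True skips the Name column (required in pandas 2.0 and later)
print(df.sum(axis=1, numeric_only=True))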

0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64


In [4]:
# Select a column, then sum it
print(df['Age'].sum())

382


In [5]:
# Select rows by label, then sum them
print(df.loc[[3,4,5]].sum())

Name      VinSteveMinsu
Age                  82
Rating            10.36
dtype: object


##### mean()
Returns the mean of the values over the requested axis (per column by default)

In [6]:
# numeric_only=True skips the Name column, whose strings have no mean
print(df.mean(numeric_only=True))

Age       31.833333
Rating     3.743333
dtype: float64


##### std()
Returns the Bessel-corrected (sample, ddof=1) standard deviation of the numeric columns

In [7]:
print(df.std(numeric_only=True))

Age       9.232682
Rating    0.661628
dtype: float64
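Bessel's correction divides the sum of squared deviations by n - 1 instead of n. A minimal check, reusing the Age values from the example DataFrame, compares pandas' default (`ddof=1`) with `ddof=0`, and with NumPy's `np.std`, which defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

# Age values copied from the example DataFrame above
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

sample_std = ages.std()            # pandas default: ddof=1 (divide by n - 1)
population_std = ages.std(ddof=0)  # divide by n instead

# Recompute both by hand from the squared deviations
n = len(ages)
sq_dev = ((ages - ages.mean()) ** 2).sum()
print(np.isclose(sample_std, np.sqrt(sq_dev / (n - 1))))  # True
print(np.isclose(population_std, np.sqrt(sq_dev / n)))    # True

# np.std defaults to ddof=0, so it matches population_std, not pandas' std()
print(np.isclose(np.std(ages.to_numpy()), population_std))  # True
print(round(sample_std, 6))  # 9.232682, as in the df.std() output above
```

Mixing pandas and NumPy standard deviations without setting ddof explicitly is a common source of small discrepancies.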


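###### Because a DataFrame is a heterogeneous data structure, not every function works on every column
- Functions such as sum() and cumsum() work with both numeric and character (string) data without raising an error
- Functions such as abs() and cumprod() raise an exception when the DataFrame contains character or string data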

In [8]:
print(df.cumsum())

                                                 Name  Age Rating
0                                                 Tom   25   4.23
1                                            TomJames   51   7.47
2                                       TomJamesRicky   76  11.45
3                                    TomJamesRickyVin   99  14.01
4                               TomJamesRickyVinSteve  129  17.21
5                          TomJamesRickyVinSteveMinsu  158  21.81
6                      TomJamesRickyVinSteveMinsuJack  181  25.61
7                   TomJamesRickyVinSteveMinsuJackLee  215  29.39
8              TomJamesRickyVinSteveMinsuJackLeeDavid  255  32.37
9        TomJamesRickyVinSteveMinsuJackLeeDavidGasper  285  37.17
10  TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...  336  41.27
11  TomJamesRickyVinSteveMinsuJackLeeDavidGasperBe...  382  44.92


In [9]:
print(df.abs())

TypeError: bad operand type for abs(): 'str'
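One common workaround, sketched here on a cut-down copy of the example frame (the three-row `demo` frame is an assumption for illustration), is to keep only the numeric columns with `select_dtypes` before calling functions such as `abs()` or `cumprod()` that reject strings.

```python
import pandas as pd

# Cut-down copy of the example DataFrame (first three rows only)
demo = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky'],
                     'Age': [25, 26, 25],
                     'Rating': [4.23, 3.24, 3.98]})

# Drop non-numeric columns first, so abs() and cumprod() only see numbers
numeric = demo.select_dtypes(include='number')
print(numeric.abs())
print(numeric.cumprod())

# Calling abs() on the full frame still fails because of the Name column
try:
    demo.abs()
except TypeError as exc:
    print('abs() failed:', exc)
```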

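#### Summarizing data
The describe() function computes a summary of statistics about the DataFrame's columns
- describe(self, percentiles=None, include=None, exclude=None)
    - include specifies which columns are considered for the summary (by default, only numeric columns):
        - object - summarize string columns
        - number - summarize numeric columns
        - all - summarize all columns together (do not pass it as a list value)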
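The percentiles and exclude parameters from the signature can be tried on a small made-up frame (the `people` frame is an assumption, not the notebook's data). Percentiles are given as fractions between 0 and 1, and the median row is included even when 0.5 is not listed; exclude drops the listed dtypes, the opposite of include.

```python
import pandas as pd

# Small made-up frame with one string column and one numeric column
people = pd.DataFrame({'Name': ['Tom', 'James', 'Ricky', 'Vin'],
                       'Age': [25, 26, 25, 23]})

# Custom percentiles: 10% and 90%; the 50% row is still included
numeric_summary = people.describe(percentiles=[0.1, 0.9])
print(numeric_summary)

# exclude=['number'] summarizes every column except the numeric ones
text_summary = people.describe(exclude=['number'])
print(text_summary)
```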

In [10]:
print(df.describe())

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000


In [11]:
print(df.describe(include=['object']))

          Name
count       12
unique      12
top     Andres
freq         1


In [12]:
print(df.describe(include=['number']))

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000


In [13]:
# Pass 'all' as a plain string, not inside a list
print(df.describe(include='all'))

          Name        Age     Rating
count       12  12.000000  12.000000
unique      12        NaN        NaN
top     Andres        NaN        NaN
freq         1        NaN        NaN
mean       NaN  31.833333   3.743333
std        NaN   9.232682   0.661628
min        NaN  23.000000   2.560000
25%        NaN  25.000000   3.230000
50%        NaN  29.500000   3.790000
75%        NaN  35.500000   4.132500
max        NaN  51.000000   4.800000
