# I. The Pandas Library

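Pandas is a data analysis library built on top of NumPy. Unlike NumPy, which is best suited to homogeneous numerical arrays, Pandas is designed for tabular and heterogeneous data, and it provides a large set of mathematical functions and computation methods.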
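
To make the homogeneous versus heterogeneous distinction concrete, here is a small sketch. The names `arr` and `table` and the sample values are illustrative, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single dtype, so mixing numbers and text
# coerces every element to a string.
arr = np.array([[80, "A"], [70, "B"]])
print(arr.dtype)  # a Unicode string dtype

# A DataFrame keeps a separate dtype for each column.
table = pd.DataFrame({"score": [80, 70], "grade": ["A", "B"]})
print(table.dtypes)
```

The integer column stays numeric in the DataFrame, while the array version would need converting back before any arithmetic.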

In [1]:
import numpy as np
import pandas as pd

# II. Pandas Data Structures: Series and DataFrame

## 1. Series: index and values

In [2]:
a = pd.Series([1, 2, 3, 4, 5])

In [3]:
a

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [4]:
a.index

RangeIndex(start=0, stop=5, step=1)

In [5]:
a.values

array([1, 2, 3, 4, 5], dtype=int64)

In [6]:
a = pd.Series([1, 2, 3, 4, 5], index = ['a', 'b', 'c', 'd', 'e'])
a

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [7]:
a.index

Index(['a', 'b', 'c', 'd', 'e'], dtype='object')

In [8]:
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a']) 
a_reindex

e    5
b    2
c    3
d    4
a    1
dtype: int64

In [9]:
a_reindex = pd.Series(a, index = ['e', 'b', 'c', 'd', 'a','f', 'g']) 
a_reindex

e    5.0
b    2.0
c    3.0
d    4.0
a    1.0
f    NaN
g    NaN
dtype: float64

In [10]:
a

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [11]:
a.reindex(['e', 'b', 'c', 'd', 'a','f', 'g'])

e    5.0
b    2.0
c    3.0
d    4.0
a    1.0
f    NaN
g    NaN
dtype: float64

In [12]:
a

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [13]:
a.rename(index={'a':'h','b':'i','c':'j','d':'k','e':'l'})

h    1
i    2
j    3
k    4
l    5
dtype: int64

In [14]:
a

a    1
b    2
c    3
d    4
e    5
dtype: int64

In [15]:
a.index = ['e', 'b', 'c', 'd', 'a']
a

e    1
b    2
c    3
d    4
a    5
dtype: int64

In [16]:
a.index = ['e', 'b', 'c', 'd', 'a','f','g']
a

ValueError: Length mismatch: Expected axis has 5 elements, new values have 7 elements

In [17]:
b = np.array(a)
b

array([1, 2, 3, 4, 5], dtype=int64)

In [18]:
c = pd.Series(b)
c

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [19]:
# Course names are in pinyin: yuwen = Chinese, yingyu = English, shuxue = Maths
data = {'yuwen': 80, 'yingyu': 90, 'shuxue': 80}

In [20]:
data_ = pd.Series(data,index = ['yingyu','yuwen','shuxue'])

In [21]:
data_

yingyu    90
yuwen     80
shuxue    80
dtype: int64

In [22]:
data_.index

Index(['yingyu', 'yuwen', 'shuxue'], dtype='object')

In [23]:
data_.name = 'Score'
data_.index.name = 'Course'
data_

Course
yingyu    90
yuwen     80
shuxue    80
Name: Score, dtype: int64

In [24]:
data_.index

Index(['yingyu', 'yuwen', 'shuxue'], dtype='object', name='Course')

## 2. DataFrame: index, columns, and values

In [25]:
data = np.array([[95, 96, 97], [80, 85, 86], [56, 65, 70]])
data

array([[95, 96, 97],
       [80, 85, 86],
       [56, 65, 70]])

In [26]:
frame = pd.DataFrame(data)

In [27]:
frame

Unnamed: 0,0,1,2
0,95,96,97
1,80,85,86
2,56,65,70


In [28]:
frame = pd.DataFrame(data, index=['xiaoming', 'xiaohong', 'xiaohei'],
                      columns=['yuwen', 'yingyu', 'shuxue'])

In [29]:
frame

Unnamed: 0,yuwen,yingyu,shuxue
xiaoming,95,96,97
xiaohong,80,85,86
xiaohei,56,65,70


In [30]:
frame_ = pd.DataFrame(frame, index=[ 'xiaohong', 'xiaoming','xiaohei'],
                      columns=['yingyu','yuwen',  'shuxue'])

In [31]:
frame_

Unnamed: 0,yingyu,yuwen,shuxue
xiaohong,85,80,86
xiaoming,96,95,97
xiaohei,65,56,70


In [32]:
frame__ = pd.DataFrame(frame, index=[ 'xiaohong', 'xiaoming','xiaohei','xiaobai'],
                      columns=['yingyu','yuwen', 'shuxue', 'tiyu'])

In [33]:
frame__

Unnamed: 0,yingyu,yuwen,shuxue,tiyu
xiaohong,85.0,80.0,86.0,
xiaoming,96.0,95.0,97.0,
xiaohei,65.0,56.0,70.0,
xiaobai,,,,


In [34]:
frame_.reindex(index=[ 'xiaohong', 'xiaoming','xiaohei','xiaobai'],
               columns=['yingyu','yuwen', 'shuxue', 'tiyu'])

Unnamed: 0,yingyu,yuwen,shuxue,tiyu
xiaohong,85.0,80.0,86.0,
xiaoming,96.0,95.0,97.0,
xiaohei,65.0,56.0,70.0,
xiaobai,,,,


In [35]:
frame_

Unnamed: 0,yingyu,yuwen,shuxue
xiaohong,85,80,86
xiaoming,96,95,97
xiaohei,65,56,70


In [36]:
frame_.rename(index={"xiaohong":"damao","xiaoming":"ermao","xiaohei":"Nicolas Cage"},
              columns={"yingyu":"English", "yuwen":"Literature", "shuxue":"Maths"})

Unnamed: 0,English,Literature,Maths
damao,85,80,86
ermao,96,95,97
Nicolas Cage,65,56,70


In [37]:
frame_

Unnamed: 0,yingyu,yuwen,shuxue
xiaohong,85,80,86
xiaoming,96,95,97
xiaohei,65,56,70


In [38]:
frame_.index = ['damao','ermao','Nicolas Cage']
frame_.columns = ['English', 'Literature', 'Maths']
frame_

Unnamed: 0,English,Literature,Maths
damao,85,80,86
ermao,96,95,97
Nicolas Cage,65,56,70


In [39]:
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}

In [40]:
df = pd.DataFrame(data)
df

Unnamed: 0,English,Literature,Maths,Music
0,80,70,80,A
1,70,70,90,B
2,60,85,50,C


In [41]:
df = pd.DataFrame(data, index = ["alpha", "beta","theta"])
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [42]:
df.index

Index(['alpha', 'beta', 'theta'], dtype='object')

In [43]:
df.columns

Index(['English', 'Literature', 'Maths', 'Music'], dtype='object')

In [44]:
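# Unlike Series, DataFrame has no built-in name; this only sets a plain
# Python attribute and does not show up in the displayed output.
df.name = 'Score'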
df.index.name = 'Person'
df.columns.name = 'Course'
df

Course,English,Literature,Maths,Music
Person,,,,
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


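## Summary:

1. Structure of Series and DataFrame

2. Ways to specify or modify an index

index, columns: passed at construction to set the labels; if the data already has labels, they select and reorder it by label

reindex: builds a new object conforming to the given labels (reordering, with NaN for labels that are missing)

rename: changes labels through a mapping and returns a new object

Direct assignment, which relabels in place (the new list must match the axis length):

    Series.index = [...]

    DataFrame.columns = [...]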
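
The three approaches are easy to confuse, so here is a compact side-by-side sketch. The Series `s` and the result names are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])

# reindex: values follow their labels; unknown labels become NaN.
reordered = s.reindex(["c", "a", "z"])

# rename: labels change, values stay in place; a new object is returned.
renamed = s.rename(index={"a": "x"})

# Assigning to .index relabels in place, position by position.
relabeled = s.copy()
relabeled.index = ["c", "b", "a"]

print(reordered)
print(renamed)
print(relabeled)
```

Note that `relabeled["c"]` is 1, because assignment attaches the new labels by position, whereas `reordered["c"]` is 3, because `reindex` looks values up by label.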

In [45]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, alpha to theta
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   English     3 non-null      int64 
 1   Literature  3 non-null      int64 
 2   Maths       3 non-null      int64 
 3   Music       3 non-null      object
dtypes: int64(3), object(1)
memory usage: 120.0+ bytes


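### Notes:

1. Tuple: a fixed-length, immutable sequence of Python objects

2. List: a variable-length, mutable sequence

3. ndarray: an efficient multidimensional container for homogeneous data, with convenient arithmetic operations and broadcasting

4. DataFrame: a heterogeneous table in which each column can hold a different value type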
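
The four containers in these notes can be compared in a few lines. This is only a sketch, and all variable names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

point = (1, 2)   # tuple: fixed length, immutable
items = [1, 2]   # list: can grow and change
items.append(3)

try:
    point[0] = 10  # tuples reject item assignment
except TypeError as exc:
    print("tuple is immutable:", exc)

# ndarray: one dtype; arithmetic broadcasts across compatible shapes.
grid = np.array([[1, 2, 3], [4, 5, 6]])
shifted = grid + np.array([10, 20, 30])  # the 1-D row is added to every row of grid

# DataFrame: each column carries its own dtype.
mixed = pd.DataFrame({"n": [1, 2], "x": [0.5, 1.5]})
print(shifted)
print(mixed.dtypes)
```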

# III. Series and DataFrame Operations

## 1. Basic arithmetic

In [46]:
s1 = pd.Series([1, 2, 3],
              index = ['a','b','c'])
s1

a    1
b    2
c    3
dtype: int64

In [47]:
s1 - 1

a    0
b    1
c    2
dtype: int64

In [48]:
s2 = pd.Series([4, 5, 6],
              index = ['b','c','e'])
s2

b    4
c    5
e    6
dtype: int64

In [49]:
s1 + s2

a    NaN
b    6.0
c    8.0
e    NaN
dtype: float64
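
The NaN entries appear because the labels `a` and `e` exist in only one operand. If a missing label should count as zero instead, one option is `Series.add` with `fill_value`. This is a minimal sketch; the name `total` is illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["b", "c", "e"])

# Labels present on only one side are treated as 0 instead of producing NaN.
total = s1.add(s2, fill_value=0)
print(total)
```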

In [50]:
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [51]:
df * 2

Unnamed: 0,English,Literature,Maths,Music
alpha,160,140,160,AA
beta,140,140,180,BB
theta,120,170,100,CC


In [52]:
data1 = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],}
df1 = pd.DataFrame(data1,index = ["alpha", "beta","theta"])
df1

Unnamed: 0,English,Literature,Maths
alpha,80,70,80
beta,70,70,90
theta,60,85,50


In [53]:
df + df1

Unnamed: 0,English,Literature,Maths,Music
alpha,160,140,160,
beta,140,140,180,
theta,120,170,100,


In [54]:
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_

Maths         10
English       10
Literature    20
Gym            A
dtype: object

In [55]:
df1 + add_

Unnamed: 0,English,Gym,Literature,Maths
alpha,90,,90,90
beta,80,,90,100
theta,70,,105,60


In [56]:
add1_ = {'alpha':10,'beta':10,'theta':20,}
add1_ = pd.Series(add1_)
add1_

alpha    10
beta     10
theta    20
dtype: int64

In [57]:
df1+add1_

Unnamed: 0,English,Literature,Maths,alpha,beta,theta
alpha,,,,,,
beta,,,,,,
theta,,,,,,


In [58]:
df1.add(add1_,axis='index')

Unnamed: 0,English,Literature,Maths
alpha,90,80,90
beta,80,80,100
theta,80,105,70


## 2. Matrix operations and universal functions

In [59]:
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [60]:
df.T

Unnamed: 0,alpha,beta,theta
English,80,70,60
Literature,70,70,85
Maths,80,90,50
Music,A,B,C


In [61]:
np.square(df)

TypeError: can't multiply sequence by non-int of type 'str'

In [62]:
np.square(df1)

Unnamed: 0,English,Literature,Maths
alpha,6400,4900,6400
beta,4900,4900,8100
theta,3600,7225,2500
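
`np.square(df)` fails above because of the text column `Music`, while `np.square(df1)` works because every column is numeric. When a frame mixes types, one way to apply a ufunc is to select the numeric columns first. This is a sketch on a reduced, illustrative copy of the data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"English": [80, 70, 60], "Music": ["A", "B", "C"]},
    index=["alpha", "beta", "theta"],
)

# Keep only the numeric columns, then apply the ufunc.
numeric = df.select_dtypes(include="number")
squared = np.square(numeric)
print(squared)
```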


## 3. Basic statistical methods

In [63]:
df.max(axis=0)

English       80
Literature    85
Maths         90
Music          C
dtype: object

In [64]:
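# numeric_only=True skips the text column Music; recent pandas versions
# raise a TypeError on mixed-type rows without it.
df.mean(axis=1, numeric_only=True)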

alpha    76.666667
beta     76.666667
theta    65.000000
dtype: float64

In [65]:
df.describe()

Unnamed: 0,English,Literature,Maths
count,3.0,3.0,3.0
mean,70.0,75.0,73.333333
std,10.0,8.660254,20.81666
min,60.0,70.0,50.0
25%,65.0,70.0,65.0
50%,70.0,70.0,80.0
75%,75.0,77.5,85.0
max,80.0,85.0,90.0


# IV. Indexing and Slicing Series and DataFrame

## 1. Series indexing

In [66]:
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_

Maths         10
English       10
Literature    20
Gym            A
dtype: object

In [67]:
add_['Maths']

10

In [68]:
add_['Maths':'Literature']

Maths         10
English       10
Literature    20
dtype: object

In [69]:
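# Explicit positional access; plain add_[2] on a labelled Series is
# deprecated in recent pandas versions.
add_.iloc[2]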

20

In [70]:
add_[:2]

Maths      10
English    10
dtype: object

In [71]:
add_[[True, False, True, False]]

Maths         10
Literature    20
dtype: object

In [72]:
add_.Maths

10

## 2. DataFrame indexing

### 2.1 Indexing by label

In [73]:
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [74]:
df['Maths']

alpha    80
beta     90
theta    50
Name: Maths, dtype: int64

In [75]:
df[['Maths','English']]

Unnamed: 0,Maths,English
alpha,80,80
beta,90,70
theta,50,60


In [76]:
df.Maths

alpha    80
beta     90
theta    50
Name: Maths, dtype: int64

In [77]:
df['alpha']

KeyError: 'alpha'

In [78]:
df.loc['alpha']

English       80
Literature    70
Maths         80
Music          A
Name: alpha, dtype: object

In [79]:
df.loc['alpha':'theta']

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [80]:
df.alpha

AttributeError: 'DataFrame' object has no attribute 'alpha'

### 2.2 Indexing by integer position

In [81]:
df.iloc[2]

English       60
Literature    85
Maths         50
Music          C
Name: theta, dtype: object

In [82]:
df.iloc[:,2]

alpha    80
beta     90
theta    50
Name: Maths, dtype: int64

In [83]:
df.iloc[:2,:2]

Unnamed: 0,English,Literature
alpha,80,70
beta,70,70


In [84]:
df[:2]

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B


In [85]:
df[2]

KeyError: 2

In [86]:
df[:2,:2]

TypeError: '(slice(None, 2, None), slice(None, 2, None))' is an invalid key

### 2.3 Boolean indexing

In [87]:
df1 >70

Unnamed: 0,English,Literature,Maths
alpha,True,False,True
beta,False,False,True
theta,False,True,False


In [88]:
df1[df1>70]

Unnamed: 0,English,Literature,Maths
alpha,80.0,,80.0
beta,,,90.0
theta,,85.0,


In [89]:
df1[df1['Maths']>70]

Unnamed: 0,English,Literature,Maths
alpha,80,70,80
beta,70,70,90


In [90]:
df1[df1['Maths']>70] = 70

In [91]:
df1

Unnamed: 0,English,Literature,Maths
alpha,70,70,70
beta,70,70,70
theta,60,85,50
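
The assignment in In [90] overwrites every column of the matching rows, which is why English and Literature also became 70. If only the Maths score should be capped, a row mask can be combined with a column label through `.loc`. This is a sketch on a fresh copy of the data, named `scores` here for illustration.

```python
import pandas as pd

scores = pd.DataFrame(
    {"English": [80, 70, 60], "Literature": [70, 70, 85], "Maths": [80, 90, 50]},
    index=["alpha", "beta", "theta"],
)

# Row mask plus column label: only Maths is capped, other columns are untouched.
scores.loc[scores["Maths"] > 70, "Maths"] = 70
print(scores)
```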


# V. Deleting from Series and DataFrame

## 1. Deleting from a Series

In [92]:
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_

Maths         10
English       10
Literature    20
Gym            A
dtype: object

In [93]:
add_.pop('Maths')

10

In [94]:
add_

English       10
Literature    20
Gym            A
dtype: object

In [95]:
add_ = {'Maths':10,'English':10,'Literature':20,'Gym':"A"}
add_ = pd.Series(add_)
add_

Maths         10
English       10
Literature    20
Gym            A
dtype: object

In [96]:
add_.drop('Maths')

English       10
Literature    20
Gym            A
dtype: object

In [97]:
add_

Maths         10
English       10
Literature    20
Gym            A
dtype: object

In [98]:
add_.drop('Maths',inplace=True)

In [99]:
add_

English       10
Literature    20
Gym            A
dtype: object

In [100]:
del add_['English']

In [101]:
add_

Literature    20
Gym            A
dtype: object

## 2. Deleting from a DataFrame

In [102]:
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [103]:
df.pop("Music")

alpha    A
beta     B
theta    C
Name: Music, dtype: object

In [104]:
df

Unnamed: 0,English,Literature,Maths
alpha,80,70,80
beta,70,70,90
theta,60,85,50


In [105]:
df.drop('alpha')

Unnamed: 0,English,Literature,Maths
beta,70,70,90
theta,60,85,50


In [106]:
df.drop('Maths',axis=1)

Unnamed: 0,English,Literature
alpha,80,70
beta,70,70
theta,60,85


In [107]:
df

Unnamed: 0,English,Literature,Maths
alpha,80,70,80
beta,70,70,90
theta,60,85,50


In [108]:
del df['Maths']

In [109]:
df

Unnamed: 0,English,Literature
alpha,80,70
beta,70,70
theta,60,85


In [110]:
del df.loc['alpha']

AttributeError: __delitem__
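
Since `del` does not work on rows, rows are usually removed with `drop` or filtered out with a boolean mask. Here is a small sketch on an illustrative two-column frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"English": [80, 70, 60], "Literature": [70, 70, 85]},
    index=["alpha", "beta", "theta"],
)

# drop returns a new frame; rebind the name to keep the result.
df = df.drop(index="alpha")

# Same idea with a boolean mask: keep every row except "beta".
df = df[df.index != "beta"]
print(df)
```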

# VI. Combining Series and DataFrame

## 1. Combining Series

In [111]:
s1 = pd.Series([1, 2, 3],
              index = ['a','b','c'])
s1

a    1
b    2
c    3
dtype: int64

In [112]:
s2 = pd.Series([4, 5, 6],
              index = ['b','c','e'])
s2

b    4
c    5
e    6
dtype: int64

In [113]:
pd.concat((s1,s2))

a    1
b    2
c    3
b    4
c    5
e    6
dtype: int64

In [114]:
pd.concat((s1,s2),axis =1)

Unnamed: 0,0,1
a,1.0,
b,2.0,4.0
c,3.0,5.0
e,,6.0


In [115]:
s1.combine_first(s2)

a    1.0
b    2.0
c    3.0
e    6.0
dtype: float64

## 2. Combining DataFrames

In [116]:
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"]}
df = pd.DataFrame(data,index = ["alpha", "beta","theta"])
df

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C


In [117]:
data1 = {"English":[80,70,60], 
        "Maths":[80,90,50],
        "Literature":[70,70,85],}
df1 = pd.DataFrame(data1,index = ["beta","alpha","theta"])
df1

Unnamed: 0,English,Maths,Literature
beta,80,80,70
alpha,70,90,70
theta,60,50,85


In [118]:
pd.concat((df,df1))

Unnamed: 0,English,Literature,Maths,Music
alpha,80,70,80,A
beta,70,70,90,B
theta,60,85,50,C
beta,80,70,80,
alpha,70,70,90,
theta,60,85,50,


In [119]:
pd.concat((df,df1),axis=1)

Unnamed: 0,English,Literature,Maths,Music,English.1,Maths.1,Literature.1
alpha,80,70,80,A,70,90,70
beta,70,70,90,B,80,80,70
theta,60,85,50,C,60,50,85


In [120]:
df1.combine_first(df)

Unnamed: 0,English,Literature,Maths,Music
alpha,70,70,90,A
beta,80,70,80,B
theta,60,85,50,C


In [121]:
data = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "Music":["A","B","C"],
        "ID":[1001,1002,1003]}
df = pd.DataFrame(data)
df

Unnamed: 0,English,Literature,Maths,Music,ID
0,80,70,80,A,1001
1,70,70,90,B,1002
2,60,85,50,C,1003


In [122]:
data1 = {"English":[80,70,60], 
        "Literature":[70,70,85],
        "Maths":[80,90,50],
        "ID":[1004,1002,1003]}
df1 = pd.DataFrame(data1)
df1

Unnamed: 0,English,Literature,Maths,ID
0,80,70,80,1004
1,70,70,90,1002
2,60,85,50,1003


In [123]:
pd.concat((df,df1),axis=1)

Unnamed: 0,English,Literature,Maths,Music,ID,English.1,Literature.1,Maths.1,ID.1
0,80,70,80,A,1001,80,70,80,1004
1,70,70,90,B,1002,70,70,90,1002
2,60,85,50,C,1003,60,85,50,1003


In [124]:
pd.merge(df,df1,on='ID')

Unnamed: 0,English_x,Literature_x,Maths_x,Music,ID,English_y,Literature_y,Maths_y
0,70,70,90,B,1002,70,70,90
1,60,85,50,C,1003,60,85,50


In [125]:
pd.merge(df,df1,on='ID',how="outer")

Unnamed: 0,English_x,Literature_x,Maths_x,Music,ID,English_y,Literature_y,Maths_y
0,80.0,70.0,80.0,A,1001,,,
1,70.0,70.0,90.0,B,1002,70.0,70.0,90.0
2,60.0,85.0,50.0,C,1003,60.0,85.0,50.0
3,,,,,1004,80.0,70.0,80.0
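
The default `_x` and `_y` suffixes do not say which frame a column came from, and the outer join does not say which side supplied each row. Both can be made explicit with the `suffixes` and `indicator` parameters of `pd.merge`. This sketch uses a reduced, illustrative version of the two frames.

```python
import pandas as pd

left = pd.DataFrame({"ID": [1001, 1002, 1003], "Maths": [80, 90, 50]})
right = pd.DataFrame({"ID": [1004, 1002, 1003], "Maths": [80, 90, 50]})

merged = pd.merge(
    left, right, on="ID", how="outer",
    suffixes=("_left", "_right"),  # replaces the default _x / _y
    indicator=True,                # adds a _merge column naming the source side
)
print(merged)
```

The `_merge` column takes the values `left_only`, `right_only`, or `both`.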


In [126]:
df.set_index('ID', inplace=True)
df

ID,English,Literature,Maths,Music
1001,80,70,80,A
1002,70,70,90,B
1003,60,85,50,C


In [127]:
df1.set_index('ID', inplace=True)
df1

ID,English,Literature,Maths
1004,80,70,80
1002,70,70,90
1003,60,85,50


In [128]:
df.join(df1, how='outer', lsuffix='df', rsuffix='df1')

ID,Englishdf,Literaturedf,Mathsdf,Music,Englishdf1,Literaturedf1,Mathsdf1
1001,80.0,70.0,80.0,A,,,
1002,70.0,70.0,90.0,B,70.0,70.0,90.0
1003,60.0,85.0,50.0,C,60.0,85.0,50.0
1004,,,,,80.0,70.0,80.0


# VII. Other Common Pandas Functions and Methods

In [129]:
df3 = pd.concat((df,df1),axis=0)
df3

ID,English,Literature,Maths,Music
1001,80,70,80,A
1002,70,70,90,B
1003,60,85,50,C
1004,80,70,80,
1002,70,70,90,
1003,60,85,50,


In [130]:
df3.head()

ID,English,Literature,Maths,Music
1001,80,70,80,A
1002,70,70,90,B
1003,60,85,50,C
1004,80,70,80,
1002,70,70,90,


In [131]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 1001 to 1003
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   English     6 non-null      int64 
 1   Literature  6 non-null      int64 
 2   Maths       6 non-null      int64 
 3   Music       3 non-null      object
dtypes: int64(3), object(1)
memory usage: 240.0+ bytes


In [132]:
df3.describe()

Unnamed: 0,English,Literature,Maths
count,6.0,6.0,6.0
mean,70.0,75.0,73.333333
std,8.944272,7.745967,18.618987
min,60.0,70.0,50.0
25%,62.5,70.0,57.5
50%,70.0,70.0,80.0
75%,77.5,81.25,87.5
max,80.0,85.0,90.0


In [133]:
df3.sort_index(axis=0)

ID,English,Literature,Maths,Music
1001,80,70,80,A
1002,70,70,90,B
1002,70,70,90,
1003,60,85,50,C
1003,60,85,50,
1004,80,70,80,


In [134]:
df3.sort_values(by=['Maths'])

ID,English,Literature,Maths,Music
1003,60,85,50,C
1003,60,85,50,
1001,80,70,80,A
1004,80,70,80,
1002,70,70,90,B
1002,70,70,90,


In [135]:
df3.index.is_unique

False

In [136]:
df3['English'].is_unique

False

In [137]:
df3.index.value_counts()

1003    2
1002    2
1004    1
1001    1
Name: ID, dtype: int64
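
Once `is_unique` and `value_counts()` show that index labels repeat, a common follow-up is to keep only the first row for each label. Here is a sketch using `Index.duplicated`, on an illustrative single-column version of `df3`.

```python
import pandas as pd

df3 = pd.DataFrame(
    {"Maths": [80, 90, 50, 80, 90, 50]},
    index=pd.Index([1001, 1002, 1003, 1004, 1002, 1003], name="ID"),
)

# duplicated() marks the second and later occurrences of each label.
dup_mask = df3.index.duplicated(keep="first")
deduped = df3[~dup_mask]
print(deduped.index.is_unique)
```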

In [138]:
df3.Music.value_counts()

B    1
C    1
A    1
Name: Music, dtype: int64

In [139]:
df3['Maths'].rank()

ID
1001    3.5
1002    5.5
1003    1.5
1004    3.5
1002    5.5
1003    1.5
Name: Maths, dtype: float64

In [140]:
df3['Maths'].rank(method = 'first')

ID
1001    3.0
1002    5.0
1003    1.0
1004    4.0
1002    6.0
1003    2.0
Name: Maths, dtype: float64
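
The two calls above use the default `average` method and `first`. For score tables it is often more natural to rank from highest to lowest and let ties share a rank. This sketch shows the `min` and `dense` methods on the same Maths values; the Series `maths` is a standalone illustrative copy.

```python
import pandas as pd

maths = pd.Series([80, 90, 50, 80, 90, 50])

# Highest score gets rank 1; tied values share the smallest rank in their group.
by_min = maths.rank(method="min", ascending=False)

# dense: like min, but with no gaps between consecutive groups.
by_dense = maths.rank(method="dense", ascending=False)

print(by_min.tolist())
print(by_dense.tolist())
```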

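## Summary
I. The Pandas library

II. Pandas data structures: Series and DataFrame

1. Series: index and values

2. DataFrame: index, columns, and values

Ways to specify or modify an index

At construction: pass index and columns to set the labels; if the data already has labels, they select and reorder it by label

After construction:

reindex: builds a new object conforming to the given labels, in the given order

rename: changes labels

Direct assignment:

Series.index = [...]
DataFrame.columns = [...]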

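III. Series and DataFrame operations

1. Basic arithmetic

Operands are aligned on index labels, not on position

When a DataFrame and a Series are added, the Series index is matched against the DataFrame's columns by default

2. Matrix operations and universal functions

3. Basic statistical methods: axis selects the axis to operate along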

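IV. Indexing and slicing Series and DataFrame

1. Series indexing and slicing: by label, by integer position, by boolean mask

2. DataFrame indexing and slicing

	By label: columns with df['Maths'], rows with df.loc['alpha']

	By integer position: df.iloc[]; as a special case, rows can be sliced directly with integers, e.g. df[:2]

	By boolean mask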
    
V. Deleting from Series and DataFrame

1. Series: pop/drop/del

2. DataFrame: pop/drop/del (pop and del remove columns; drop removes rows or columns)

VI. Combining Series and DataFrame

1. Series

    pd.concat()	combine_first()

2. DataFrame

    pd.concat()	combine_first()

    pd.merge()	join()

VII. Other common Pandas functions and methods

    head()	info()	describe()

    sort_index()	sort_values()

    is_unique	value_counts()

    rank()
