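## 1. Selecting and slicing with a hierarchical index
`loc[]` is the recommended way to select by label:
- For a Series, the first argument is the outer-level label and the second is the inner-level label.
- For a DataFrame, use a tuple to identify a row or column, e.g. `loc[(outer_label, inner_label), column_label]`.
- Each index level can have a name.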
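
The three points above can be sketched on a small, fixed data set. The level names `outer` and `inner` and the values are hypothetical, chosen only so the results are reproducible.

```python
import pandas as pd

# Hypothetical toy data with fixed values (not the random data used in the cells below).
index = pd.MultiIndex.from_arrays(
    [['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']],
    names=['outer', 'inner'])            # each level gets a name
s = pd.Series([1, 2, 3, 4], index=index)
df = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1, 2, 3, 4]}, index=index)

outer_a = s.loc['a']                     # Series: everything under outer label 'a'
one_value = s.loc['a', 'y']              # Series: outer label first, inner label second
cell = df.loc[('b', 'x'), 'A']           # DataFrame: a tuple picks the row, then the column

print(list(s.index.names))
print(one_value, cell)                   # 2 30
```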

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = pd.Series(
    np.random.randn(9),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           list('hjkhjklkj')])
data

a  h   -0.411002
   j    1.322214
   k    0.117585
b  h    0.227043
   j    0.020601
c  k   -0.158537
   l   -0.413789
d  k   -0.161488
   j   -0.318064
dtype: float64

In [3]:
data.loc['b':'d']

b  h    0.227043
   j    0.020601
c  k   -0.158537
   l   -0.413789
d  k   -0.161488
   j   -0.318064
dtype: float64

In [4]:
data.loc['b':'d', 'j']  # 'b':'d' slices the outer level; 'j' selects on the inner level

b  j    0.020601
d  j   -0.318064
dtype: float64

In [5]:
# a bare colon for the outer level may not work here (depending on the pandas version), so slice(None) is used to select every outer label
data.loc[slice(None), ['j', 'k']]

a  j    1.322214
   k    0.117585
b  j    0.020601
c  k   -0.158537
d  k   -0.161488
   j   -0.318064
dtype: float64

In [6]:
data = pd.DataFrame(
    np.random.randint(1, 100, size=25).reshape((5, 5)),
    index=[list('aabbc'), list('xyyzz')],
    columns=list('ABCDE'))
data

      A   B   C   D   E
a x  90  73  80  26  87
  y   4  98  80   7  47
b y  58  92   8  60   6
  z  54  39  91  42   9
c z   2  43  38   6  61


In [7]:
data.loc['a', 'B':'D']  # all rows under outer label 'a', columns B through D

    B   C   D
x  73  80  26
y  98  80   7


In [8]:
data.loc['a', 'y']  # the row identified by outer label 'a' and inner label 'y'

A     4
B    98
C    80
D     7
E    47
Name: (a, y), dtype: int32

In [9]:
data.loc[('a', 'y')]  # same as above, written as a tuple

A     4
B    98
C    80
D     7
E    47
Name: (a, y), dtype: int32

In [10]:
# slice rows from ('a', 'y') through ('b', 'z') and select columns B, C, and E
data.loc[('a', 'y'):('b', 'z'), ['B', 'C', 'E']]

      B   C   E
a y  98  80  47
b y  92   8   6
  z  39  91   9


In [11]:
# select every row whose inner label is 'y'; a bare ":" is a syntax error inside a tuple, so use slice(None) for the outer level
data.loc[(slice(None), 'y'), :]

      A   B   C   D   E
a y   4  98  80   7  47
b y  58  92   8  60   6
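
As an alternative to spelling out `slice(None)`, pandas provides `pd.IndexSlice`, which lets the colon syntax be used inside the row selector. The sketch below uses a small hypothetical frame with fixed values and checks that both spellings select the same rows.

```python
import pandas as pd

# Hypothetical frame with the same row-index layout as above, but fixed values.
df = pd.DataFrame(
    {'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]},
    index=[list('aabbc'), list('xyyzz')])

idx = pd.IndexSlice
by_slice_none = df.loc[(slice(None), 'y'), :]   # explicit slice(None) for the outer level
by_index_slice = df.loc[idx[:, 'y'], :]         # same selection written with a colon

print(by_slice_none.equals(by_index_slice))     # True
```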


In [12]:
# give the columns a two-level index and set the names attribute for every row and column level
data.columns = [list('XXYYZ'), list('ACBCD')]
data.columns.names = ['X-Z', 'A-D']
data.index.names = ['a-c', 'x-z']
data

X-Z       X       Y       Z
A-D       A   C   B   C   D
a-c x-z
a   x    90  73  80  26  87
    y     4  98  80   7  47
b   y    58  92   8  60   6
    z    54  39  91  42   9
c   z     2  43  38   6  61


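## 2. Reordering and sorting levels
- `swaplevel()` takes two level numbers or names and returns a new object with those levels interchanged; `axis` selects the axis to swap on. **Note: only the levels are swapped; the data and the order of the labels stay the same.**
- `sort_index()` sorts the data by the values in the specified level.  
Parameters: `axis`, `level`, `ascending`, `na_position`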
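
A minimal sketch of the two methods on a small Series, assuming hypothetical level names `outer` and `inner`. It shows that `swaplevel()` leaves the row order alone and that `sort_index()` then reorders by the chosen level, in either direction.

```python
import pandas as pd

# Hypothetical toy Series; the inner labels are deliberately out of order.
s = pd.Series(
    [1, 2, 3, 4],
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], ['y', 'x', 'y', 'x']],
        names=['outer', 'inner']))

swapped = s.swaplevel('outer', 'inner')    # levels trade places; values keep their order
by_inner = swapped.sort_index(level='inner')                       # ascending
by_inner_desc = swapped.sort_index(level='inner', ascending=False)  # descending

print(list(swapped.index.names))           # ['inner', 'outer']
print(by_inner.index.get_level_values('inner').tolist())
print(by_inner_desc.index.get_level_values('inner').tolist())
```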


In [13]:
data.swaplevel('a-c','x-z') # swap the two row levels by name

X-Z       X       Y       Z
A-D       A   C   B   C   D
x-z a-c
x   a    90  73  80  26  87
y   a     4  98  80   7  47
    b    58  92   8  60   6
z   b    54  39  91  42   9
    c     2  43  38   6  61


In [14]:
data.swaplevel(0,1,axis=1) # swap on the column axis using level positions; note that the label order does not change

A-D       A   C   B   C   D
X-Z       X   X   Y   Y   Z
a-c x-z
a   x    90  73  80  26  87
    y     4  98  80   7  47
b   y    58  92   8  60   6
    z    54  39  91  42   9
c   z     2  43  38   6  61


In [15]:
# after swapping, sort by level 0 (the outer level) of the columns; ascending by default
data.swaplevel(0,1,axis=1).sort_index(level=0,axis=1)

A-D       A   B   C       D
X-Z       X   Y   X   Y   Z
a-c x-z
a   x    90  80  73  26  87
    y     4  80  98   7  47
b   y    58   8  92  60   6
    z    54  91  39  42   9
c   z     2  38  43   6  61


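## 3. Summary statistics by level
In older pandas versions, many summary-statistics methods accept a `level` parameter that aggregates over one level of the labels on a given axis. The parameter was deprecated in pandas 1.3 and removed in 2.0; on current versions, use `groupby(level=...)` followed by the aggregation.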
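
On pandas versions where the `level` argument of `sum()` is no longer available, the same per-level totals can be computed with `groupby(level=...)`. The sketch below uses hypothetical toy frames with fixed values. For the column axis it transposes, groups, and transposes back, which is one possible approach rather than the only one.

```python
import pandas as pd

# Hypothetical toy data: two-level row index.
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]},
    index=pd.MultiIndex.from_arrays(
        [['a', 'a', 'b', 'b'], ['x', 'y', 'x', 'y']],
        names=['outer', 'inner']))

# Sum rows that share the same 'inner' label.
row_totals = df.groupby(level='inner').sum()

# Hypothetical toy data: two-level column index.
wide = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    columns=pd.MultiIndex.from_arrays([['X', 'X', 'Y'], ['p', 'q', 'p']]))

# Sum columns that share the same first-level label.
col_totals = wide.T.groupby(level=0).sum().T

print(row_totals)
print(col_totals)
```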

In [16]:
data

X-Z       X       Y       Z
A-D       A   C   B   C   D
a-c x-z
a   x    90  73  80  26  87
    y     4  98  80   7  47
b   y    58  92   8  60   6
    z    54  39  91  42   9
c   z     2  43  38   6  61


In [17]:
data.sum(level='x-z') # sum over the 'x-z' level of the row axis (requires pandas < 2.0)

X-Z   X         Y       Z
A-D   A    C    B   C   D
x-z
x    90   73   80  26  87
y    62  190   88  67  53
z    56   82  129  48  70


In [18]:
data.sum(level=0,axis=1) # sum over the first level of the column axis (requires pandas < 2.0)

X-Z        X    Y   Z
a-c x-z
a   x    163  106  87
    y    102   87  47
b   y    150   68   6
    z     93  133   9
c   z     45   44  61


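## 4. Indexing with a DataFrame's columns
- `set_index()` moves the specified column(s) into the row index and removes them from the data.  
Parameter:  
`drop=False` keeps the column(s) in the data as well.
- `reset_index()` turns the specified index level(s) back into columns; with no level given, all index levels become columns.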
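
A short round-trip sketch of the two methods, using a hypothetical three-row frame with fixed values. It checks which columns remain after each call and that `reset_index()` recovers the original data.

```python
import pandas as pd

# Hypothetical toy frame with fixed values.
df = pd.DataFrame({'A': [1, 2, 3], 'C': ['X', 'Y', 'Y'], 'D': ['a', 'b', 'b']})

indexed = df.set_index(['D', 'C'])            # D and C move into the index
kept = df.set_index(['D', 'C'], drop=False)   # D and C are also kept as columns
restored = indexed.reset_index()              # both levels become columns again

print(list(indexed.columns))                  # ['A']
print(list(kept.columns))                     # ['A', 'C', 'D']
print(list(restored.columns))                 # ['D', 'C', 'A']
```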

In [19]:
data=pd.DataFrame(np.random.randint(100,size=32).reshape(8,4),columns=list('ABCD'))
data['C']=list('XXYYYZDD')
data['D']=list('abbccddd')
data

    A   B  C  D
0  66  20  X  a
1  21  40  X  b
2  63  76  Y  b
3  82  64  Y  c
4  43  87  Y  c
5  27  19  Z  d
6  56  98  D  d
7  56   6  D  d


In [20]:
data.set_index('C') # use column C as the index

    A   B  D
C
X  66  20  a
X  21  40  b
Y  63  76  b
Y  82  64  c
Y  43  87  c
Z  27  19  d
D  56  98  d
D  56   6  d


In [21]:
data.set_index(['D','C']) # use several columns as a multi-level index

      A   B
D C
a X  66  20
b X  21  40
  Y  63  76
c Y  82  64
  Y  43  87
d Z  27  19
  D  56  98
  D  56   6


In [22]:
data.set_index(['D','C'],drop=False) # keep the columns in the data

      A   B  C  D
D C
a X  66  20  X  a
b X  21  40  X  b
  Y  63  76  Y  b
c Y  82  64  Y  c
  Y  43  87  Y  c
d Z  27  19  Z  d
  D  56  98  D  d
  D  56   6  D  d


In [23]:
data.set_index(['D','C']).reset_index('D') # turn the specified index level back into a column

   D   A   B
C
X  a  66  20
X  b  21  40
Y  b  63  76
Y  c  82  64
Y  c  43  87
Z  d  27  19
D  d  56  98
D  d  56   6


In [24]:
data.set_index(['D','C']).reset_index() # with no level given, all index levels become columns

   D  C   A   B
0  a  X  66  20
1  b  X  21  40
2  b  Y  63  76
3  c  Y  82  64
4  c  Y  43  87
5  d  Z  27  19
6  d  D  56  98
7  d  D  56   6
