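# Hierarchical Indexing in pandas: MultiIndex
Why learn hierarchical indexing (MultiIndex)?

- Hierarchical indexing puts multiple index levels on a single axis, which lets a Series or DataFrame represent higher-dimensional data.
- It makes filtering data more convenient, and lookups are faster when the index is sorted.
- Operations such as groupby return a MultiIndex when grouping by multiple keys, so you need to know how to work with it.
- You rarely need to build a MultiIndex by hand (constructors exist, but they are seldom used directly).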
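Although the constructors are rarely needed, seeing one in use can make the structure easier to picture. The following is a minimal sketch, assuming a hypothetical Series named `toy` built with `pd.MultiIndex.from_product`; the four closing prices are taken from the outputs shown later in this notebook.

```python
import pandas as pd

# Build a two-level index explicitly: every company paired with every date.
index = pd.MultiIndex.from_product(
    [["BIDU", "JD"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)

# `toy` is a hypothetical name used only for this illustration.
toy = pd.Series([102.00, 102.62, 28.19, 28.06], index=index, name="收盘")

print(toy.index.nlevels)  # number of index levels
print(toy.loc["JD"])      # selecting on the first level drops that level
```

In practice the same index usually appears as the by-product of `groupby` or `set_index`, as the rest of the notebook shows.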

Demo: three days of stock data for four companies: Baidu, Alibaba, iQIYI, and JD.com  
Data source: Investing.com (英为财经)  
https://cn.investing.com/  

Outline of this demo:
1. MultiIndex on a Series
2. Selecting data from a Series with a MultiIndex
3. MultiIndex on a DataFrame
4. Selecting data from a DataFrame with a MultiIndex

In [1]:
import pandas as pd

In [2]:
%matplotlib inline

In [3]:
stocks = pd.read_excel('./data/stocks/互联网公司股票.xlsx')

In [4]:
stocks

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2019-10-03,BIDU,104.32,102.35,104.73,101.15,2.24,0.02
1,2019-10-02,BIDU,102.62,100.85,103.24,99.5,2.69,0.01
2,2019-10-01,BIDU,102.0,102.8,103.26,101.0,1.78,-0.01
3,2019-10-03,BABA,169.48,166.65,170.18,165.0,10.39,0.02
4,2019-10-02,BABA,165.77,162.82,166.88,161.9,11.6,0.0
5,2019-10-01,BABA,165.15,168.01,168.23,163.64,14.19,-0.01
6,2019-10-03,IQ,16.06,15.71,16.38,15.32,10.08,0.02
7,2019-10-02,IQ,15.72,15.85,15.87,15.12,8.1,-0.01
8,2019-10-01,IQ,15.92,16.14,16.22,15.5,11.65,-0.01
9,2019-10-03,JD,28.8,28.11,28.97,27.82,8.77,0.03


In [5]:
stocks.shape

(12, 8)
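The Excel file itself is not included here. To follow along without it, the sketch below rebuilds an equivalent frame in memory from the rows printed in this notebook. The name `stocks_demo` is hypothetical, and only four of the eight columns are kept for brevity.

```python
import pandas as pd

# Rows copied from the notebook outputs: (日期, 公司, 收盘, 开盘).
rows = [
    ("2019-10-03", "BIDU", 104.32, 102.35),
    ("2019-10-02", "BIDU", 102.62, 100.85),
    ("2019-10-01", "BIDU", 102.00, 102.80),
    ("2019-10-03", "BABA", 169.48, 166.65),
    ("2019-10-02", "BABA", 165.77, 162.82),
    ("2019-10-01", "BABA", 165.15, 168.01),
    ("2019-10-03", "IQ", 16.06, 15.71),
    ("2019-10-02", "IQ", 15.72, 15.85),
    ("2019-10-01", "IQ", 15.92, 16.14),
    ("2019-10-03", "JD", 28.80, 28.11),
    ("2019-10-02", "JD", 28.06, 28.00),
    ("2019-10-01", "JD", 28.19, 28.22),
]

# `stocks_demo` is a stand-in for the frame loaded from Excel (subset of columns).
stocks_demo = pd.DataFrame(rows, columns=["日期", "公司", "收盘", "开盘"])

print(stocks_demo.shape)
print(stocks_demo.groupby("公司")["收盘"].mean().round(2))
```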

In [6]:
stocks.head()

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2019-10-03,BIDU,104.32,102.35,104.73,101.15,2.24,0.02
1,2019-10-02,BIDU,102.62,100.85,103.24,99.5,2.69,0.01
2,2019-10-01,BIDU,102.0,102.8,103.26,101.0,1.78,-0.01
3,2019-10-03,BABA,169.48,166.65,170.18,165.0,10.39,0.02
4,2019-10-02,BABA,165.77,162.82,166.88,161.9,11.6,0.0


In [7]:
stocks['公司'].unique()

array(['BIDU', 'BABA', 'IQ', 'JD'], dtype=object)

In [8]:
stocks.index

RangeIndex(start=0, stop=12, step=1)

In [9]:
res = stocks.groupby('公司')['收盘'].mean()
res

公司
BABA    166.80
BIDU    102.98
IQ       15.90
JD       28.35
Name: 收盘, dtype: float64

## MultiIndex on a Series

In [10]:
res.index

Index(['BABA', 'BIDU', 'IQ', 'JD'], dtype='object', name='公司')

In [11]:
ser = stocks.groupby(['公司', '日期'])['收盘'].mean()
ser


公司    日期        
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64

In the display of a multi-level index, a blank cell means the label is the same as in the row above.

In [12]:
ser.index

MultiIndex([('BABA', '2019-10-01'),
            ('BABA', '2019-10-02'),
            ('BABA', '2019-10-03'),
            ('BIDU', '2019-10-01'),
            ('BIDU', '2019-10-02'),
            ('BIDU', '2019-10-03'),
            (  'IQ', '2019-10-01'),
            (  'IQ', '2019-10-02'),
            (  'IQ', '2019-10-03'),
            (  'JD', '2019-10-01'),
            (  'JD', '2019-10-02'),
            (  'JD', '2019-10-03')],
           names=['公司', '日期'])
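To look inside a MultiIndex, it can help to inspect its level names, the unique labels stored per level, and the full label for every row. The following is a small sketch on a hypothetical two-company Series `ser_demo`, with values taken from the output above.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
# `ser_demo` is a hypothetical, smaller stand-in for `ser`.
ser_demo = pd.Series([165.15, 165.77, 102.00, 102.62], index=index, name="收盘")

print(ser_demo.index.names)                    # names of the levels
print(ser_demo.index.levels)                   # unique labels stored for each level
print(ser_demo.index.get_level_values("公司"))  # one label per row, blanks filled in
```

`get_level_values` shows that the blanks in the printed Series are only a display convention: every row still carries its own first-level label.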

In [13]:
# unstack moves the second-level (inner) index into the columns
ser.unstack()

日期,2019-10-01,2019-10-02,2019-10-03
公司,,,
BABA,165.15,165.77,169.48
BIDU,102.0,102.62,104.32
IQ,15.92,15.72,16.06
JD,28.19,28.06,28.8
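`unstack` can also pivot a different level, and `stack` reverses it. Here is a minimal sketch on a hypothetical two-company Series `ser_demo`, with values taken from the output above.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
# `ser_demo` is a hypothetical, smaller stand-in for `ser`.
ser_demo = pd.Series([165.15, 165.77, 102.00, 102.62], index=index, name="收盘")

wide = ser_demo.unstack()                    # default: innermost level (日期) becomes the columns
by_company = ser_demo.unstack(level="公司")  # pivot the 公司 level into the columns instead
restored = wide.stack()                      # stack moves the columns back into the index

print(wide)
print(by_company)
print(restored.index.names)
```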


In [14]:
ser

公司    日期        
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64

In [15]:
ser.reset_index()

,公司,日期,收盘
0,BABA,2019-10-01,165.15
1,BABA,2019-10-02,165.77
2,BABA,2019-10-03,169.48
3,BIDU,2019-10-01,102.0
4,BIDU,2019-10-02,102.62
5,BIDU,2019-10-03,104.32
6,IQ,2019-10-01,15.92
7,IQ,2019-10-02,15.72
8,IQ,2019-10-03,16.06
9,JD,2019-10-01,28.19


## Selecting data from a Series with a MultiIndex

In [16]:
ser

公司    日期        
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64

In [17]:
ser.loc['BIDU']

日期
2019-10-01    102.00
2019-10-02    102.62
2019-10-03    104.32
Name: 收盘, dtype: float64

In [18]:
ser.loc[('BIDU', '2019-10-01')]

102.0

In [19]:
ser.loc[:,'2019-10-01']

公司
BABA    165.15
BIDU    102.00
IQ       15.92
JD       28.19
Name: 收盘, dtype: float64
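Selecting on an inner level can also be written with `Series.xs`, which names the level explicitly. The sketch below uses a hypothetical two-company Series `ser_demo` (values from the output above) and compares it with the `.loc[:, date]` form.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
# `ser_demo` is a hypothetical, smaller stand-in for `ser`.
ser_demo = pd.Series([165.15, 165.77, 102.00, 102.62], index=index, name="收盘")

# Cross-section: all companies on one date, addressed by level name.
by_date = ser_demo.xs("2019-10-01", level="日期")

# The positional form used in the notebook: ':' for level 0, a label for level 1.
same = ser_demo.loc[:, "2019-10-01"]

print(by_date)
print(by_date.tolist() == same.tolist())
```

`xs` reads more clearly when the index has many levels, because the level is named rather than implied by position.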

## MultiIndex on a DataFrame

In [20]:
stocks.head()

,日期,公司,收盘,开盘,高,低,交易量,涨跌幅
0,2019-10-03,BIDU,104.32,102.35,104.73,101.15,2.24,0.02
1,2019-10-02,BIDU,102.62,100.85,103.24,99.5,2.69,0.01
2,2019-10-01,BIDU,102.0,102.8,103.26,101.0,1.78,-0.01
3,2019-10-03,BABA,169.48,166.65,170.18,165.0,10.39,0.02
4,2019-10-02,BABA,165.77,162.82,166.88,161.9,11.6,0.0


In [21]:
stocks.set_index(['公司', '日期'], inplace=True)

In [22]:
stocks.head()

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0


In [24]:
stocks.index

MultiIndex([('BIDU', '2019-10-03'),
            ('BIDU', '2019-10-02'),
            ('BIDU', '2019-10-01'),
            ('BABA', '2019-10-03'),
            ('BABA', '2019-10-02'),
            ('BABA', '2019-10-01'),
            (  'IQ', '2019-10-03'),
            (  'IQ', '2019-10-02'),
            (  'IQ', '2019-10-01'),
            (  'JD', '2019-10-03'),
            (  'JD', '2019-10-02'),
            (  'JD', '2019-10-01')],
           names=['公司', '日期'])

In [25]:
stocks.sort_index(inplace=True)
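Sorting matters because label-range slicing on a MultiIndex relies on the index being lexicographically sorted. The following is a minimal sketch on a hypothetical four-row frame `df_demo` (closing prices from this notebook), checking sortedness with `is_monotonic_increasing` before and after `sort_index`.

```python
import pandas as pd

# Same descending date order as the raw data, so the index starts out unsorted.
index = pd.MultiIndex.from_tuples(
    [
        ("BIDU", "2019-10-03"),
        ("BIDU", "2019-10-02"),
        ("BABA", "2019-10-03"),
        ("BABA", "2019-10-02"),
    ],
    names=["公司", "日期"],
)
# `df_demo` is a hypothetical, smaller stand-in for `stocks`.
df_demo = pd.DataFrame({"收盘": [104.32, 102.62, 169.48, 165.77]}, index=index)

print(df_demo.index.is_monotonic_increasing)      # sortedness before sorting

sorted_demo = df_demo.sort_index()                # returns a sorted copy
print(sorted_demo.index.is_monotonic_increasing)  # sortedness after sorting

# On the sorted copy, a range between two full keys can be sliced (both ends inclusive).
window = sorted_demo.loc[("BABA", "2019-10-02"):("BIDU", "2019-10-02")]
print(window)
```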

In [27]:
stocks

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2019-10-01,165.15,168.01,168.23,163.64,14.19,-0.01
BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
IQ,2019-10-01,15.92,16.14,16.22,15.5,11.65,-0.01
IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
JD,2019-10-01,28.19,28.22,28.57,27.97,10.64,0.0


## Selecting data from a DataFrame with a MultiIndex

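Key points:
- A tuple (key1, key2) selects across index levels: key1 applies to the first level and key2 to the second.
- A list [key1, key2] selects multiple keys within the same level: key1 and key2 are both labels from that one level.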
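The tuple-versus-list rule can be checked on a small frame. The sketch below assumes a hypothetical `df_demo` with three companies and two dates, with values taken from this notebook.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU", "JD"], ["2019-10-02", "2019-10-03"]],
    names=["公司", "日期"],
)
# `df_demo` is a hypothetical, smaller stand-in for `stocks`.
df_demo = pd.DataFrame(
    {
        "收盘": [165.77, 169.48, 102.62, 104.32, 28.06, 28.80],
        "开盘": [162.82, 166.65, 100.85, 102.35, 28.00, 28.11],
    },
    index=index,
)

# Tuple: one key per level, so this addresses exactly one row.
one_row = df_demo.loc[("BIDU", "2019-10-02"), :]

# List: several keys on the same (first) level, so all rows of both companies.
two_companies = df_demo.loc[["BIDU", "JD"], :]

# Tuple containing a list: several first-level keys combined with one second-level key.
cross = df_demo.loc[(["BIDU", "JD"], "2019-10-03"), :]

print(one_row)
print(two_companies)
print(cross)
```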

In [23]:
stocks.loc['BIDU']

日期,收盘,开盘,高,低,交易量,涨跌幅
2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01


In [28]:
stocks.loc[('BIDU', '2019-10-02')]

收盘     102.62
开盘     100.85
高      103.24
低       99.50
交易量      2.69
涨跌幅      0.01
Name: (BIDU, 2019-10-02), dtype: float64

In [29]:
stocks.loc[('BIDU', '2019-10-02'), :]

收盘     102.62
开盘     100.85
高      103.24
低       99.50
交易量      2.69
涨跌幅      0.01
Name: (BIDU, 2019-10-02), dtype: float64

In [30]:
stocks.loc[('BIDU', '2019-10-02'), '开盘']

100.85

In [31]:
stocks.loc[['BIDU', 'JD'], :]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
JD,2019-10-01,28.19,28.22,28.57,27.97,10.64,0.0
JD,2019-10-02,28.06,28.0,28.22,27.53,9.53,0.0
JD,2019-10-03,28.8,28.11,28.97,27.82,8.77,0.03


In [32]:
stocks.loc[(['BIDU', 'JD'], '2019-10-03'), :]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
JD,2019-10-03,28.8,28.11,28.97,27.82,8.77,0.03


In [33]:
stocks.loc[(['BIDU', 'JD'], '2019-10-03'), '收盘']

公司    日期        
BIDU  2019-10-03    104.32
JD    2019-10-03     28.80
Name: 收盘, dtype: float64

In [34]:
stocks.loc[('BIDU', ['2019-10-03', '2019-10-02']), '收盘']

公司    日期        
BIDU  2019-10-03    104.32
      2019-10-02    102.62
Name: 收盘, dtype: float64

In [36]:
# slice(None) selects every label at that index level

stocks.loc[(slice(None), ['2019-10-02', '2019-10-03']), :]

公司,日期,收盘,开盘,高,低,交易量,涨跌幅
BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
JD,2019-10-02,28.06,28.0,28.22,27.53,9.53,0.0
JD,2019-10-03,28.8,28.11,28.97,27.82,8.77,0.03


In [38]:
# The line below raises a SyntaxError: a bare ':' cannot appear inside a tuple
# stocks.loc[(:, ['2019-10-02', '2019-10-03']), :]

SyntaxError: invalid syntax (2048477888.py, line 1)
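If the `:` notation is preferred over `slice(None)`, `pd.IndexSlice` allows it, because the colon then sits inside square brackets. Here is a minimal sketch on a hypothetical two-company frame `df_demo`, with closing prices from this notebook.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02", "2019-10-03"]],
    names=["公司", "日期"],
)
# `df_demo` is a hypothetical, smaller stand-in for `stocks`.
df_demo = pd.DataFrame(
    {"收盘": [165.15, 165.77, 169.48, 102.00, 102.62, 104.32]},
    index=index,
)

idx = pd.IndexSlice

# ':' is legal here because it appears inside idx[...], not inside a plain tuple.
subset = df_demo.loc[idx[:, ["2019-10-02", "2019-10-03"]], :]

# Equivalent selection written with slice(None), as in the notebook.
same = df_demo.loc[(slice(None), ["2019-10-02", "2019-10-03"]), :]

print(subset)
print(subset.equals(same))
```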

In [37]:
stocks.reset_index()

,公司,日期,收盘,开盘,高,低,交易量,涨跌幅
0,BABA,2019-10-01,165.15,168.01,168.23,163.64,14.19,-0.01
1,BABA,2019-10-02,165.77,162.82,166.88,161.9,11.6,0.0
2,BABA,2019-10-03,169.48,166.65,170.18,165.0,10.39,0.02
3,BIDU,2019-10-01,102.0,102.8,103.26,101.0,1.78,-0.01
4,BIDU,2019-10-02,102.62,100.85,103.24,99.5,2.69,0.01
5,BIDU,2019-10-03,104.32,102.35,104.73,101.15,2.24,0.02
6,IQ,2019-10-01,15.92,16.14,16.22,15.5,11.65,-0.01
7,IQ,2019-10-02,15.72,15.85,15.87,15.12,8.1,-0.01
8,IQ,2019-10-03,16.06,15.71,16.38,15.32,10.08,0.02
9,JD,2019-10-01,28.19,28.22,28.57,27.97,10.64,0.0
