# <center>Chapter 3: Indexing</center>

In [1]:
import numpy as np
import pandas as pd

## I. Indexers

### 1. Column indexing of a table
`[column name]` extracts the corresponding column and returns a `Series`:

In [3]:
df = pd.read_csv('./data/learn_pandas.csv',usecols=['School','Grade','Name','Gender','Weight','Transfer'])
df['Name'].head()

0      Gaopeng Yang
1    Changqiang You
2           Mei Sun
3      Xiaojuan Sun
4       Gaojuan You
Name: Name, dtype: object

`[list of column names]` selects several columns and returns a `DataFrame`:

In [5]:
df[['Gender','Name']].head()

|   | Gender | Name |
|---|---|---|
| 0 | Female | Gaopeng Yang |
| 1 | Male | Changqiang You |
| 2 | Male | Mei Sun |
| 3 | Female | Xiaojuan Sun |
| 4 | Male | Gaojuan You |


In [9]:
df.Name.head()  # attribute access; only works when the column name contains no spaces

0      Gaopeng Yang
1    Changqiang You
2           Mei Sun
3      Xiaojuan Sun
4       Gaojuan You
Name: Name, dtype: object
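The three column-selection forms above can be checked on a small stand-in table. This is a minimal sketch: `df_toy` is hypothetical toy data (not `learn_pandas.csv`), and the `'Home Town'` column is made up to show a name that attribute syntax cannot express.

```python
import pandas as pd

# Hypothetical toy table standing in for learn_pandas.csv
df_toy = pd.DataFrame({
    'Name': ['Gaopeng Yang', 'Mei Sun'],
    'Gender': ['Female', 'Male'],
    'Home Town': ['Shanghai', 'Beijing'],  # made-up column whose name contains a space
})

one_col = df_toy['Name']               # single label -> Series
two_cols = df_toy[['Gender', 'Name']]  # list of labels -> DataFrame, columns in list order
attr_col = df_toy.Name                 # attribute access, same Series as df_toy['Name']

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
print(list(two_cols.columns))   # ['Gender', 'Name']

# `df_toy.Home Town` is not valid Python syntax, so brackets are needed here
print(df_toy['Home Town'].tolist())
```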

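### 2. Row indexing of a Series
`[item]` retrieves the element(s) with the given index label: if exactly one value has that label, the scalar itself is returned; if several values share the label, a `Series` is returned.

To retrieve the elements for several labels, use `[list of items]`.

To retrieve the elements between two labels, you can use a slice, provided both labels occur exactly once in the index. Note that this label slice includes both endpoints: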


In [21]:
s= pd.Series([1,2,3,4,5,6],index=['a','b','a','a','f','c'])
s['a']

a    1
a    3
a    4
dtype: int64

In [25]:
s['f':'b':-1]

f    5
a    4
a    3
b    2
dtype: int64
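A short sketch of the rules above, reusing the same Series `s`: a unique label gives a scalar, a repeated label or a list of labels gives a `Series`, and a slice needs unique endpoint labels. The exact text of the `KeyError` message is not relied on here.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'a', 'a', 'f', 'c'])

single = s['b']          # 'b' appears once -> scalar
dup = s['a']             # 'a' appears three times -> Series
several = s[['c', 'b']]  # list of labels -> Series, in list order
between = s['b':'f']     # unique endpoints -> label slice, both ends included

print(single)            # 2
print(dup.tolist())      # [1, 3, 4]
print(several.tolist())  # [6, 2]
print(between.tolist())  # [2, 3, 4, 5]

# 'a' is not unique, so it cannot serve as a slice endpoint
try:
    s['a':'f']
except KeyError as err:
    print('KeyError:', err)
```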

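When no index is specified, a default integer index is generated. Integer slicing then works just like Python slicing and excludes the right endpoint: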

In [31]:
s = pd.Series([1,5,9,6])
s[0:2]

0    1
1    5
dtype: int64
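To make the contrast with label slicing concrete, the sketch below applies the same `0:2` slice through plain `[]`, `iloc` and `loc` (the latter two are the indexers introduced in the next subsection) on the same default-indexed Series.

```python
import pandas as pd

s = pd.Series([1, 5, 9, 6])  # default RangeIndex: labels 0, 1, 2, 3

print(s[0:2].tolist())       # [1, 5]     plain [] with an integer slice: positional, right end excluded
print(s.iloc[0:2].tolist())  # [1, 5]     iloc is always positional
print(s.loc[0:2].tolist())   # [1, 5, 9]  loc treats 0 and 2 as labels, right end included
```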

### 3. The loc indexer

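The previous subsections covered selecting the columns of a `DataFrame`; we now turn to selecting its rows. A table has two indexers for this: the **label**-based `loc` indexer and the **position**-based `iloc` indexer.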

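The general form of the `loc` indexer is `loc[*, *]`, where the first `*` selects rows and the second `*` selects columns. If the second position is omitted, as in `loc[*]`, the single `*` filters rows. Each `*` can be one of five kinds of valid objects: a single label, a list of labels, a label slice, a boolean list, or a function. Each is explained in turn below.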
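As a quick overview, the five kinds of `*` can be tried side by side on a tiny table. This is a sketch on hypothetical toy data: the labels `Ann`, `Bob`, `Cid`, `Dee` and the values are made up, with the index playing the role that the `Name` column plays below.

```python
import pandas as pd

# Hypothetical toy table; the index plays the role of the Name column
df_toy = pd.DataFrame(
    {'School': ['Fudan', 'Peking', 'Fudan', 'Tsinghua'],
     'Weight': [46.0, 70.0, 89.0, 41.0]},
    index=['Ann', 'Bob', 'Cid', 'Dee'],
)

a = df_toy.loc['Bob', 'Weight']                       # single labels -> scalar
b = df_toy.loc[['Ann', 'Dee'], ['School']]            # lists of labels -> DataFrame
c = df_toy.loc['Bob':'Cid', 'School':'Weight']        # label slices, both ends included
d = df_toy.loc[[True, False, True, False]]            # boolean list, one entry per row
e = df_toy.loc[lambda x: x['Weight'] > 60, 'School']  # function, called with df_toy itself

print(a)                 # 70.0
print(b.shape)           # (2, 1)
print(c.index.tolist())  # ['Bob', 'Cid']
print(d.index.tolist())  # ['Ann', 'Cid']
print(e.tolist())        # ['Peking', 'Fudan']
```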

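To demonstrate these operations, first use the `set_index` method to make the `Name` column the index; other uses of this method are covered in the chapter on multi-level indexing.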

In [36]:
df_demo = df.set_index('Name')
df_demo.head()

| Name | School | Grade | Gender | Weight | Transfer |
|---|---|---|---|---|---|
| Gaopeng Yang | Shanghai Jiao Tong University | Freshman | Female | 46.0 | N |
| Changqiang You | Peking University | Freshman | Male | 70.0 | N |
| Mei Sun | Shanghai Jiao Tong University | Senior | Male | 89.0 | N |
| Xiaojuan Sun | Fudan University | Sophomore | Female | 41.0 | N |
| Gaojuan You | Fudan University | Sophomore | Male | 74.0 | N |


In [37]:
df_demo.loc['Qiang Sun']  # several people share this name, so a DataFrame is returned

| Name | School | Grade | Gender | Weight | Transfer |
|---|---|---|---|---|---|
| Qiang Sun | Tsinghua University | Junior | Female | 53.0 | N |
| Qiang Sun | Tsinghua University | Sophomore | Female | 40.0 | N |
| Qiang Sun | Shanghai Jiao Tong University | Junior | Female | NaN | N |


In [41]:
df_demo.loc['Quan Zhao'].index  # unique name, so a Series is returned; its index holds the column labels

Index(['School', 'Grade', 'Gender', 'Weight', 'Transfer'], dtype='object')

Rows and columns can also be selected at the same time:

In [46]:
df_demo.loc['Qiang Sun', 'School']  # returns a Series

Name
Qiang Sun              Tsinghua University
Qiang Sun              Tsinghua University
Qiang Sun    Shanghai Jiao Tong University
Name: School, dtype: object

In [47]:
df_demo.loc['Quan Zhao', 'School']  # returns a single element

'Shanghai Jiao Tong University'
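The return type depends on whether the row label is repeated and on how many columns are asked for. A sketch on a hypothetical four-row table that reuses the two names above (only the `School` values shown in this section are taken from the data; the `Gender` column is filled in for illustration):

```python
import pandas as pd

# Hypothetical toy table with a repeated index label
df_toy = pd.DataFrame(
    {'School': ['Tsinghua University', 'Tsinghua University',
                'Shanghai Jiao Tong University', 'Shanghai Jiao Tong University'],
     'Gender': ['Female', 'Female', 'Female', 'Female']},
    index=['Qiang Sun', 'Qiang Sun', 'Qiang Sun', 'Quan Zhao'],
)

r1 = df_toy.loc['Qiang Sun']            # repeated label -> DataFrame
r2 = df_toy.loc['Quan Zhao']            # unique label -> Series indexed by column names
r3 = df_toy.loc['Qiang Sun', 'School']  # repeated label + one column -> Series
r4 = df_toy.loc['Quan Zhao', 'School']  # unique label + one column -> scalar

for r in (r1, r2, r3, r4):
    print(type(r).__name__)  # DataFrame, Series, Series, str
```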

When `*` is a list of labels:

In [51]:
df_demo.loc[['Qiang Sun','Quan Zhao'],['School','Gender']]

| Name | School | Gender |
|---|---|---|
| Qiang Sun | Tsinghua University | Female |
| Qiang Sun | Tsinghua University | Female |
| Qiang Sun | Shanghai Jiao Tong University | Female |
| Quan Zhao | Shanghai Jiao Tong University | Female |
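Two details of list selection, sketched on hypothetical data: rows come back in the order of the list (a repeated label expands to all of its rows), and a label that is missing from the index raises a `KeyError`.

```python
import pandas as pd

# Hypothetical toy frame; 'x' is a repeated label
df_toy = pd.DataFrame({'Score': [10, 20, 30]}, index=['x', 'y', 'x'])

picked = df_toy.loc[['y', 'x'], ['Score']]
print(picked.index.tolist())     # ['y', 'x', 'x']  list order, 'x' expanded
print(picked['Score'].tolist())  # [20, 10, 30]

try:
    df_toy.loc[['y', 'z']]  # 'z' is not in the index
except KeyError:
    print('KeyError')
```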


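When `*` is a slice, the same rules as for the Series slices above apply: the endpoint labels must be unique, and the right endpoint is included.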

In [53]:
df_demo.loc['Gaojuan You':'Gaoqiang Qian','School':'Gender']

| Name | School | Grade | Gender |
|---|---|---|---|
| Gaojuan You | Fudan University | Sophomore | Male |
| Xiaoli Qian | Tsinghua University | Freshman | Female |
| Qiang Chu | Shanghai Jiao Tong University | Freshman | Female |
| Gaoqiang Qian | Tsinghua University | Junior | Female |
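As the output shows, a label slice follows the current row order, not alphabetical order: `Xiaoli Qian` is included because it sits between the two endpoints. A sketch on a hypothetical copy of those four rows (school names shortened for illustration):

```python
import pandas as pd

# Hypothetical toy frame; the row order is deliberately not alphabetical
df_toy = pd.DataFrame(
    {'School': ['Fudan', 'Tsinghua', 'SJTU', 'Tsinghua'],
     'Grade': ['Sophomore', 'Freshman', 'Freshman', 'Junior'],
     'Gender': ['Male', 'Female', 'Female', 'Female']},
    index=['Gaojuan You', 'Xiaoli Qian', 'Qiang Chu', 'Gaoqiang Qian'],
)

part = df_toy.loc['Gaojuan You':'Gaoqiang Qian', 'School':'Grade']
print(part.index.tolist())    # all four rows, from the start label's position to the end label's
print(part.columns.tolist())  # ['School', 'Grade']

# The end label sits before the start label, so no rows are selected
empty = df_toy.loc['Gaoqiang Qian':'Gaojuan You']
print(len(empty))  # 0
```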


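Note that when a `DataFrame` has an integer index, integer slices passed to `loc` follow the same rules as the string index above: they are **label** slices, include both endpoints, and the start and end labels must not be duplicated.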

In [68]:
df_loc_slice_demo = df_demo.copy()
df_loc_slice_demo.index = range(df_demo.shape[0],0,-1)
df_loc_slice_demo.loc[5:3]

|   | School | Grade | Gender | Weight | Transfer |
|---|---|---|---|---|---|
| 5 | Fudan University | Junior | Female | 46.0 | N |
| 4 | Tsinghua University | Senior | Female | 50.0 | N |
| 3 | Shanghai Jiao Tong University | Senior | Female | 45.0 | N |
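The same descending integer index makes the label/position distinction visible: `loc[5:3]` means "labels 5 through 3", while `iloc` reads the numbers as positions. A sketch on a hypothetical six-row frame:

```python
import pandas as pd

df_toy = pd.DataFrame({'v': list('abcdef')})  # hypothetical data
df_toy.index = range(6, 0, -1)                # labels 6, 5, 4, 3, 2, 1

by_label = df_toy.loc[5:3]   # labels 5, 4, 3 -> three rows, both ends included
by_pos = df_toy.iloc[1:4]    # positions 1, 2, 3 -> the same three rows
no_rows = df_toy.iloc[5:3]   # positions 5..3 with the default step +1 -> empty

print(by_label['v'].tolist())  # ['b', 'c', 'd']
print(by_pos['v'].tolist())    # ['b', 'c', 'd']
print(len(no_rows))            # 0
```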


In [78]:
df_demo.loc[df_demo.Weight > 70].head()  # * is a boolean Series of the same length; rows where it is True are kept

| Name | School | Grade | Gender | Weight | Transfer |
|---|---|---|---|---|---|
| Mei Sun | Shanghai Jiao Tong University | Senior | Male | 89.0 | N |
| Gaojuan You | Fudan University | Sophomore | Male | 74.0 | N |
| Xiaopeng Zhou | Shanghai Jiao Tong University | Freshman | Male | 74.0 | N |
| Xiaofeng Sun | Tsinghua University | Senior | Male | 71.0 | N |
| Qiang Zheng | Shanghai Jiao Tong University | Senior | Male | 87.0 | N |


A boolean `Series` can also be produced with the `isin` method:

In [80]:
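df_demo.loc[df_demo.Grade.isin(['Freshman', 'Senior'])].head()  # filtered rows (not displayed: only a cell's last expression is shown)
df_demo.Grade.isin(['Freshman', 'Senior'])  # the boolean Series used as the mask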

Name
Gaopeng Yang       True
Changqiang You     True
Mei Sun            True
Xiaojuan Sun      False
Gaojuan You       False
                  ...  
Xiaojuan Sun      False
Li Zhao            True
Chengqiang Chu     True
Chengmei Shen      True
Chunpeng Lv       False
Name: Grade, Length: 200, dtype: bool
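A sketch of `isin` used as a row mask, on hypothetical toy data. It also shows that when a boolean `Series` is passed to `loc`, it is matched to the rows by index label rather than by position, so reordering the mask does not change the result.

```python
import pandas as pd

# Hypothetical toy table
df_toy = pd.DataFrame(
    {'Grade': ['Freshman', 'Junior', 'Senior', 'Sophomore'],
     'Weight': [46.0, 70.0, 89.0, 41.0]},
    index=['Ann', 'Bob', 'Cid', 'Dee'],
)

mask = df_toy['Grade'].isin(['Freshman', 'Senior'])  # boolean Series with df_toy's index
print(mask.tolist())                    # [True, False, True, False]
print(df_toy.loc[mask].index.tolist())  # ['Ann', 'Cid']

# Same mask in reverse order: loc aligns it on the labels first
shuffled = mask.iloc[::-1]
print(df_toy.loc[shuffled].index.tolist())  # ['Ann', 'Cid']
```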

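Compound conditions can be built by combining `|` (or), `&` (and) and `~` (not). For example, select Fudan University seniors who weigh more than 70 kg, or male Peking University students who are not seniors and weigh more than 80 kg: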

In [81]:
condition_1_1 = df_demo.School == 'Fudan University'
condition_1_2 = df_demo.Grade == 'Senior'
condition_1_3 = df_demo.Weight > 70
condition_1 = condition_1_1 & condition_1_2 & condition_1_3
condition_2_1 = df_demo.School == 'Peking University'
condition_2_2 = df_demo.Grade == 'Senior'
condition_2_3 = df_demo.Weight > 80
condition_2_4 = df_demo.Gender == 'Male'  # "male" part of the second condition
condition_2 = condition_2_1 & (~condition_2_2) & condition_2_3 & condition_2_4
df_demo.loc[condition_1 | condition_2]

| Name | School | Grade | Gender | Weight | Transfer |
|---|---|---|---|---|---|
| Qiang Han | Peking University | Freshman | Male | 87.0 | N |
| Chengpeng Zhou | Fudan University | Senior | Male | 81.0 | N |
| Changpeng Zhao | Peking University | Freshman | Male | 83.0 | N |
| Chengpeng Qian | Fudan University | Senior | Male | 73.0 | Y |
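When conditions are written inline instead of stored in variables, each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==`, `>` and `<`. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy table
df_toy = pd.DataFrame({'School': ['Fudan', 'Peking', 'Fudan', 'Peking'],
                       'Weight': [75.0, 85.0, 60.0, 70.0],
                       'Age': [18, 20, 22, 19]})

both = df_toy.loc[(df_toy.School == 'Fudan') & (df_toy.Weight > 70)]
print(both.index.tolist())    # [0]

either = df_toy.loc[~(df_toy.School == 'Fudan') | (df_toy.Weight < 65)]
print(either.index.tolist())  # [1, 2, 3]

ages = df_toy.loc[(df_toy.Age > 18) & (df_toy.Age < 21)]
print(ages.index.tolist())    # [1, 3]

# Without parentheses Python reads this as the chained comparison
# df_toy.Age > (18 & df_toy.Age) < 21, which needs the truth value of a Series
try:
    df_toy.loc[df_toy.Age > 18 & df_toy.Age < 21]
except ValueError as err:
    print('ValueError:', err)
```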


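A boolean list can also be passed in the column position. The cell below builds one boolean per column and keeps only the `float64` columns; `select_dtypes('number')` is a built-in alternative that keeps all numeric columns:

In [94]:
df_demo.select_dtypes('number')  # built-in alternative (not displayed: only a cell's last expression is shown)
mask = [df_demo[c].dtype == 'float64' for c in df_demo.columns]  # one boolean per column
df_demo.loc[:, mask]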


| Name | Weight |
|---|---|
| Gaopeng Yang | 46.0 |
| Changqiang You | 70.0 |
| Mei Sun | 89.0 |
| Xiaojuan Sun | 41.0 |
| Gaojuan You | 74.0 |
| ... | ... |
| Xiaojuan Sun | 46.0 |
| Li Zhao | 50.0 |
| Chengqiang Chu | 45.0 |
| Chengmei Shen | 71.0 |
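The column mask can also be built in one step from `dtypes`. A sketch on a hypothetical table with an extra integer column, which shows where the two approaches differ: the `float64` mask keeps only float columns, while `select_dtypes('number')` also keeps integer ones.

```python
import pandas as pd

# Hypothetical toy table: one text, one float and one integer column
df_toy = pd.DataFrame({'School': ['Fudan', 'Peking'],
                       'Weight': [46.0, 70.0],
                       'Height': [158, 172]})

mask = (df_toy.dtypes == 'float64').tolist()  # one boolean per column
print(mask)                                   # [False, True, False]
print(df_toy.loc[:, mask].columns.tolist())   # ['Weight']
print(df_toy.select_dtypes('number').columns.tolist())  # ['Weight', 'Height']
```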


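The `values` attribute turns the boolean `Series` into a plain NumPy boolean array, which `loc` also accepts as a boolean list:

In [96]: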
(df_demo.Weight>80).values

array([False, False,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False, False, False, False,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
        True, False, False, False, False, False, False, False, False,
       False, False,
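Unlike a boolean `Series`, a NumPy array carries no index labels, so `loc` matches it to the rows purely by position and its length must equal the number of rows. A sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy table
df_toy = pd.DataFrame({'Weight': [46.0, 89.0, 74.0, 41.0]},
                      index=['Ann', 'Bob', 'Cid', 'Dee'])

arr = (df_toy.Weight > 80).values        # plain NumPy bool array, labels dropped
print(type(arr).__name__, arr.tolist())  # ndarray [False, True, False, False]
print(df_toy.loc[arr].index.tolist())    # ['Bob']

# A boolean array of the wrong length is rejected
try:
    df_toy.loc[arr[:2]]
except IndexError as err:
    print('IndexError:', err)
```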

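Finally, a `DataFrame` can also be indexed with a boolean `DataFrame` of the same shape. The `id` values below differ from session to session; they only show that each step produces a new object:

In [106]: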
df_chain = pd.DataFrame([[0,0],[1,0],[-1,0]], columns=list('AB'))
id(df_chain)


1944240861136

In [103]:
w =df_chain!=0
id(w)

1944291846016

In [107]:
q = df_chain[w]
print(id(q))  # a new object, distinct from df_chain
q             # cells where w is False become NaN

1944184648592


|   | A | B |
|---|---|---|
| 0 | NaN | NaN |
| 1 | 1.0 | NaN |
| 2 | -1.0 | NaN |
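Indexing with a boolean `DataFrame` behaves like `DataFrame.where`: cells where the mask is `True` are kept, the rest become `NaN`, and the original table is left untouched. A sketch reusing `df_chain` from the cells above:

```python
import pandas as pd

df_chain = pd.DataFrame([[0, 0], [1, 0], [-1, 0]], columns=list('AB'))
w = df_chain != 0  # boolean DataFrame with the same shape

q = df_chain[w]                     # keep cells where w is True, NaN elsewhere
print(q.equals(df_chain.where(w)))  # True
print(q is df_chain)                # False: a new object is returned
print(df_chain['A'].tolist())       # [0, 1, -1]: the original is unchanged
```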
