## 3.1 Indexing overview

### 3.1.2 Two types of indexing

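- The first type is position-based (integer) indexing: we pick the target data by stating which row positions and which column positions we want.
- The second type is name-based (label) indexing: we can refer to columns by name and also attach boolean conditions on the rows, which makes filtering more flexible.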
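To make the contrast between the two styles concrete, here is a minimal sketch on a toy DataFrame. The data, the column names `visitors` and `price`, and the string row labels are all made up for illustration; they are not part of the chapter's dataset.

```python
import pandas as pd

# Hypothetical toy data; string row labels make the two styles easy to tell apart.
toy = pd.DataFrame(
    {"visitors": [120, 80, 45], "price": [54.3, 99.93, 37.15]},
    index=["a", "b", "c"],
)

# Position-based: the row at position 0 and the column at position 1.
by_position = toy.iloc[0, 1]

# Label-based: the row labelled "a" and the column named "price".
by_label = toy.loc["a", "price"]

# Label-based with a condition: rows where visitors exceed 50, one named column.
busy = toy.loc[toy["visitors"] > 50, ["price"]]

print(by_position, by_label)
print(busy)
```

Both lookups reach the same cell; the difference is only whether the row and column are addressed by where they sit or by what they are called.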

## 3.2 Position-based (integer) indexing

In [1]:
import pandas as pd
from common_util.openDataDir import getFullPath

In [2]:
filePath = getFullPath('第3章 玩转索引', '流量练习数据.xls')
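# Load the sample dataset
df = pd.read_excel(filePath)
# 支付转化率 (payment conversion rate) is stored as text with a '%' sign:
# strip the sign and convert to float, then scale the percentage to a fraction
df['支付转化率'] = df['支付转化率'].str.replace('%', '').astype(float)
df['支付转化率'] = df['支付转化率'] / 100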
df.head()

Unnamed: 0,流量来源,来源明细,访客数,支付转化率,客单价
0,一级,-A,35188,0.0998,54.3
1,一级,-B,28467,0.1127,99.93
2,一级,-C,13747,0.0254,0.08
3,一级,-D,5183,0.0247,37.15
4,一级,-E,4361,0.0431,91.73
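The percentage clean-up in the cell above can be tried in isolation. The sketch below assumes the raw column holds strings such as `'9.98%'`; the raw spreadsheet values are not shown in this section, so the sample strings are an assumption chosen to match the converted output.

```python
import pandas as pd

# Assumed raw form of the conversion-rate column (hypothetical sample values).
rates = pd.Series(["9.98%", "11.27%", "2.54%"])

# Strip the literal '%' (regex=False treats it as plain text),
# convert to float, then scale from percent to a fraction.
as_fraction = rates.str.replace("%", "", regex=False).astype(float) / 100

print(as_fraction.dtype)
print(as_fraction.round(4).tolist())
```

Storing the rate as a float fraction is what later allows numeric comparisons such as "greater than the mean".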


### 3.2.1 Scenario 1: selecting rows
General form: `df.iloc[row_selector, column_selector]`
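A detail worth seeing in action: `iloc` slices follow Python's half-open convention, so the stop position is excluded. That is why the `0:13` slice in the cell that follows selects the rows at positions 0 through 12. The sketch below uses a made-up five-row frame with hypothetical column names `a`, `b`, `c`.

```python
import pandas as pd

# Hypothetical toy frame: 5 rows, 3 columns.
toy = pd.DataFrame({"a": range(5), "b": range(10, 15), "c": range(20, 25)})

# Rows at positions 0, 1, 2 (position 3 is excluded); ':' keeps every column.
first_three = toy.iloc[0:3, :]

# A list of positions picks exactly those columns, here the first and third.
two_cols = toy.iloc[:, [0, 2]]

# Slices on both axes: rows 1-2 and columns 0-1.
block = toy.iloc[1:3, 0:2]

print(first_three.shape, list(two_cols.columns), block.shape)
```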

In [3]:
df.iloc[0:13, :]

Unnamed: 0,流量来源,来源明细,访客数,支付转化率,客单价
0,一级,-A,35188,0.0998,54.3
1,一级,-B,28467,0.1127,99.93
2,一级,-C,13747,0.0254,0.08
3,一级,-D,5183,0.0247,37.15
4,一级,-E,4361,0.0431,91.73
5,一级,-F,4063,0.1157,65.09
6,一级,-G,2122,0.1027,86.45
7,一级,-H,2041,0.0706,44.07
8,一级,-I,1991,0.1652,104.57
9,一级,-J,1981,0.0575,75.93


### 3.2.2 Scenario 2: selecting columns

In [4]:
df.iloc[:, [0, 4]]

Unnamed: 0,流量来源,客单价
0,一级,54.3
1,一级,99.93
2,一级,0.08
3,一级,37.15
4,一级,91.73
5,一级,65.09
6,一级,86.45
7,一级,44.07
8,一级,104.57
9,一级,75.93


In [5]:
# Select rows and columns at the same time
df.iloc[13:18, 0:4]

Unnamed: 0,流量来源,来源明细,访客数,支付转化率
13,二级,-A,39048,0.116
14,二级,-B,3316,0.0709
15,二级,-C,2043,0.0504
16,三级,-A,23140,0.0969
17,三级,-B,14813,0.2014


## 3.3 Name-based (label) indexing

### 3.3.1 Selecting rows with loc

In [6]:
df.loc[df['流量来源'] == '一级', :]

Unnamed: 0,流量来源,来源明细,访客数,支付转化率,客单价
0,一级,-A,35188,0.0998,54.3
1,一级,-B,28467,0.1127,99.93
2,一级,-C,13747,0.0254,0.08
3,一级,-D,5183,0.0247,37.15
4,一级,-E,4361,0.0431,91.73
5,一级,-F,4063,0.1157,65.09
6,一级,-G,2122,0.1027,86.45
7,一级,-H,2041,0.0706,44.07
8,一级,-I,1991,0.1652,104.57
9,一级,-J,1981,0.0575,75.93


### 3.3.2 Selecting columns with loc

In [7]:
df.loc[:, ['流量来源', '客单价']]

Unnamed: 0,流量来源,客单价
0,一级,54.3
1,一级,99.93
2,一级,0.08
3,一级,37.15
4,一级,91.73
5,一级,65.09
6,一级,86.45
7,一级,44.07
8,一级,104.57
9,一级,75.93


### 3.3.3 Selecting rows and columns together with loc

In [8]:
# isin() checks, row by row, whether the value in a column is one of the values in the given list.
df.loc[df['流量来源'].isin(['二级', '三级']), ['流量来源', '来源明细', '访客数', '支付转化率']]

Unnamed: 0,流量来源,来源明细,访客数,支付转化率
13,二级,-A,39048,0.116
14,二级,-B,3316,0.0709
15,二级,-C,2043,0.0504
16,三级,-A,23140,0.0969
17,三级,-B,14813,0.2014
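As a small illustration of what `isin()` returns, the sketch below builds the mask on a toy frame and compares it with the equivalent chain of `==` tests joined by `|`. The frame, its `source` and `visitors` columns, and the `L1`/`L2`/`L3` values are hypothetical stand-ins for the chapter's data.

```python
import pandas as pd

# Hypothetical toy data standing in for the traffic-source column.
toy = pd.DataFrame(
    {"source": ["L1", "L2", "L3", "L2"], "visitors": [10, 20, 30, 40]}
)

# True where the value is a member of the list.
mask = toy["source"].isin(["L2", "L3"])

# The same mask written out by hand; isin() scales better as the list grows.
same_mask = (toy["source"] == "L2") | (toy["source"] == "L3")

# Use the mask for rows and a list of names for columns.
selected = toy.loc[mask, ["source", "visitors"]]

print(mask.tolist())
print(selected)
```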


### 3.3.4 Scenario 4: indexing with multiple conditions

In [9]:
print('Mean visitors:', df['访客数'].mean())
print('Mean conversion rate:', df['支付转化率'].mean())
print('Mean order value:', df['客单价'].mean())

Mean visitors: 8498.0
Mean conversion rate: 0.07547727272727273
Mean order value: 72.86


In [10]:
# Flag the rows whose visitor count is above the mean
df['访客数'] > df['访客数'].mean()

0      True
1      True
2      True
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True
14    False
15    False
16     True
17     True
18    False
19    False
20    False
21    False
Name: 访客数, dtype: bool

In [11]:
# All three conditions must hold at once
(df['访客数'] > df['访客数'].mean()) & (df['支付转化率'] > df['支付转化率'].mean()) & (
        df['客单价'] > df['客单价'].mean())

0     False
1      True
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13     True
14    False
15    False
16     True
17     True
18    False
19    False
20    False
21    False
dtype: bool

In [12]:
df.loc[(df['访客数'] > df['访客数'].mean()) &
       (df['支付转化率'] > df['支付转化率'].mean()) &
       (df['客单价'] > df['客单价'].mean()), :]

Unnamed: 0,流量来源,来源明细,访客数,支付转化率,客单价
1,一级,-B,28467,0.1127,99.93
13,二级,-A,39048,0.116,91.91
16,三级,-A,23140,0.0969,83.75
17,三级,-B,14813,0.2014,82.97
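When conditions are combined this way, each comparison needs its own parentheses because `&` binds more tightly than `>` in Python. One way to keep long filters readable is to name each mask first. The following is a sketch on made-up data with hypothetical columns `visitors` and `rate`, not the chapter's dataset.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame(
    {
        "visitors": [100, 300, 500, 700],
        "rate": [0.05, 0.20, 0.10, 0.30],
    }
)

# Name each condition once; every mask is a boolean Series aligned to toy's index.
high_visitors = toy["visitors"] > toy["visitors"].mean()
high_rate = toy["rate"] > toy["rate"].mean()

# & keeps rows where both masks are True, | where at least one is, ~ negates.
both = toy.loc[high_visitors & high_rate]
either = toy.loc[high_visitors | high_rate]
not_high = toy.loc[~high_visitors]

print(both.index.tolist(), either.index.tolist(), not_high.index.tolist())
```

Named masks can be reused across several filters, and they avoid recomputing the same mean inside one long expression.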
