# Chapter 4: Selecting subsets of data

## Recipes
* [4.1 Selecting Series data](#4.1-Selecting-Series-data)
* [4.2 Selecting DataFrame rows](#4.2-Selecting-DataFrame-rows)
* [4.3 Selecting DataFrame rows and columns simultaneously](#4.3-Selecting-DataFrame-rows-and-columns-simultaneously)
* [4.4 Selecting with a combination of integers and labels](#4.4-Selecting-with-a-combination-of-integers-and-labels)
* [4.5 Speeding up scalar selection](#4.5-Speeding-up-scalar-selection)
* [4.6 Slicing rows lazily](#4.6-Slicing-rows-lazily)
* [4.7 Slicing Lexicographically](#4.7-Slicing-Lexicographically)


In [1]:
import pandas as pd
import numpy as np

# 4.1 Selecting Series data

In [2]:
### [Tech] Using indexers on a Series: .loc[], .iloc[], fancy indexing
### [Goal] Read data from the college CITY Series by integer position / label: one item, several items, a range

## >> How it works...

In [3]:
# 4.1.1 Create the Series city, indexed by INSTNM
college = pd.read_csv('data/college.csv', index_col='INSTNM')
city = college['CITY']
city.head()

INSTNM
Alabama A & M University                   Normal
University of Alabama at Birmingham    Birmingham
Amridge University                     Montgomery
University of Alabama in Huntsville    Huntsville
Alabama State University               Montgomery
Name: CITY, dtype: object

In [4]:
# 4.1.2 The .iloc[] indexer selects by integer position (0-based), regardless of the index labels
city.iloc[3]

'Huntsville'
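To see that .iloc ignores the labels entirely, here is a minimal sketch on a small, hypothetical Series (invented data, not college.csv). Its integer labels deliberately differ from the positions, so .iloc (position) and .loc (label) pick different elements:

```python
import pandas as pd

# Hypothetical data: labels 10, 20, 30, 40 sit at positions 0, 1, 2, 3
s = pd.Series(['a', 'b', 'c', 'd'], index=[10, 20, 30, 40])

by_position = s.iloc[1]  # position 1 -> 'b'; the label is ignored
by_label = s.loc[30]     # label 30 -> 'c'; the position is ignored

print(by_position, by_label)  # b c
```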

In [5]:
# 4.1.3 Fancy indexing with iloc[]: pass several integer positions as a list -> returns a Series
city.iloc[[10,20,30]]

INSTNM
Birmingham Southern College                            Birmingham
George C Wallace State Community College-Hanceville    Hanceville
Judson College                                             Marion
Name: CITY, dtype: object

In [6]:
# 4.1.4 Slicing with iloc[]: slice syntax also works -> returns a Series
city.iloc[4:50:10]

INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University           Florence
Marion Military Institute                 Marion
Reid State Technical College           Evergreen
Name: CITY, dtype: object

In [7]:
city.iloc[::10]

INSTNM
Alabama A & M University                                   Normal
Birmingham Southern College                            Birmingham
George C Wallace State Community College-Hanceville    Hanceville
Judson College                                             Marion
Northeast Alabama Community College                    Rainsville
                                                          ...    
Strayer University-Brickell                                 Miami
Strayer University-North Raleigh Campus                   Raleigh
Strayer University-Cobb Campus                            Atlanta
Strayer University-Irving                                  Irving
SAE Institute of Technology  San Francisco             Emeryville
Name: CITY, Length: 754, dtype: object

In [8]:
# 4.1.5 The .loc indexer: looks up by the current index labels (values)
city.loc['Heritage Christian University']

'Florence'

In [9]:
# 4.1.6 Fancy indexing with loc[]: pass a list of several index labels -> returns a Series

# Randomly draw 4 index labels
np.random.seed(1)
labels = list(np.random.choice(city.index, 4))
labels

['Northwest HVAC/R Training Center',
 'California State University-Dominguez Hills',
 'Lower Columbia College',
 'Southwest Acupuncture College-Boulder']

In [10]:
city.loc[labels]  # labels is a list of 4 index labels

INSTNM
Northwest HVAC/R Training Center                Spokane
California State University-Dominguez Hills      Carson
Lower Columbia College                         Longview
Southwest Acupuncture College-Boulder           Boulder
Name: CITY, dtype: object

In [11]:
# 4.1.7 Slicing with loc[]: slice syntax also works -> returns a Series
# start and stop must both be labels (the stop label is included); step is an integer
city.loc['Alabama State University':'Reid State Technical College':10]

INSTNM
Alabama State University              Montgomery
Enterprise State Community College    Enterprise
Heritage Christian University           Florence
Marion Military Institute                 Marion
Reid State Technical College           Evergreen
Name: CITY, dtype: object
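The two slice forms behave differently at the stop boundary. The sketch below uses a hypothetical five-element Series in which letters stand in for college names. An .iloc slice stops before the stop position, while a .loc slice includes the stop label:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['A', 'B', 'C', 'D', 'E'])

pos_slice = s.iloc[1:3]       # positions 1 and 2; position 3 is excluded
label_slice = s.loc['B':'D']  # labels B, C and D; label D is included

print(pos_slice.index.tolist())    # ['B', 'C']
print(label_slice.index.tolist())  # ['B', 'C', 'D']
```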

## >> There's more... 4.1

In [12]:
# Even when selecting a single item, you can get a Series back by passing
# a sequence (fancy indexing) instead of a scalar
city.iloc[[3]]

INSTNM
University of Alabama in Huntsville    Huntsville
Name: CITY, dtype: object
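The return type depends only on whether the key is a scalar or a sequence. A minimal sketch with a hypothetical three-item Series:

```python
import pandas as pd

s = pd.Series(['Normal', 'Birmingham', 'Montgomery'], index=['A', 'B', 'C'])

scalar = s.iloc[1]      # a bare position returns the value itself
one_item = s.iloc[[1]]  # a one-element list returns a length-1 Series

print(type(scalar).__name__, type(one_item).__name__)  # str Series
```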

In [13]:
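# A slice is defined as [start : stop : step].
# The sign of step sets the direction in which the slice moves from start to stop.
# If that direction does not match the order of start and stop, an empty Series is returned.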

city.loc['Reid State Technical College':'Alabama State University':10]

Series([], Name: CITY, dtype: object)

In [14]:
city.loc['Reid State Technical College':'Alabama State University':-10]

INSTNM
Reid State Technical College           Evergreen
Marion Military Institute                 Marion
Heritage Christian University           Florence
Enterprise State Community College    Enterprise
Alabama State University              Montgomery
Name: CITY, dtype: object
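The same direction rule can be checked on a tiny hypothetical Series with letter labels. With the default step of +1, a slice whose start comes after its stop is empty. A negative step walks backwards and keeps both end labels:

```python
import pandas as pd

s = pd.Series(range(5), index=list('abcde'))

wrong_direction = s.loc['e':'a']  # step defaults to +1, but 'e' comes after 'a' -> empty
backwards = s.loc['e':'a':-2]     # negative step walks from 'e' back to 'a'

print(len(wrong_direction))      # 0
print(backwards.index.tolist())  # ['e', 'c', 'a']
```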

# 4.2 Selecting DataFrame rows

In [15]:
### [Tech] Using indexers on a DataFrame: .loc[], .iloc[], fancy indexing
### [Goal] Read rows of the college DataFrame by integer position / label: one row, several rows, a range (DataFrame)

## >> How it works...

In [16]:
# 4.2.1 Create the college DataFrame with INSTNM as the index labels
college = pd.read_csv('data/college.csv', index_col='INSTNM')
college.head()

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,...,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,...,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,...,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,...,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5


In [17]:
# 4.2.2 Passing a single integer position (scalar) to the .iloc[] indexer returns
# that row as a Series (the column names become its index).
# The row's index label in the DataFrame becomes the Series name.
college.iloc[60]

CITY                  Anchorage
STABBR                       AK
HBCU                          0
MENONLY                       0
WOMENONLY                     0
RELAFFIL                      0
SATVRMID                    NaN
SATMTMID                    NaN
DISTANCEONLY                  0
UGDS                      12865
UGDS_WHITE               0.5747
UGDS_BLACK               0.0358
UGDS_HISP                0.0761
UGDS_ASIAN               0.0778
UGDS_AIAN                0.0653
UGDS_NHPI                0.0086
UGDS_2MOR                 0.098
UGDS_NRA                 0.0181
UGDS_UNKN                0.0457
PPTUG_EF                 0.4539
CURROPER                      1
PCTPELL                  0.2385
PCTFLOAN                 0.2647
UG25ABV                  0.4386
MD_EARN_WNE_P10           42500
GRAD_DEBT_MDN_SUPP      19449.5
Name: University of Alaska Anchorage, dtype: object

In [18]:
# 4.2.3 Passing a single index label to the .loc[] indexer likewise
# returns that row as a Series (the column names become its index).
college.loc['University of Alaska Anchorage']

CITY                  Anchorage
STABBR                       AK
HBCU                          0
MENONLY                       0
WOMENONLY                     0
RELAFFIL                      0
SATVRMID                    NaN
SATMTMID                    NaN
DISTANCEONLY                  0
UGDS                      12865
UGDS_WHITE               0.5747
UGDS_BLACK               0.0358
UGDS_HISP                0.0761
UGDS_ASIAN               0.0778
UGDS_AIAN                0.0653
UGDS_NHPI                0.0086
UGDS_2MOR                 0.098
UGDS_NRA                 0.0181
UGDS_UNKN                0.0457
PPTUG_EF                 0.4539
CURROPER                      1
PCTPELL                  0.2385
PCTFLOAN                 0.2647
UG25ABV                  0.4386
MD_EARN_WNE_P10           42500
GRAD_DEBT_MDN_SUPP      19449.5
Name: University of Alaska Anchorage, dtype: object
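The anatomy of the returned row can be checked on a hypothetical two-row stand-in for college. The school names and values below are invented for illustration. The row label becomes the Series name and the column names become its index. Because the row mixes strings and floats, the Series dtype is object, as in the output above:

```python
import pandas as pd

# Hypothetical two-row stand-in for college
df = pd.DataFrame(
    {'CITY': ['Normal', 'Anchorage'], 'UGDS': [4206.0, 12865.0]},
    index=pd.Index(['School X', 'School Y'], name='INSTNM'),
)

row = df.iloc[1]  # the same row as df.loc['School Y']

print(row.name)            # School Y  (the row label)
print(row.index.tolist())  # ['CITY', 'UGDS']  (the column names)
print(row.dtype)           # object  (mixed str and float values)
```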

In [19]:
# 4.2.4 Fancy indexing with iloc on a DataFrame
#   Passing a list of the integer positions to extract yields a DataFrame with only those rows
college.iloc[[60, 99, 3]]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
University of Alaska Anchorage,Anchorage,AK,0.0,0.0,0.0,0,,,0.0,12865.0,...,0.098,0.0181,0.0457,0.4539,1,0.2385,0.2647,0.4386,42500,19449.5
International Academy of Hair Design,Tempe,AZ,0.0,0.0,0.0,0,,,0.0,188.0,...,0.016,0.0,0.0638,0.0,0,0.7185,0.7346,0.3905,22200,10556.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0


In [20]:
# 4.2.5 Fancy indexing with loc on a DataFrame
#   Passing a list of the index labels to extract yields a DataFrame with only those rows
labels = ['University of Alaska Anchorage',
          'International Academy of Hair Design',
          'University of Alabama in Huntsville']
college.loc[labels]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
University of Alaska Anchorage,Anchorage,AK,0.0,0.0,0.0,0,,,0.0,12865.0,...,0.098,0.0181,0.0457,0.4539,1,0.2385,0.2647,0.4386,42500,19449.5
International Academy of Hair Design,Tempe,AZ,0.0,0.0,0.0,0,,,0.0,188.0,...,0.016,0.0,0.0638,0.0,0,0.7185,0.7346,0.3905,22200,10556.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0


In [21]:
# 4.2.6 Slicing with iloc on a DataFrame
#     To cut out a particular range, pass a slice of integer positions
#     to the .iloc indexer (the stop position is excluded)
college.iloc[99:102]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
International Academy of Hair Design,Tempe,AZ,0.0,0.0,0.0,0,,,0.0,188.0,...,0.016,0.0,0.0638,0.0,0,0.7185,0.7346,0.3905,22200,10556
GateWay Community College,Phoenix,AZ,0.0,0.0,0.0,0,,,0.0,5211.0,...,0.0127,0.0161,0.0702,0.7465,1,0.327,0.2189,0.5832,29800,7283
Mesa Community College,Mesa,AZ,0.0,0.0,0.0,0,,,0.0,19055.0,...,0.0205,0.0257,0.0682,0.6457,1,0.3423,0.2207,0.401,35200,8000


In [22]:
# 4.2.7 Slicing with loc on a DataFrame
#     In the same way, to specify a range by index labels,
#     pass a label slice to .loc (the stop label is included)
start = 'International Academy of Hair Design'
stop = 'Mesa Community College'
college.loc[start:stop]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
International Academy of Hair Design,Tempe,AZ,0.0,0.0,0.0,0,,,0.0,188.0,...,0.016,0.0,0.0638,0.0,0,0.7185,0.7346,0.3905,22200,10556
GateWay Community College,Phoenix,AZ,0.0,0.0,0.0,0,,,0.0,5211.0,...,0.0127,0.0161,0.0702,0.7465,1,0.327,0.2189,0.5832,29800,7283
Mesa Community College,Mesa,AZ,0.0,0.0,0.0,0,,,0.0,19055.0,...,0.0205,0.0257,0.0682,0.6457,1,0.3423,0.2207,0.401,35200,8000


## >> There's more... 4.2

In [23]:
# To get the list of index labels of the rows extracted in 4.2.4 above,
# use .tolist() (or its alias .to_list()) as below.
college.iloc[[60, 99, 3]].index.tolist()

['University of Alaska Anchorage',
 'International Academy of Hair Design',
 'University of Alabama in Huntsville']
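The following minimal sketch uses a hypothetical three-row frame. It confirms that fancy indexing keeps the requested order, and that .tolist() and .to_list() return the same plain Python list of labels:

```python
import pandas as pd

df = pd.DataFrame({'v': [1, 2, 3]}, index=['School A', 'School B', 'School C'])

picked = df.iloc[[2, 0]]        # fancy indexing keeps the requested order
labels = picked.index.tolist()  # plain Python list of the row labels

print(labels)  # ['School C', 'School A']
```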

# 4.3 Selecting DataFrame rows and columns simultaneously

In [24]:
### [Tech] df.iloc[rows, columns], df.loc[rows, columns]
### [Goal] Select a subset of the college DataFrame by specifying row and column ranges

## >> How it works...

In [25]:
# 4.3.1 On a DataFrame (two-dimensional), .iloc[rows, cols] selects a subset.
#  Each of rows and cols can be a single position, a slice, a list (fancy indexing), or a boolean array.
# First, .iloc[row slice, column slice], specified by integer position:
# the first 3 rows and the 4 leftmost columns are selected.

college = pd.read_csv('data/college.csv', index_col='INSTNM')
college.iloc[:3, :4]

INSTNM,CITY,STABBR,HBCU,MENONLY
Alabama A & M University,Normal,AL,1.0,0.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0
Amridge University,Montgomery,AL,0.0,0.0


In [26]:
#    Now .loc[row slice, column slice], specified by index label.
#    With labels, the label passed as the stop is included in the result,
#    which differs from how .iloc behaves with integer positions.
college.loc[:'Amridge University', :'MENONLY']

INSTNM,CITY,STABBR,HBCU,MENONLY
Alabama A & M University,Normal,AL,1.0,0.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0
Amridge University,Montgomery,AL,0.0,0.0
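The two cells above select the same block because .loc includes its stop labels. The sketch below checks this on a hypothetical 5 x 6 frame of consecutive integers, with invented row/column names. The .loc stops are the labels of the last row and column that .iloc keeps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(30).reshape(5, 6),
    index=[f'row{i}' for i in range(5)],
    columns=[f'col{j}' for j in range(6)],
)

by_pos = df.iloc[:3, :4]             # rows 0-2, columns 0-3 (stops excluded)
by_label = df.loc[:'row2', :'col3']  # same cells: stop labels included

print(by_pos.equals(by_label))  # True
```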


In [27]:
# 4.3.2 To select specific columns for all rows by integer position,
#    put the slice ':' (meaning all rows) in the rows position and
#    a list of the desired column positions (fancy indexing) in the columns position.
college.iloc[:, [4,6]].head()

INSTNM,WOMENONLY,SATVRMID
Alabama A & M University,0.0,424.0
University of Alabama at Birmingham,0.0,570.0
Amridge University,0.0,
University of Alabama in Huntsville,0.0,595.0
Alabama State University,0.0,425.0


In [28]:
#   The same can be done with .loc; in that case, pass index labels.

college.loc[:, ['WOMENONLY', 'SATVRMID']]

INSTNM,WOMENONLY,SATVRMID
Alabama A & M University,0.0,424.0
University of Alabama at Birmingham,0.0,570.0
Amridge University,0.0,
University of Alabama in Huntsville,0.0,595.0
Alabama State University,0.0,425.0
...,...,...
SAE Institute of Technology San Francisco,,
Rasmussen College - Overland Park,,
National Personal Training Institute of Cleveland,,
Bay Area Medical Academy - San Jose Satellite Location,,


In [29]:
# 4.3.3 Only the rows and columns you need can be listed explicitly (fancy indexing):
#      .iloc takes integer positions, .loc takes index labels.
college.iloc[[100, 200], [7, 15]]

INSTNM,SATMTMID,UGDS_NHPI
GateWay Community College,,0.0029
American Baptist Seminary of the West,,


In [30]:
rows = ['GateWay Community College', 'American Baptist Seminary of the West']
columns = ['SATMTMID', 'UGDS_NHPI']
college.loc[rows, columns]

INSTNM,SATMTMID,UGDS_NHPI
GateWay Community College,,0.0029
American Baptist Seminary of the West,,


In [31]:
# 4.3.4 To extract a single item, pass one row and one column.
college.iloc[5, -4]

0.401

In [32]:
college.loc['The University of Alabama', 'PCTFLOAN']

0.401
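The following sketch uses a hypothetical 5 x 6 frame of consecutive integers. It contrasts the two shapes of key from 4.3.3 and 4.3.4. Lists on both axes give a DataFrame. Scalars on both axes give a single value. A negative position such as -1 counts from the last column, as -4 does above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(30).reshape(5, 6),
    index=[f'row{i}' for i in range(5)],
    columns=[f'col{j}' for j in range(6)],
)

# Lists on both axes -> DataFrame of the chosen rows and columns
sub = df.iloc[[1, 3], [0, 5]]
sub_label = df.loc[['row1', 'row3'], ['col0', 'col5']]

# Scalars on both axes -> a single value; -1 is the last column (col5)
val = df.iloc[2, -1]
val_label = df.loc['row2', 'col5']

print(sub.values.tolist())  # [[6, 11], [18, 23]]
print(val, val_label)       # 17 17
```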

In [33]:
# 4.3.5 Slicing along the rows while indexing the column with a scalar
#  returns a Series holding that column's values (single-column selection).
college.iloc[90:80:-2, 5]

INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64

In [34]:
start = 'Empire Beauty School-Flagstaff'
stop = 'Arizona State University-Tempe'
college.loc[start:stop:-2, 'RELAFFIL']

INSTNM
Empire Beauty School-Flagstaff     0
Charles of Italy Beauty College    0
Central Arizona College            0
University of Arizona              0
Arizona State University-Tempe     0
Name: RELAFFIL, dtype: int64
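The stop values in the two cells above differ for a reason. .iloc[90:80:-2] stops before position 80. The .loc version names the last label it should keep. The sketch below reproduces this on a hypothetical ten-row frame with labels s0 to s9:

```python
import pandas as pd

df = pd.DataFrame({'RELAFFIL': range(10)}, index=[f's{i}' for i in range(10)])

by_pos = df.iloc[9:4:-2, 0]                  # positions 9, 7, 5 (stop 4 excluded)
by_label = df.loc['s9':'s5':-2, 'RELAFFIL']  # labels s9, s7, s5 (stop s5 included)

print(type(by_pos).__name__)  # Series
print(by_pos.index.tolist())  # ['s9', 's7', 's5']
```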

## >> There's more... 4.3

In [35]:
# When selecting all columns, the ':' in the columns position can be omitted.
# The two cells below are equivalent (cf. 4.2 Selecting DataFrame rows).
college.iloc[:10]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,...,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,...,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,...,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,...,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5
The University of Alabama,Tuscaloosa,AL,0.0,0.0,0.0,0,555.0,565.0,0.0,29851.0,...,0.0261,0.0268,0.0026,0.0844,1,0.204,0.401,0.0853,41900,23750.0
Central Alabama Community College,Alexander City,AL,0.0,0.0,0.0,0,,,0.0,1592.0,...,0.0,0.0,0.0019,0.3882,1,0.5892,0.3977,0.3153,27500,16127.0
Athens State University,Athens,AL,0.0,0.0,0.0,0,,,0.0,2991.0,...,0.0174,0.0057,0.0334,0.5517,1,0.4088,0.6296,0.641,39000,18595.0
Auburn University at Montgomery,Montgomery,AL,0.0,0.0,0.0,0,486.0,509.0,0.0,4304.0,...,0.0297,0.0397,0.0246,0.2853,1,0.4192,0.5803,0.293,35000,21335.0
Auburn University,Auburn,AL,0.0,0.0,0.0,0,575.0,588.0,0.0,20514.0,...,0.0,0.01,0.014,0.0862,1,0.161,0.3494,0.0415,45700,21831.0


In [36]:
college.iloc[:10, :]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Alabama A & M University,Normal,AL,1.0,0.0,0.0,0,424.0,420.0,0.0,4206.0,...,0.0,0.0059,0.0138,0.0656,1,0.7356,0.8284,0.1049,30300,33888.0
University of Alabama at Birmingham,Birmingham,AL,0.0,0.0,0.0,0,570.0,565.0,0.0,11383.0,...,0.0368,0.0179,0.01,0.2607,1,0.346,0.5214,0.2422,39700,21941.5
Amridge University,Montgomery,AL,0.0,0.0,0.0,1,,,1.0,291.0,...,0.0,0.0,0.2715,0.4536,1,0.6801,0.7795,0.854,40100,23370.0
University of Alabama in Huntsville,Huntsville,AL,0.0,0.0,0.0,0,595.0,590.0,0.0,5451.0,...,0.0172,0.0332,0.035,0.2146,1,0.3072,0.4596,0.264,45500,24097.0
Alabama State University,Montgomery,AL,1.0,0.0,0.0,0,425.0,430.0,0.0,4811.0,...,0.0098,0.0243,0.0137,0.0892,1,0.7347,0.7554,0.127,26600,33118.5
The University of Alabama,Tuscaloosa,AL,0.0,0.0,0.0,0,555.0,565.0,0.0,29851.0,...,0.0261,0.0268,0.0026,0.0844,1,0.204,0.401,0.0853,41900,23750.0
Central Alabama Community College,Alexander City,AL,0.0,0.0,0.0,0,,,0.0,1592.0,...,0.0,0.0,0.0019,0.3882,1,0.5892,0.3977,0.3153,27500,16127.0
Athens State University,Athens,AL,0.0,0.0,0.0,0,,,0.0,2991.0,...,0.0174,0.0057,0.0334,0.5517,1,0.4088,0.6296,0.641,39000,18595.0
Auburn University at Montgomery,Montgomery,AL,0.0,0.0,0.0,0,486.0,509.0,0.0,4304.0,...,0.0297,0.0397,0.0246,0.2853,1,0.4192,0.5803,0.293,35000,21335.0
Auburn University,Auburn,AL,0.0,0.0,0.0,0,575.0,588.0,0.0,20514.0,...,0.0,0.01,0.014,0.0862,1,0.161,0.3494,0.0415,45700,21831.0
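The equivalence also holds for .loc. A quick check on a small hypothetical frame with the default integer index, where the .loc slice :1 includes label 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=['x', 'y', 'z'])

# Leaving out the column part is the same as writing ':' for it
same_iloc = df.iloc[:2].equals(df.iloc[:2, :])
same_loc = df.loc[:1].equals(df.loc[:1, :])  # label slice: row 1 is included

print(same_iloc, same_loc)  # True True
```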


# 4.4 Selecting with a combination of integers and labels

In [37]:
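### [Tech] Convert labels to integer positions (or positions to labels) before applying them
###      .ix, used in older versions, was deprecated in pandas 0.20 and removed in 1.0.
###      Pick either .loc or .iloc and express the selection consistently as labels or as positions.
### [Goal] Show cases that mix integer positions and labels on college.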

## >> How it works...

In [38]:
# 4.4.1 Create college with INSTNM as the index.
college = pd.read_csv('data/college.csv', index_col='INSTNM')

In [39]:
# 4.4.2 Use .get_loc(label) to get the integer position and apply it with .iloc.
# Add 1 to the end position, because an .iloc slice excludes its stop.
col_start = college.columns.get_loc('UGDS_WHITE')
col_end = college.columns.get_loc('UGDS_UNKN') + 1
col_start, col_end

(10, 19)

In [40]:
# 4.4.3 Slice inside .iloc using the column positions converted
#  from the column names (= labels of the column index).
college.iloc[:5, col_start:col_end]

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Alabama A & M University,0.0333,0.9353,0.0055,0.0019,0.0024,0.0019,0.0,0.0059,0.0138
University of Alabama at Birmingham,0.5922,0.26,0.0283,0.0518,0.0022,0.0007,0.0368,0.0179,0.01
Amridge University,0.299,0.4192,0.0069,0.0034,0.0,0.0,0.0,0.0,0.2715
University of Alabama in Huntsville,0.6988,0.1255,0.0382,0.0376,0.0143,0.0002,0.0172,0.0332,0.035
Alabama State University,0.0158,0.9208,0.0121,0.0019,0.001,0.0006,0.0098,0.0243,0.0137
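The "+1" in 4.4.2 can be checked on a hypothetical frame with columns A to E. After the adjustment, the positional slice selects exactly the columns that the inclusive label slice 'B':'D' selects:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 5)), columns=['A', 'B', 'C', 'D', 'E'])

start = df.columns.get_loc('B')     # 1
stop = df.columns.get_loc('D') + 1  # 4: +1 because an .iloc slice excludes its stop

by_pos = df.iloc[:, start:stop]
by_label = df.loc[:, 'B':'D']       # a label slice already includes 'D'

print(by_pos.columns.tolist())  # ['B', 'C', 'D']
```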


## >> There's more... 4.4

In [41]:
# Conversely, get labels from integer positions and use them with .loc.
# Pass the integer position to the [] (index operator) of .index (row index)
# or .columns (column index) to get the label at that position, then pass that label to .loc.
row_start = college.index[10]
row_end = college.index[15]
college.loc[row_start:row_end, 'UGDS_WHITE':'UGDS_UNKN']

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Birmingham Southern College,0.7983,0.1102,0.0195,0.0517,0.0102,0.0,0.0051,0.0,0.0051
Chattahoochee Valley Community College,0.4661,0.4372,0.0492,0.0127,0.0023,0.0035,0.0151,0.0,0.0139
Concordia College Alabama,0.028,0.8758,0.0373,0.0093,0.0,0.0,0.0031,0.0466,0.0
South University-Montgomery,0.3046,0.6054,0.0153,0.0153,0.0153,0.0096,0.0,0.0019,0.0326
Enterprise State Community College,0.6408,0.2435,0.0509,0.0202,0.0081,0.0029,0.0254,0.0012,0.0069
James H Faulkner State Community College,0.6979,0.2259,0.032,0.0084,0.0177,0.0014,0.0152,0.0007,0.0009


In [42]:
# If that is all too tedious, work in two steps:
# first slice the row index (college.index) by integer position,
# then pass the resulting labels to .loc.

row10_15 = college.index[10:16]  # slice by integer position

In [43]:
college.loc[row10_15, 'UGDS_WHITE':'UGDS_UNKN']  # select by index label

INSTNM,UGDS_WHITE,UGDS_BLACK,UGDS_HISP,UGDS_ASIAN,UGDS_AIAN,UGDS_NHPI,UGDS_2MOR,UGDS_NRA,UGDS_UNKN
Birmingham Southern College,0.7983,0.1102,0.0195,0.0517,0.0102,0.0,0.0051,0.0,0.0051
Chattahoochee Valley Community College,0.4661,0.4372,0.0492,0.0127,0.0023,0.0035,0.0151,0.0,0.0139
Concordia College Alabama,0.028,0.8758,0.0373,0.0093,0.0,0.0,0.0031,0.0466,0.0
South University-Montgomery,0.3046,0.6054,0.0153,0.0153,0.0153,0.0096,0.0,0.0019,0.0326
Enterprise State Community College,0.6408,0.2435,0.0509,0.0202,0.0081,0.0029,0.0254,0.0012,0.0069
James H Faulkner State Community College,0.6979,0.2259,0.032,0.0084,0.0177,0.0014,0.0152,0.0007,0.0009
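Both routes in this section turn positions into labels. Slicing an Index by position returns an Index of labels. Scalar endpoints work with .loc as long as the end label is the last one you want kept, which is why index[15] pairs with index[10:16] above. The sketch uses a hypothetical six-row frame:

```python
import pandas as pd

df = pd.DataFrame({'v': range(6)}, index=list('abcdef'))

labels = df.index[1:4]     # slicing the Index by position -> labels b, c, d
sub = df.loc[labels, 'v']  # hand the labels to .loc

# Scalar endpoints work too; .loc includes the end label, hence index[3]
sub2 = df.loc[df.index[1]:df.index[3], 'v']

print(labels.tolist())  # ['b', 'c', 'd']
print(sub.tolist())     # [1, 2, 3]
```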


# 4.5 Speeding up scalar selection

In [44]:
### [Tech] .iat[], .at[]: scalar-selection indexers,
#             faster when indexing a single (scalar) value
### [Goal] When selecting a single value (one row and one column) from college, there is a faster method.

## >> How it works...

In [45]:
# 4.5.1 Read college.csv with INSTNM as the index.
# To select one column value for one college with .loc, pass both as follows.
college = pd.read_csv('data/college.csv', index_col='INSTNM')

cn = 'Texas A & M University-College Station'
college.loc[cn, 'UGDS_WHITE']

0.6609999999999999

In [46]:
# 4.5.2 The .at[] scalar indexer gives the same result.
college.at[cn, 'UGDS_WHITE']

0.6609999999999999

In [47]:
# 4.5.3 Measure the speed difference with the %timeit magic command.

%timeit college.loc[cn, 'UGDS_WHITE']

31.1 µs ± 7.29 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [48]:
%timeit college.at[cn, 'UGDS_WHITE']

16.3 µs ± 1.54 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [49]:
# 4.5.4 Get the integer positions of the row label and the column name,
#       then compare the speed of the .iloc[] and .iat[] indexers.

row_num = college.index.get_loc(cn)
col_num = college.columns.get_loc('UGDS_WHITE')

row_num, col_num

(3765, 10)

In [50]:
%timeit college.iloc[row_num, col_num]

31 µs ± 2.87 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [51]:
%timeit college.iat[row_num, col_num]

23.1 µs ± 3.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
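The timings above will vary with pandas version and hardware. The point that holds everywhere is that the scalar indexers return the same value as their general counterparts. A minimal check on a hypothetical two-row frame:

```python
import pandas as pd

df = pd.DataFrame({'UGDS_WHITE': [0.5, 0.661]}, index=['School X', 'School Y'])

row_num = df.index.get_loc('School Y')      # 1
col_num = df.columns.get_loc('UGDS_WHITE')  # 0

# .at/.iat take exactly one row and one column and return the bare value
at_matches = df.at['School Y', 'UGDS_WHITE'] == df.loc['School Y', 'UGDS_WHITE']
iat_matches = df.iat[row_num, col_num] == df.iloc[row_num, col_num]

print(at_matches, iat_matches)  # True True
```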


## >> There's more... 4.5

In [52]:
# .at[] and .iat[] also work well on a Series.
state = college['STABBR']

In [53]:
state.iat[1000]

'IL'

In [54]:
state.at['Stanford University']

'CA'

# 4.6 Slicing rows lazily

In [55]:
### [Tech] Understanding the shorthand forms of indexing
# With .iloc/.loc omitted and a single argument inside []:
#   df[label scalar/list]       ==> df.loc[:, scalar/list]  ==> acts on columns
#   df[integer scalar/list]     ==> KeyError (unless the columns are labelled with integers)
#   df[slice]                   ==> df.iloc/loc[slice, :]   ==> acts on rows
#   df[boolean Series/ndarray]  ==> boolean filter          ==> acts on rows
### [Goal] There are shorthand ways to select row ranges and columns.
# The college table is used as the example.

## >> How it works...

In [56]:
# 4.6.1 Read college.csv with INSTNM as the index.

college = pd.read_csv('data/college.csv', index_col='INSTNM')

In [57]:
"""
#  규칙 1 : 별도 인덱서 (iloc, loc, iat, at) 없이 
#      슬라이서 하나만 전달하면 이는 행 인덱스에 적용된다. 
#      college.iloc[10:2:2,  :] 와 동일
"""
college[10:20:2]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Birmingham Southern College,Birmingham,AL,0.0,0.0,0.0,1,560.0,560.0,0.0,1180.0,...,0.0051,0.0,0.0051,0.0017,1,0.192,0.4809,0.0152,44200.0,27000
Concordia College Alabama,Selma,AL,1.0,0.0,0.0,1,420.0,400.0,0.0,322.0,...,0.0031,0.0466,0.0,0.1056,1,0.8667,0.9333,0.2367,19900.0,PrivacySuppressed
Enterprise State Community College,Enterprise,AL,0.0,0.0,0.0,0,,,0.0,1729.0,...,0.0254,0.0012,0.0069,0.3823,1,0.4895,0.2263,0.3399,24600.0,8273
Faulkner University,Montgomery,AL,0.0,0.0,0.0,1,,,0.0,2367.0,...,0.0173,0.0182,0.0258,0.2302,1,0.5812,0.7253,0.4589,37200.0,22000
New Beginning College of Cosmetology,Albertville,AL,0.0,0.0,0.0,0,,,0.0,115.0,...,0.0,0.0,0.0,0.0783,1,0.8224,0.8553,0.3933,,5500
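Rule 1 can be checked directly. On a hypothetical frame with string labels, an integer slice inside [] is positional and matches the explicit .iloc form:

```python
import pandas as pd

df = pd.DataFrame({'v': range(20)}, index=[f'school{i}' for i in range(20)])

shorthand = df[10:20:2]  # integer slice inside [] -> row positions
explicit = df.iloc[10:20:2]

print(shorthand.equals(explicit))  # True
print(shorthand['v'].tolist())     # [10, 12, 14, 16, 18]
```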


In [58]:
# 4.6.2 This works the same way as slicing a Series.
city = college['CITY']
city[10:20:2]

INSTNM
Birmingham Southern College              Birmingham
Concordia College Alabama                     Selma
Enterprise State Community College       Enterprise
Faulkner University                      Montgomery
New Beginning College of Cosmetology    Albertville
Name: CITY, dtype: object

In [59]:
# 4.6.3 A slice of index labels works the same way.
start = 'Mesa Community College'
stop = 'Spokane Community College'
college[start:stop:1500]

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Mesa Community College,Mesa,AZ,0.0,0.0,0.0,0,,,0.0,19055.0,...,0.0205,0.0257,0.0682,0.6457,1,0.3423,0.2207,0.401,35200.0,8000
Hair Academy Inc-New Carrollton,New Carrollton,MD,0.0,0.0,0.0,0,,,0.0,504.0,...,0.0,0.0,0.0,0.4683,1,0.9756,1.0,0.5882,15200.0,9666
National College of Natural Medicine,Portland,OR,0.0,0.0,0.0,0,,,0.0,,...,,,,,1,,,,,PrivacySuppressed


In [60]:
# 4.6.4 Label slices also work on a Series.
city[start:stop:1500]

INSTNM
Mesa Community College                            Mesa
Hair Academy Inc-New Carrollton         New Carrollton
National College of Natural Medicine          Portland
Name: CITY, dtype: object
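A label slice inside [] behaves like .loc, so the stop label is included. A quick check on a hypothetical frame with letter labels:

```python
import pandas as pd

df = pd.DataFrame({'v': range(5)}, index=list('abcde'))

shorthand = df['b':'d']  # label slice inside [] -> rows, stop included
explicit = df.loc['b':'d']

print(shorthand.index.tolist())  # ['b', 'c', 'd']
```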

## >> There's more... 4.6

In [61]:
"""
#  규칙 2 : 별도 인덱서 (iloc, loc, iat, at) 없이 
#      단일 레이블(정확하게는 컬럼명) 을  전달하면 이는 컬럼 인덱스에 적용된다.
#      즉 컬럼을 선택한다. 
#      college['STABBR'] 은 college.loc[:,'STABBR' ] 와 동일
#      숫자 값인 컬럼 인덱스 번호로는 사용 불가능,
#      즉 college[1]의 형태는 불가능  (college.iloc[:,1] 과 같지 않다. )
"""
print (">> case of college['STABBR']" )
display (college['STABBR'])

print ("\n\n\n>> case of college.loc[:, 'STABBR']")
display (college.loc[:,'STABBR'])

>> case of college['STABBR']


INSTNM
Alabama A & M University                                  AL
University of Alabama at Birmingham                       AL
Amridge University                                        AL
University of Alabama in Huntsville                       AL
Alabama State University                                  AL
                                                          ..
SAE Institute of Technology  San Francisco                CA
Rasmussen College - Overland Park                         KS
National Personal Training Institute of Cleveland         OH
Bay Area Medical Academy - San Jose Satellite Location    CA
Excel Learning Center-San Antonio South                   TX
Name: STABBR, Length: 7535, dtype: object




>> case of college.loc[:, 'STABBR']


INSTNM
Alabama A & M University                                  AL
University of Alabama at Birmingham                       AL
Amridge University                                        AL
University of Alabama in Huntsville                       AL
Alabama State University                                  AL
                                                          ..
SAE Institute of Technology  San Francisco                CA
Rasmussen College - Overland Park                         KS
National Personal Training Institute of Cleveland         OH
Bay Area Medical Academy - San Jose Satellite Location    CA
Excel Learning Center-San Antonio South                   TX
Name: STABBR, Length: 7535, dtype: object

In [62]:
college.iloc[:,1]

INSTNM
Alabama A & M University                                  AL
University of Alabama at Birmingham                       AL
Amridge University                                        AL
University of Alabama in Huntsville                       AL
Alabama State University                                  AL
                                                          ..
SAE Institute of Technology  San Francisco                CA
Rasmussen College - Overland Park                         KS
National Personal Training Institute of Cleveland         OH
Bay Area Medical Academy - San Jose Satellite Location    CA
Excel Learning Center-San Antonio South                   TX
Name: STABBR, Length: 7535, dtype: object

In [63]:
college[1]

KeyError: 1
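Rule 2 can be checked on a small stand-in for `college`. The two rows below come from this chapter's output, but the two-row DataFrame itself is hypothetical. The sketch confirms that `[]`, `.loc`, and `.iloc` return the same column, and that an integer inside `[]` raises `KeyError`.

```python
import pandas as pd

# Hypothetical two-row stand-in for college (column order: CITY, STABBR)
df = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham"], "STABBR": ["AL", "AL"]},
    index=pd.Index(
        ["Alabama A & M University", "University of Alabama at Birmingham"],
        name="INSTNM",
    ),
)

by_brackets = df["STABBR"]        # [] with one column name -> that column as a Series
by_loc = df.loc[:, "STABBR"]      # explicit label-based equivalent
by_position = df.iloc[:, 1]       # position 1 is the second column, STABBR

# An integer inside [] is looked up as a column *label*, not as a position
try:
    df[1]
    int_key_failed = False
except KeyError:
    int_key_failed = True

print(by_brackets.equals(by_loc), by_brackets.equals(by_position), int_key_failed)
```

If a DataFrame really has integer column labels, for example after `pd.read_csv(..., header=None)`, then `df[1]` does succeed. It returns the column *labelled* 1, which is not necessarily the column at position 1.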

In [64]:
"""
#  규칙 3 : 별도 인덱서 (iloc, loc, iat, at) 없이 
#      인덱스 값 목록, 인덱스 레이블 목록(정확하게는 컬럼명 목록) 을 
#      전달하면 이는 컬럼 인덱스에 적용되어 컬럼의 fancy indexing으로 작동한다. 

#      college[['CITY','STABBR']] 은 college.loc[:,['CITY','STABBR'] ] 와 동일
#      숫자 값인 컬럼 인덱스 번호로는 사용 불가능 
#       즉, college[[0,1]]의 형태는 불가능 (college.iloc[:,[0,1]] 와 같지 않다. )
"""
print (">> case of college[['CITY','STABBR']] " )
display (college[['CITY','STABBR']])

print ("\n\n\n>> case of college.loc[:,['CITY','STABBR'] ]")
display (college.loc[:,['CITY','STABBR'] ])

>> case of college[['CITY','STABBR']] 


INSTNM,CITY,STABBR
Alabama A & M University,Normal,AL
University of Alabama at Birmingham,Birmingham,AL
Amridge University,Montgomery,AL
University of Alabama in Huntsville,Huntsville,AL
Alabama State University,Montgomery,AL
...,...,...
SAE Institute of Technology San Francisco,Emeryville,CA
Rasmussen College - Overland Park,Overland Park,KS
National Personal Training Institute of Cleveland,Highland Heights,OH
Bay Area Medical Academy - San Jose Satellite Location,San Jose,CA





>> case of college.loc[:,['CITY','STABBR'] ]


INSTNM,CITY,STABBR
Alabama A & M University,Normal,AL
University of Alabama at Birmingham,Birmingham,AL
Amridge University,Montgomery,AL
University of Alabama in Huntsville,Huntsville,AL
Alabama State University,Montgomery,AL
...,...,...
SAE Institute of Technology San Francisco,Emeryville,CA
Rasmussen College - Overland Park,Overland Park,KS
National Personal Training Institute of Cleveland,Highland Heights,OH
Bay Area Medical Academy - San Jose Satellite Location,San Jose,CA


In [65]:
college.iloc[:,[0,1]] 

INSTNM,CITY,STABBR
Alabama A & M University,Normal,AL
University of Alabama at Birmingham,Birmingham,AL
Amridge University,Montgomery,AL
University of Alabama in Huntsville,Huntsville,AL
Alabama State University,Montgomery,AL
...,...,...
SAE Institute of Technology San Francisco,Emeryville,CA
Rasmussen College - Overland Park,Overland Park,KS
National Personal Training Institute of Cleveland,Highland Heights,OH
Bay Area Medical Academy - San Jose Satellite Location,San Jose,CA


In [66]:
college[[0,1]]

KeyError: "None of [Int64Index([0, 1], dtype='int64')] are in the [columns]"
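Here is a minimal sketch of Rule 3 on a hypothetical three-column stand-in for `college`. The `DISTANCEONLY` values are placeholders. The sketch compares the bracket, `.loc`, and `.iloc` forms and shows that a list of integers is also treated as labels.

```python
import pandas as pd

# Hypothetical stand-in for college; DISTANCEONLY values are placeholders
df = pd.DataFrame(
    {"CITY": ["Normal", "Birmingham"], "STABBR": ["AL", "AL"], "DISTANCEONLY": [0.0, 0.0]},
    index=pd.Index(
        ["Alabama A & M University", "University of Alabama at Birmingham"],
        name="INSTNM",
    ),
)

two_cols = df[["CITY", "STABBR"]]            # list of names -> DataFrame with those columns
via_loc = df.loc[:, ["CITY", "STABBR"]]      # label-based equivalent
via_iloc = df.iloc[:, [0, 1]]                # position-based equivalent
one_col_frame = df[["CITY"]]                 # a one-element list still returns a DataFrame

# A list of integers is also treated as column labels, so it fails here
try:
    df[[0, 1]]
    int_list_failed = False
except KeyError:
    int_list_failed = True
```

Note the difference from Rule 2: `df["CITY"]` returns a Series, but `df[["CITY"]]` returns a one-column DataFrame.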

In [67]:
"""
#  규칙 4 : 별도 인덱서 (iloc, loc, iat, at) 없이 
#      DataFrame과 동일한 길이의 boolean 리스트, Series , ndarray 이면 
#      boolean indexing 으로 작동한다. 이 부분은 다음 장에서 학습한다.  
"""
# booleang indexing
college[college.STABBR == 'NY'].head(3)

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Tri-State College of Acupuncture,New York,NY,0.0,0.0,0.0,0,,,0.0,,...,,,,,1,,,,PrivacySuppressed,PrivacySuppressed
Vaughn College of Aeronautics and Technology,Flushing,NY,0.0,0.0,0.0,0,,,0.0,1605.0,...,0.0467,0.0206,0.053,0.2143,1,0.652,0.6792,0.4142,48700,22625
Adelphi University,Garden City,NY,0.0,0.0,0.0,0,550.0,565.0,0.0,5036.0,...,0.0208,0.0381,0.0743,0.0913,1,0.3079,0.5982,0.1562,51300,25000
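Here is a small preview of Rule 4, built from three rows shown in this chapter. The three-row DataFrame is hypothetical. The same mask selects the same rows whether it is passed as a boolean Series, a plain list, a NumPy array, or through `.loc`.

```python
import pandas as pd

# Small stand-in for college built from rows shown in this chapter
df = pd.DataFrame(
    {"CITY": ["Normal", "Flushing", "Garden City"], "STABBR": ["AL", "NY", "NY"]},
    index=pd.Index(
        ["Alabama A & M University",
         "Vaughn College of Aeronautics and Technology",
         "Adelphi University"],
        name="INSTNM",
    ),
)

mask = df["STABBR"] == "NY"          # boolean Series, one value per row
ny_rows = df[mask]                   # [] with a boolean mask filters rows

# The same mask given through .loc, as a plain list, or as a NumPy array
via_loc = df.loc[mask]
via_list = df[[False, True, True]]
via_array = df[mask.to_numpy()]
```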


# 4.7 Slicing Lexicographically

In [None]:
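### [Tech] When the row labels are sorted, .loc can slice a range using bounds that are not
#          exact labels (e.g. partial prefixes); otherwise .loc slice bounds must match existing labels exactly.
### [Goal] Slice the college row index with partial labels, the way you look up words in a dictionary.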

## >> How it works...

In [68]:
# 4.7.1 Read college.csv with INSTNM as the index.
college = pd.read_csv('data/college.csv', index_col='INSTNM')

In [69]:
# 4.7.2 We pass a slice expecting the rows between 'Sp' and 'Su', but 'Sp' is not an
#       existing label and the index is not sorted, so a KeyError is raised.
college.loc['Sp':'Su']

KeyError: 'Sp'
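The following sketch, on a hypothetical unsorted index, shows what triggers the error. Bounds that exist as labels still slice by position on an unsorted index. A bound that is *not* a label can only be placed by binary search, and binary search needs a monotonic index.

```python
import pandas as pd

# Hypothetical, deliberately unsorted string index
df = pd.DataFrame({"x": [1, 2, 3, 4]}, index=["Spa", "Abc", "Sun", "Mno"])

# Both bounds exist as labels: .loc slices between their positions, sorted or not
existing = df.loc["Abc":"Sun"]

# "Sp" is not a label; placing it requires a sorted (monotonic) index,
# so pandas raises a KeyError here
try:
    df.loc["Sp":"Su"]
    missing_bound_failed = False
except KeyError:
    missing_bound_failed = True
```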

In [70]:
# 4.7.3 Sort the index.
college = college.sort_index()
college.head(3)

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
A & W Healthcare Educators,New Orleans,LA,0.0,0.0,0.0,0,,,0.0,40.0,...,0.0,0.0,0.0,0.125,1,0.7018,0.8596,0.6667,,19022.5
A T Still University of Health Sciences,Kirksville,MO,0.0,0.0,0.0,0,,,0.0,,...,,,,,1,,,,219800.0,PrivacySuppressed
ABC Beauty Academy,Garland,TX,0.0,0.0,0.0,0,,,0.0,30.0,...,0.0,0.0,0.0,0.0,0,0.7857,0.0,0.8286,,PrivacySuppressed


In [71]:
# 4.7.4 Now that the index is sorted, college.loc['Sp':'Su'] works.
college.loc['Sp':'Su']

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Spa Tech Institute-Ipswich,Ipswich,MA,0.0,0.0,0.0,0,,,0.0,37.0,...,0.0000,0.0000,0.0541,0.4054,1,0.2656,0.3906,0.7907,21500,6333
Spa Tech Institute-Plymouth,Plymouth,MA,0.0,0.0,0.0,0,,,0.0,153.0,...,0.0000,0.0000,0.2484,0.3399,1,0.3716,0.4266,0.6250,21500,6333
Spa Tech Institute-Westboro,Westboro,MA,0.0,0.0,0.0,0,,,0.0,90.0,...,0.0000,0.0000,0.0222,0.5778,1,0.3409,0.4545,0.6882,21500,6333
Spa Tech Institute-Westbrook,Westbrook,ME,0.0,0.0,0.0,0,,,0.0,240.0,...,0.0000,0.0000,0.0042,0.2542,1,0.4350,0.5093,0.5224,21500,6333
Spalding University,Louisville,KY,0.0,0.0,0.0,1,490.0,440.0,0.0,1227.0,...,0.0302,0.0016,0.0326,0.2502,1,0.4442,0.6725,0.3764,41700,25000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Studio Academy of Beauty,Chandler,AZ,0.0,0.0,0.0,0,,,0.0,332.0,...,0.0392,0.0000,0.0090,0.0000,1,0.5855,0.6218,0.5675,,6333
Studio Jewelers,New York,NY,0.0,0.0,0.0,0,,,0.0,55.0,...,0.0000,0.0364,0.0000,0.6000,1,0.0451,0.0902,0.8525,PrivacySuppressed,PrivacySuppressed
Stylemaster College of Hair Design,Longview,WA,0.0,0.0,0.0,0,,,0.0,77.0,...,0.0130,0.0000,0.0000,0.0000,1,0.8036,0.7024,0.4510,17000,13320
Styles and Profiles Beauty College,Selmer,TN,0.0,0.0,0.0,0,,,0.0,31.0,...,0.0000,0.0000,0.0000,0.0000,1,0.8182,0.7955,0.2400,PrivacySuppressed,PrivacySuppressed
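The stop bound deserves a closer look. The sketch below uses names from this section plus one hypothetical name, "Sullivan College". Because "Su" sorts *before* any longer string that starts with "Su", `'Sp':'Su'` stops just short of such names. Raising the stop bound to "Sv" is one way to include them.

```python
import pandas as pd

# Names shown in this section plus one hypothetical name ("Sullivan College")
names = [
    "Studio Jewelers",
    "Sullivan College",
    "Spalding University",
    "Babson College",
    "Styles and Profiles Beauty College",
    "Spa Tech Institute-Ipswich",
]
df = pd.DataFrame({"pos": range(len(names))}, index=names).sort_index()

# On a sorted index the bounds need not be existing labels
sp_to_su = df.loc["Sp":"Su"]

# "Su" < "Sullivan College", so that row falls outside the slice above;
# a stop bound of "Sv" picks up every name starting with "Su"
sp_to_sv = df.loc["Sp":"Sv"]
```

String comparison here follows code-point order, so uppercase letters sort before lowercase ones. As a result, `'Sp':'Su'` does not match names that start with a lowercase "sp".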


## >> There's more... 4.7

In [72]:
# The sort order (ascending / descending) determines the direction of the slice.

college = college.sort_index(ascending=False)
college.index.is_monotonic_decreasing

True

In [73]:
college.loc['E':'B']

INSTNM,CITY,STABBR,HBCU,MENONLY,WOMENONLY,RELAFFIL,SATVRMID,SATMTMID,DISTANCEONLY,UGDS,...,UGDS_2MOR,UGDS_NRA,UGDS_UNKN,PPTUG_EF,CURROPER,PCTPELL,PCTFLOAN,UG25ABV,MD_EARN_WNE_P10,GRAD_DEBT_MDN_SUPP
Dyersburg State Community College,Dyersburg,TN,0.0,0.0,0.0,0,,,0.0,2001.0,...,0.0185,0.0010,0.0085,0.4423,1,0.4921,0.2493,0.3097,26800,7475
Dutchess Community College,Poughkeepsie,NY,0.0,0.0,0.0,0,,,0.0,6885.0,...,0.0446,0.0129,0.0049,0.3312,1,0.2464,0.1936,0.1806,32500,10250
Dutchess BOCES-Practical Nursing Program,Poughkeepsie,NY,0.0,0.0,0.0,0,,,0.0,155.0,...,0.0581,0.0000,0.0000,0.7548,1,0.5294,0.6275,0.5430,36500,9500
Durham Technical Community College,Durham,NC,0.0,0.0,0.0,0,,,0.0,4769.0,...,0.0182,0.0025,0.0457,0.6905,1,0.4495,0.1796,0.5961,27200,11069.5
Durham Beauty Academy,Durham,NC,0.0,0.0,0.0,0,,,0.0,78.0,...,0.0000,0.0000,0.0128,0.0000,1,0.5746,0.8134,0.4000,PrivacySuppressed,15332
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Bacone College,Muskogee,OK,0.0,0.0,0.0,1,398.0,428.0,0.0,939.0,...,0.0298,0.0000,0.0895,0.1140,1,0.9392,0.8920,0.1648,29700,26350
Babson College,Wellesley,MA,0.0,0.0,0.0,0,615.0,660.0,0.0,2107.0,...,0.0233,0.2682,0.0603,0.0000,1,0.1709,0.3727,0.0090,86700,27000
BJ's Beauty & Barber College,Auburn,WA,0.0,0.0,0.0,0,,,0.0,28.0,...,0.0714,0.0000,0.0714,0.0000,1,0.5192,0.6154,0.2917,,PrivacySuppressed
BIR Training Center,Chicago,IL,0.0,0.0,0.0,0,,,0.0,2132.0,...,0.0000,0.0000,0.0000,0.1806,0,0.6700,0.6998,0.6741,PrivacySuppressed,15394
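The sketch below shows the descending case on a tiny index. It uses names from this section plus two hypothetical ones, "Apple Academy" and "Eastern College". On a descending index the slice must go from the larger bound to the smaller one. A name such as "Eastern College" sorts above "E", so it sits *before* the start bound and is excluded. Bounds given in ascending order select nothing.

```python
import pandas as pd

# Names from this section plus two hypothetical ones ("Apple Academy", "Eastern College")
names = [
    "Babson College",
    "Eastern College",
    "Dutchess Community College",
    "Apple Academy",
    "Dyersburg State Community College",
]
desc = pd.DataFrame({"pos": range(len(names))}, index=names).sort_index(ascending=False)

# On a descending index the slice runs from the larger bound down to the smaller one
e_to_b = desc.loc["E":"B"]

# Ascending-order bounds select no rows on a descending index
b_to_e = desc.loc["B":"E"]
```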
