# 06. Selecting Subsets of Data from DataFrames with `.iloc`

# Getting started with `.iloc`
The `.iloc` indexer is very similar to `.loc` but only uses **integer locations** to make its selections. The word `iloc` itself stands for integer location so that should help remind you what it does.

Like `.loc`, `.iloc` takes a row selection and a column selection, but every selection must be an integer location. Do NOT pass strings (labels) to `.iloc`.
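
To make the contrast with `.loc` concrete, here is a minimal sketch. It assumes a tiny hand-built frame (not the CSV used in this notebook), and the variable names are illustrative only. The exact exception type raised for a string key may differ between pandas versions, so the sketch catches both likely candidates.

```python
import pandas as pd

# A small hand-built frame (an assumption; not the notebook's CSV).
df = pd.DataFrame(
    {"state": ["NY", "TX", "FL"], "age": [30, 2, 12]},
    index=["Jane", "Niko", "Aaron"],
)

by_label = df.loc["Niko", "age"]   # .loc selects by label
by_position = df.iloc[1, 1]        # .iloc selects by integer location

# A label passed to .iloc is rejected rather than looked up.
try:
    df.iloc["Niko"]
    label_rejected = False
except (TypeError, ValueError):
    label_rejected = True

print(by_label, by_position, label_rejected)
```

Both selections reach the same cell; only the way the cell is addressed differs.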

## Simultaneous row and column subset selection with `.iloc`
Selection with `.iloc` will look like the following:

```
df.iloc[rows, cols]
```

In [1]:
import pandas as pd
df = pd.read_csv('../data/sample_data.csv', index_col=0)
df

Unnamed: 0,state,color,food,age,height,score
Jane,NY,blue,Steak,30,165,4.6
Niko,TX,green,Lamb,2,70,8.3
Aaron,FL,red,Mango,12,120,9.0
Penelope,AL,white,Apple,4,80,3.3
Dean,AK,gray,Cheese,32,180,1.8
Christina,TX,black,Melon,33,172,9.5
Cornelia,TX,red,Beans,69,150,2.2


### Use a list for both rows and columns

In [2]:
rows = [2, 4]
cols = [0, -1]  # negative integers count from the end: this picks the first and the last columns

df.iloc[rows, cols]

Unnamed: 0,state,score
Aaron,FL,9.0
Dean,AK,1.8


## The possible types of selections for `.iloc`
Row or column selections can be any of the following:

* A single integer
* A list of integers
* A slice with integers
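
The three selection types listed above can be sketched side by side. This assumes the sample table is rebuilt by hand (so the block runs without `sample_data.csv`); the variable names are illustrative.

```python
import pandas as pd

# Hand-copied from the sample table above so the sketch runs without the CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

single = df.iloc[2, 0]                  # single integers for rows and columns: one value
from_lists = df.iloc[[2, 4], [0, -1]]   # lists of integers: a DataFrame
from_slices = df.iloc[1:4, :2]          # slices: the stop position is excluded

print(single)
print(from_lists)
print(from_slices)
```

The three forms can be mixed freely, for example a slice for the rows and a list for the columns.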

### Slice the rows and use a list for the columns

In [4]:
cols = [4, 2]
df.iloc[::2, cols]  # each selection can be a single integer, a list of integers, or a slice of integers
# ::2 selects every other row, starting from the first one

Unnamed: 0,height,food
Jane,165,Steak
Aaron,120,Mango
Dean,180,Cheese
Cornelia,150,Beans


### Use a list for the rows and a slice for the columns

In [6]:
rows = [5, 2, 4]
df.iloc[rows, 3:]  # a list for the rows, a slice for the columns

Unnamed: 0,age,height,score
Christina,33,172,9.5
Aaron,12,120,9.0
Dean,32,180,1.8


## Selecting some rows and all of the columns
If you leave the column selection empty, then all of the columns will be selected.

In [7]:
rows = [3, 2]
df.iloc[rows] 

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Aaron,FL,red,Mango,12,120,9.0


In [8]:
df.iloc[rows, :]  # same result; the explicit : for the columns is optional

Unnamed: 0,state,color,food,age,height,score
Penelope,AL,white,Apple,4,80,3.3
Aaron,FL,red,Mango,12,120,9.0
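
As a quick check that the two spellings agree, the sketch below compares them with `DataFrame.equals`. It assumes a hand-built copy of the sample table; `implicit` and `explicit` are illustrative names.

```python
import pandas as pd

# Hand-copied from the sample table above so the sketch runs without the CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

rows = [3, 2]
implicit = df.iloc[rows]      # column selection left out
explicit = df.iloc[rows, :]   # column selection written as an empty slice

print(implicit.equals(explicit))
```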


## Select all of the rows and some of the columns

In [10]:
cols = [1, 5]
df.iloc[:, cols] # slice for the rows, a list for the columns

Unnamed: 0,color,score
Jane,blue,4.6
Niko,green,8.3
Aaron,red,9.0
Penelope,white,3.3
Dean,gray,1.8
Christina,black,9.5
Cornelia,red,2.2


## Cannot do this with *just the brackets*
Indexing with *just the brackets* does select columns, but it only understands **labels**, not **integer locations**.

In [11]:
cols = [1, 5]
df[cols]  # [] cannot select by integer location, only by labels (column names)

KeyError: '[1 5] not in index'
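
If you want to stay with the brackets, one possible workaround is to translate the integer locations into labels first through `df.columns`. This is a sketch on a hand-built copy of the sample table, not something the notebook itself does; the names `labels`, `via_brackets`, and `via_iloc` are illustrative.

```python
import pandas as pd

# Hand-copied from the sample table above so the sketch runs without the CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

cols = [1, 5]
labels = df.columns[cols]     # look up the labels at integer locations 1 and 5
via_brackets = df[labels]     # the brackets accept those labels
via_iloc = df.iloc[:, cols]   # .iloc takes the integer locations directly

print(list(labels))
print(via_brackets.equals(via_iloc))
```

In most cases `df.iloc[:, cols]` is the more direct choice.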

## Select some rows and a single column
Note that a Series is returned whenever a single row or column is selected.

In [12]:
rows = [2, 3, 5]
cols = 4

df.iloc[rows, cols] # a series

Aaron        120
Penelope      80
Christina    172
Name: height, dtype: int64

## A trick to select a single row or column as a DataFrame and NOT a Series
You can select a single row (or column) and return a DataFrame and not a Series if you use a list to make the selection.

In [13]:
rows = [2, 3, 5]
cols = [4]           # one item list 

df.iloc[rows, cols]  # keep the dataframe, one col df

Unnamed: 0,height
Aaron,120
Penelope,80
Christina,172
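
The difference between the two return types can be checked directly. The sketch assumes a hand-built copy of the sample table; `as_series` and `as_frame` are illustrative names.

```python
import pandas as pd

# Hand-copied from the sample table above so the sketch runs without the CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

rows = [2, 3, 5]
as_series = df.iloc[rows, 4]     # a single integer for the column: Series
as_frame = df.iloc[rows, [4]]    # a one-item list for the column: DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```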


## Select a single row as a Series with `.iloc`
By passing a single integer to `.iloc`, it will select one row as a Series:

In [14]:
df.iloc[2]  # integer location 2 is the third row of data

state        FL
color       red
food      Mango
age          12
height      120
score         9
Name: Aaron, dtype: object

# Summary of `.iloc`
The `.iloc` indexer works the same way as `.loc` but uses **integer location** only for selection. The official Pandas documentation refers to this as selection by **position**.
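
A short sketch of the correspondence, on a hand-built copy of the sample table. It also shows one difference worth keeping in mind: a label slice with `.loc` includes its end label, while an integer slice with `.iloc` excludes its stop position. The variable names are illustrative.

```python
import pandas as pd

# Hand-copied from the sample table above so the sketch runs without the CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

# The same subset, addressed by label and by integer location.
by_label = df.loc[["Aaron", "Dean"], ["state", "score"]]
by_position = df.iloc[[2, 4], [0, -1]]
same_lists = by_label.equals(by_position)

# Slices behave differently at the end point.
loc_slice = df.loc["Niko":"Penelope"]   # end label included: 3 rows
iloc_slice = df.iloc[1:3]               # stop position excluded: 2 rows

print(same_lists, len(loc_slice), len(iloc_slice))
```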

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Select the rows with integer location 10, 5, and 1</span>

In [17]:
# your code here
movie = pd.read_csv('../data/movie.csv', index_col= 'title')
movie.head(2) 

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Avatar,2009.0,Color,PG-13,178.0,James Cameron,0.0,CCH Pounder,1000.0,Joel David Moore,936.0,...,855.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,723.0,886204,avatar|future|marine|native|paraplegic,English,USA,237000000.0,7.9
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


In [18]:
# your code here
rows = [10, 5 ,1]
movie.iloc[rows]

title,year,color,content_rating,duration,director_name,director_fb,actor1,actor1_fb,actor2,actor2_fb,...,actor3_fb,gross,genres,num_reviews,num_voted_users,plot_keywords,language,country,budget,imdb_score
Batman v Superman: Dawn of Justice,2016.0,Color,PG-13,183.0,Zack Snyder,0.0,Henry Cavill,15000.0,Lauren Cohan,4000.0,...,2000.0,330249062.0,Action|Adventure|Sci-Fi,673.0,371639,based on comic book|batman|sequel to a reboot|...,English,USA,250000000.0,6.9
John Carter,2012.0,Color,PG-13,132.0,Andrew Stanton,475.0,Daryl Sabara,640.0,Samantha Morton,632.0,...,530.0,73058679.0,Action|Adventure|Sci-Fi,462.0,212204,alien|american civil war|male nipple|mars|prin...,English,USA,263700000.0,6.6
Pirates of the Caribbean: At World's End,2007.0,Color,PG-13,169.0,Gore Verbinski,563.0,Johnny Depp,40000.0,Orlando Bloom,5000.0,...,1000.0,309404152.0,Action|Adventure|Fantasy,302.0,471220,goddess|marriage ceremony|marriage proposal|pi...,English,USA,300000000.0,7.1


### Problem 2
<span  style="color:green; font-size:16px">Select the columns with integer location 10, 5, and 1</span>

In [20]:
# your code here
col = [10, 5, 1]
movie.iloc[:, col]

title,actor3,director_fb,color
Avatar,Wes Studi,0.0,Color
Pirates of the Caribbean: At World's End,Jack Davenport,563.0,Color
Spectre,Stephanie Sigman,0.0,Color
The Dark Knight Rises,Joseph Gordon-Levitt,22000.0,Color
Star Wars: Episode VII - The Force Awakens,,131.0,
John Carter,Polly Walker,475.0,Color
Spider-Man 3,Kirsten Dunst,0.0,Color
Tangled,M.C. Gainey,15.0,Color
Avengers: Age of Ultron,Scarlett Johansson,0.0,Color
Harry Potter and the Half-Blood Prince,Rupert Grint,282.0,Color


### Problem 3
<span  style="color:green; font-size:16px">Select the rows with integer locations 100 up to but not including 105, along with the column at integer location 5.</span>

In [23]:
# your code here
rows = 100: 105  # invalid: the start:stop slice syntax only works inside square brackets
col = 5
movie.iloc[rows, col]

SyntaxError: invalid syntax (<ipython-input-23-8af6b4ab9683>, line 2)

In [24]:
# your code here
movie.iloc[100: 105, 5]

title
The Fast and the Furious                   357.0
The Curious Case of Benjamin Button      21000.0
X-Men: First Class                         905.0
The Hunger Games: Mockingjay - Part 2      508.0
The Sorcerer's Apprentice                  226.0
Name: director_fb, dtype: float64
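
The `SyntaxError` above arises because `100: 105` is only valid slice syntax inside square brackets. If you want to keep the slice in a variable, Python's built-in `slice` object is one option. The sketch below uses a hypothetical stand-in frame of integers (200 rows by 8 columns) instead of the movie data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the movie table: 200 rows and 8 columns of integers.
frame = pd.DataFrame(np.arange(1600).reshape(200, 8))

rows = slice(100, 105)   # equivalent to writing 100:105 inside the brackets
col = 5

from_variable = frame.iloc[rows, col]
inline = frame.iloc[100:105, 5]

print(from_variable.equals(inline))
```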

# Continue making selections with `.iloc` below

In [36]:
# your code here
movie.iloc[::1, 2].head(5)

title
Avatar                                        PG-13
Pirates of the Caribbean: At World's End      PG-13
Spectre                                       PG-13
The Dark Knight Rises                         PG-13
Star Wars: Episode VII - The Force Awakens      NaN
Name: content_rating, dtype: object

In [37]:
movie.iloc[:, 1:10:2].head(5)  # columns at integer locations 1, 3, 5, 7, 9: start at 1, stop before 10, step of 2
# ::1 is equivalent to :

title,color,duration,director_fb,actor1_fb,actor2_fb
Avatar,Color,178.0,0.0,1000.0,936.0
Pirates of the Caribbean: At World's End,Color,169.0,563.0,40000.0,5000.0
Spectre,Color,148.0,0.0,11000.0,393.0
The Dark Knight Rises,Color,164.0,22000.0,27000.0,23000.0
Star Wars: Episode VII - The Force Awakens,,,131.0,131.0,12.0
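
The `start:stop:step` slice used above can be tried on a smaller table. This sketch assumes a hand-built copy of the sample data from the first part of the notebook; the variable names are illustrative.

```python
import pandas as pd

# Hand-copied from the sample table so the sketch runs without any CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

# Start at location 1, stop before 6, step of 2: locations 1, 3, and 5.
every_other = df.iloc[:, 1:6:2]

# A step of 1 is the default, so ::1 and : select the same rows.
same_rows = df.iloc[::1].equals(df.iloc[:])

print(list(every_other.columns), same_rows)
```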


In [30]:
movie.iloc[1, :].loc['year']

2007.0

In [33]:
movie.loc[:,'year'].iloc[1]

2007.0
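
The two cells above chain `.iloc` and `.loc` to combine a row position with a column label. One possible alternative is to convert the label to a position with `Index.get_loc` and make a single `.iloc` call. This is a sketch on a hand-built copy of the sample table (using the `age` column, since the movie data is not reproduced here).

```python
import pandas as pd

# Hand-copied from the sample table so the sketch runs without any CSV file.
names = ["Jane", "Niko", "Aaron", "Penelope", "Dean", "Christina", "Cornelia"]
df = pd.DataFrame(
    {
        "state": ["NY", "TX", "FL", "AL", "AK", "TX", "TX"],
        "color": ["blue", "green", "red", "white", "gray", "black", "red"],
        "food": ["Steak", "Lamb", "Mango", "Apple", "Cheese", "Melon", "Beans"],
        "age": [30, 2, 12, 4, 32, 33, 69],
        "height": [165, 70, 120, 80, 180, 172, 150],
        "score": [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
    },
    index=names,
)

# Chained selections, as in the cells above.
chained_a = df.iloc[1, :].loc["age"]
chained_b = df.loc[:, "age"].iloc[1]

# Single call: translate the column label into its integer location first.
pos = df.columns.get_loc("age")
single_call = df.iloc[1, pos]

print(chained_a, chained_b, single_call)
```

All three expressions read the same cell; the single call avoids building an intermediate Series.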