## 7. How to select multiple rows and columns from a DataFrame?

In this section we learn about two powerful DataFrame indexers, 'loc' and 'iloc'. They are extremely versatile, so the material covered here will stretch your understanding without being exhaustive.

In [1]:
import pandas as pd

We can select multiple rows and columns from a pandas DataFrame using “loc” and “iloc”. We will use the UFO sightings report dataset to see how.

In [2]:
ufo = pd.read_csv("http://bit.ly/uforeports", parse_dates = ["Time"])
ufo.head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00


### 7.1. Using loc with row(s) and column(s) labels

We can use loc to select rows and columns from a DataFrame by label. Here, label means the index values for rows and the column names for columns. We simply pass “what rows we want, what columns we want” inside a pair of square brackets. Using “:” in place of the rows means we want all rows, and likewise for columns.


#### 7.1.1. Row(s) selection

If we want one or more rows and all columns, we specify the index labels of the rows we want (a single label, a list of labels, or a range of labels) and put “:” in the columns position. Note that a range given to loc includes both of its endpoints.

In [3]:
ufo.loc[0, :]

City                            Ithaca
Colors Reported                    NaN
Shape Reported                TRIANGLE
State                               NY
Time               1930-06-01 22:00:00
Name: 0, dtype: object

In [4]:
ufo.loc[[0,1,2], :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00


In [5]:
ufo.loc[0:2, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
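
Because the UFO data uses the default integer index, it is easy to forget that loc works on labels rather than positions. The sketch below uses a small hypothetical frame named `sightings` with string index labels (an assumption made for illustration, not part of the dataset) to show the three forms of row selection.

```python
import pandas as pd

# Hypothetical toy data standing in for the UFO dataset
sightings = pd.DataFrame(
    {
        "City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"],
        "State": ["NY", "NJ", "CO", "KS"],
    },
    index=["a", "b", "c", "d"],  # string labels instead of 0, 1, 2, 3
)

single_row = sightings.loc["a", :]          # one label -> a Series
listed_rows = sightings.loc[["a", "c"], :]  # list of labels -> a DataFrame
sliced_rows = sightings.loc["a":"c", :]     # label range, end label included

print(type(single_row).__name__)   # Series
print(listed_rows.index.tolist())  # ['a', 'c']
print(sliced_rows.index.tolist())  # ['a', 'b', 'c']
```

The label range `"a":"c"` returns three rows because loc includes the end label.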


#### 7.1.2. Column(s) selection

We have already learned to select columns while studying pandas Series, but that approach was not as explicit as loc can be. Besides, if we do not use "loc" explicitly we may sometimes get an error. We will discuss the error and its cause in a later video. However, it is necessary to know both ways, as both are popular and we often need to read someone else’s code.

In [6]:
ufo["City"].head()

0                  Ithaca
1             Willingboro
2                 Holyoke
3                 Abilene
4    New York Worlds Fair
Name: City, dtype: object

In [7]:
ufo.loc[:, "City"].head()

0                  Ithaca
1             Willingboro
2                 Holyoke
3                 Abilene
4    New York Worlds Fair
Name: City, dtype: object

Notice that the two lines above produce the same output. I prefer being more explicit, as it helps someone reading my code understand it better, and I would recommend doing the same. We can similarly use loc to select multiple columns.


In [8]:
ufo[["City", "State"]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [9]:
ufo.loc[:, ["City", "State"]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


With loc, we can also specify a range of columns, which we could not do with our previous approach.

In [10]:
ufo.loc[:, "City":"State"].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY
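
To make the column range concrete, here is a minimal sketch on a hypothetical two-row frame named `sightings` that mirrors the UFO column layout (the data values are copied from the first rows above, the frame itself is an assumption). It checks that the label range selects the same columns as listing them one by one.

```python
import pandas as pd

# Hypothetical two-row frame with the same column order as the UFO data
sightings = pd.DataFrame(
    {
        "City": ["Ithaca", "Willingboro"],
        "Colors Reported": [None, None],
        "Shape Reported": ["TRIANGLE", "OTHER"],
        "State": ["NY", "NJ"],
        "Time": pd.to_datetime(["1930-06-01 22:00:00", "1930-06-30 20:00:00"]),
    }
)

# A label range covers every column from "City" through "State", inclusive
by_slice = sightings.loc[:, "City":"State"]

# The same selection written out as an explicit list of labels
by_list = sightings.loc[:, ["City", "Colors Reported", "Shape Reported", "State"]]

print(by_slice.columns.tolist())
print(by_slice.equals(by_list))
```

The range depends on the column order in the DataFrame, so the list form is the safer choice when that order might change.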


#### 7.1.3. Row(s) and Column(s) selection

Selecting a subset of rows and columns at the same time is more common than selecting only rows or only columns. It’s the same process; we just specify labels in both positions instead of using “:”.


In [11]:
ufo.describe()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
count,18216,2882,15597,18241,18241
unique,6476,27,27,52,16145
top,Seattle,RED,LIGHT,CA,1999-11-16 19:00:00
freq,187,780,2803,2529,27
first,,,,,1930-06-01 22:00:00
last,,,,,2000-12-31 23:59:00


In [12]:
ufo.describe().loc["top", "City"]

'Seattle'

In [13]:
ufo.describe().loc["count":"freq", "City":"State"]

Unnamed: 0,City,Colors Reported,Shape Reported,State
count,18216,2882,15597,18241
unique,6476,27,27,52
top,Seattle,RED,LIGHT,CA
freq,187,780,2803,2529


### 7.2. Using loc with Boolean condition

#### 7.2.1. Single condition

Again, we have already learned how to filter rows based on column values, but using loc makes the selection more explicit and less prone to errors.

In [14]:
#not so explicit
ufo[ufo.City=="Oakland"].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
1694,Oakland,,CIGAR,CA,1968-07-21 14:00:00
2144,Oakland,,DISK,CA,1971-08-19 00:00:00
4686,Oakland,,LIGHT,MD,1982-06-01 00:00:00
7293,Oakland,,LIGHT,CA,1994-03-28 17:00:00
8488,Oakland,,,CA,1995-08-10 21:45:00


In [15]:
#explicit
ufo.loc[ufo.City=="Oakland", :].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
1694,Oakland,,CIGAR,CA,1968-07-21 14:00:00
2144,Oakland,,DISK,CA,1971-08-19 00:00:00
4686,Oakland,,LIGHT,MD,1982-06-01 00:00:00
7293,Oakland,,LIGHT,CA,1994-03-28 17:00:00
8488,Oakland,,,CA,1995-08-10 21:45:00


Using a condition with loc is still the same as telling loc which rows and columns we want. Instead of specifying labels, we pass a Series of Booleans, where “True” marks the rows we want and “False” marks the rows we don’t.

In [16]:
#not so explicit
ufo[ufo.City=="Oakland"]["State"].head()

1694    CA
2144    CA
4686    MD
7293    CA
8488    CA
Name: State, dtype: object

In [17]:
#explicit
ufo.loc[ufo.City=="Oakland", "State"].head()

1694    CA
2144    CA
4686    MD
7293    CA
8488    CA
Name: State, dtype: object
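
It can help to look at the Boolean Series on its own before handing it to loc. The following is a minimal sketch on a hypothetical four-row frame named `sightings` (the name `mask` is also just an illustrative choice).

```python
import pandas as pd

# Hypothetical toy data standing in for the UFO dataset
sightings = pd.DataFrame(
    {
        "City": ["Oakland", "Ithaca", "Oakland", "Holyoke"],
        "State": ["CA", "NY", "MD", "CO"],
    }
)

# The comparison produces one True/False value per row
mask = sightings["City"] == "Oakland"
print(mask.tolist())  # [True, False, True, False]

# loc keeps the rows where the mask is True and returns the "State" column
oakland_states = sightings.loc[mask, "State"]
print(oakland_states.tolist())        # ['CA', 'MD']
print(oakland_states.index.tolist())  # [0, 2]
```

The result keeps the original index labels (0 and 2) of the matching rows, just as the Oakland output above keeps labels such as 1694 and 2144.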

#### 7.2.2. Multiple conditions

Using multiple filter criteria with loc is similar to how we used them without loc, but of course more explicit.

In [18]:
ufo.loc[(ufo.City.isna()) & (ufo["Colors Reported"]=="RED"), :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
3123,,RED,TRIANGLE,WV,1975-11-25 23:00:00
12441,,RED,FIREBALL,WA,1998-10-26 17:58:00
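
Conditions are combined with `&` (and), `|` (or), and `~` (not). The sketch below, on a hypothetical frame named `sightings`, stores each condition in its own variable first, which is one way to keep the loc call readable.

```python
import pandas as pd

# Hypothetical toy data standing in for the UFO dataset
sightings = pd.DataFrame(
    {
        "City": [None, "Oakland", None, "Seattle"],
        "Colors Reported": ["RED", "RED", "GREEN", None],
        "State": ["WV", "CA", "WA", "WA"],
    }
)

no_city = sightings["City"].isna()              # True where City is missing
is_red = sightings["Colors Reported"] == "RED"  # True where the color is RED

both = sightings.loc[no_city & is_red, :]    # rows meeting both conditions
either = sightings.loc[no_city | is_red, :]  # rows meeting at least one
not_red = sightings.loc[~is_red, :]          # rows where is_red is False

print(both.index.tolist())     # [0]
print(either.index.tolist())   # [0, 1, 2]
print(not_red.index.tolist())  # [2, 3]
```

When the comparisons are written inline, each one needs its own parentheses, as in the cell above, because `&` and `|` bind more tightly than `==` in Python.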


### 7.3. Using iloc with Integer position

We use “iloc” to filter or select rows and columns by their integer position. It's similar to “loc” in that we say which rows and which columns we want, but instead of labels we use integer positions.

In [19]:
ufo.columns

Index(['City', 'Colors Reported', 'Shape Reported', 'State', 'Time'], dtype='object')

In [20]:
list(range(0,4))

[0, 1, 2, 3]

#### 7.3.1. Row(s) selection

Selecting rows with “iloc” works the same way as with “loc”; we just specify the integer positions of the rows instead of their labels. Also, when we specify a range with iloc, the right limit is excluded, whereas loc includes it. This difference between “loc” and “iloc” emphasizes that with “loc” we are specifying labels (index values and column names) and with “iloc” we are specifying integer positions.

In [21]:
ufo.iloc[[0, 3], :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00


In [22]:
ufo.iloc[0:3, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
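
With the default index, labels and positions happen to be the same numbers, which hides the difference between the two indexers. A minimal sketch, assuming a hypothetical frame named `sightings` whose index labels are 10, 20, 30, 40, makes the contrast visible.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from the integer positions
sightings = pd.DataFrame(
    {"City": ["Ithaca", "Willingboro", "Holyoke", "Abilene"]},
    index=[10, 20, 30, 40],
)

# iloc: positions 0 and 1 are returned, position 2 is excluded
by_position = sightings.iloc[0:2, :]

# loc: labels 10 through 30 are returned, the end label is included
by_label = sightings.loc[10:30, :]

print(by_position.index.tolist())  # [10, 20]
print(by_label.index.tolist())     # [10, 20, 30]
```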


The code below requires some understanding of negative indexing in Python lists. It might help to study it on your own if you are not familiar with it. The code below says, “I need all rows except the last 30, and all columns”. Since the start of the range is missing, it defaults to “0”. “-30” is the integer position of the 30th row counting back from the last row, whose integer position is “-1”.


In [23]:
ufo.iloc[:-30, :]

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
0,Ithaca,,TRIANGLE,NY,1930-06-01 22:00:00
1,Willingboro,,OTHER,NJ,1930-06-30 20:00:00
2,Holyoke,,OVAL,CO,1931-02-15 14:00:00
3,Abilene,,DISK,KS,1931-06-01 13:00:00
4,New York Worlds Fair,,LIGHT,NY,1933-04-18 19:00:00
...,...,...,...,...,...
18206,Cerrilillo,,,NM,2000-12-28 11:00:00
18207,Mansfield,,FLASH,TX,2000-12-28 12:00:00
18208,Murphreesboro,,FLASH,TN,2000-12-28 12:15:00
18209,Houston,,LIGHT,TX,2000-12-28 17:09:00
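
For readers who have not met negative indexing before, the sketch below shows the same idea on a plain Python list and then on a small hypothetical DataFrame named `frame` (both are made up for illustration).

```python
import pandas as pd

letters = ["a", "b", "c", "d", "e"]

# On a list: everything except the last two items, then the last item
print(letters[:-2])  # ['a', 'b', 'c']
print(letters[-1])   # e

# iloc follows the same rules, applied to row positions
frame = pd.DataFrame({"letter": letters})
all_but_last_two = frame.iloc[:-2, :]

print(all_but_last_two["letter"].tolist())  # ['a', 'b', 'c']
print(frame.iloc[-1, 0])                    # e
```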


#### 7.3.2. Column(s) selection

Using “iloc” with columns is again similar to using “loc” with columns, but we use integer position for columns we want.

In [24]:
ufo.iloc[:, [0, 3]].head()

Unnamed: 0,City,State
0,Ithaca,NY
1,Willingboro,NJ
2,Holyoke,CO
3,Abilene,KS
4,New York Worlds Fair,NY


In [25]:
ufo.iloc[:, 0:4].head()

Unnamed: 0,City,Colors Reported,Shape Reported,State
0,Ithaca,,TRIANGLE,NY
1,Willingboro,,OTHER,NJ
2,Holyoke,,OVAL,CO
3,Abilene,,DISK,KS
4,New York Worlds Fair,,LIGHT,NY


The code below says, “I need all columns except the last 2, and all rows”. Since the start of the range is missing, it defaults to “0”. “-2” is the integer position of the 2nd column counting back from the last column, whose integer position is “-1”.

In [27]:
ufo.iloc[:, :-2]

Unnamed: 0,City,Colors Reported,Shape Reported
0,Ithaca,,TRIANGLE
1,Willingboro,,OTHER
2,Holyoke,,OVAL
3,Abilene,,DISK
4,New York Worlds Fair,,LIGHT
...,...,...,...
18236,Grant Park,,TRIANGLE
18237,Spirit Lake,,DISK
18238,Eagle River,,
18239,Eagle River,RED,LIGHT


#### 7.3.3. Row(s) and Column(s) selection

We can combine what we learned for selecting rows and columns with “iloc” to be able to select both rows and columns at the same time.

In [25]:
ufo.describe()

Unnamed: 0,City,Colors Reported,Shape Reported,State,Time
count,18216,2882,15597,18241,18241
unique,6476,27,27,52,16145
top,Seattle,RED,LIGHT,CA,1999-11-16 19:00:00
freq,187,780,2803,2529,27
first,,,,,1930-06-01 22:00:00
last,,,,,2000-12-31 23:59:00


In [26]:
ufo.describe().iloc[2,0]

'Seattle'

In [27]:
ufo.describe().iloc[0:4, 0:4]

Unnamed: 0,City,Colors Reported,Shape Reported,State
count,18216,2882,15597,18241
unique,6476,27,27,52
top,Seattle,RED,LIGHT,CA
freq,187,780,2803,2529
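
The two indexers can express the same selection, one by label and one by position. The sketch below checks this on a small hypothetical summary table named `summary` (its numbers are made up and are not the output of `ufo.describe()`).

```python
import pandas as pd

# Hypothetical summary table, loosely shaped like a describe() result
summary = pd.DataFrame(
    {"City": [4, 3], "State": [4, 2], "Time": [4, 4]},
    index=["count", "unique"],
)

# Single cell: row "unique" is at position 1, column "City" is at position 0
cell_by_label = summary.loc["unique", "City"]
cell_by_position = summary.iloc[1, 0]
print(cell_by_label, cell_by_position)  # 3 3

# Block: the label range includes "State"; the position range stops before 2
block_by_label = summary.loc["count":"unique", "City":"State"]
block_by_position = summary.iloc[0:2, 0:2]
print(block_by_label.equals(block_by_position))  # True
```

Label-based selection usually survives reordered columns better, while position-based selection is handy when the labels are long or unknown.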
