# Filtering a Pandas DataFrame

DataFrames can be very large, so we often need to filter them: keep only the rows that meet some condition, usually expressed as a comparison on one column. For example, in the "brics" data, we will select the countries that have an area over 8 million square km.

There are three steps to do this: 1) select the area column as a Series, not a DataFrame (with square brackets, or with the `loc` or `iloc` indexers); 2) compare the area column against the threshold, which gives a Boolean Series; 3) use that Boolean Series to select the matching rows, i.e., the countries.
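As a minimal sketch of step 1, the snippet below rebuilds the "brics" data inline (an assumption, so that it runs without `brics.csv`) and selects the area column in three ways. The names `area_brackets`, `area_loc`, and `area_iloc` are illustrative only, and the position `2` assumes the column order shown in this section.

```python
import pandas as pd

# Rebuild the brics data inline (assumption: same values as brics.csv)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
        "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Three ways to get the area column as a Series
area_brackets = brics["area"]      # single square brackets
area_loc = brics.loc[:, "area"]    # label-based: all rows, column "area"
area_iloc = brics.iloc[:, 2]       # position-based: all rows, third column

print(type(area_brackets).__name__)  # Series
print(area_brackets.equals(area_loc) and area_brackets.equals(area_iloc))  # True

# Double square brackets return a one-column DataFrame instead
print(type(brics[["area"]]).__name__)  # DataFrame
```

A comparison such as `> 8` works on either form, but only the Series form gives the one-dimensional Boolean mask used for row selection in the steps above.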



In [1]:
import pandas as pd
brics = pd.read_csv("brics.csv", index_col=0)
print(brics, "\n")

# Get a Boolean Series: True for countries with an area > 8
is_huge = brics["area"] > 8
print(is_huge, "\n")

# Subset the DataFrame with the Boolean Series
print(brics[is_huge], "\n")

# Done in one line
brics[brics["area"] > 8]

         country    capital    area  population
BR        Brazil   Brasilia   8.516      200.40
RU        Russia     Moscow  17.100      143.50
IN         India  New Delhi   3.286     1252.00
CH         China    Beijing   9.597     1357.00
SA  South Africa   Pretoria   1.221       52.98 

BR     True
RU     True
IN    False
CH     True
SA    False
Name: area, dtype: bool 

   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0 



   country   capital    area  population
BR  Brazil  Brasilia   8.516       200.4
RU  Russia    Moscow  17.100       143.5
CH   China   Beijing   9.597      1357.0
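If only the country names are needed rather than the full rows, the same Boolean Series can be passed to `loc` together with a column label. This is a small sketch, again assuming the "brics" data is rebuilt inline instead of being read from `brics.csv`; `huge_countries` is an illustrative name.

```python
import pandas as pd

# Rebuild the brics data inline (assumption: same values as brics.csv)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
        "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Boolean Series: True where area > 8
is_huge = brics["area"] > 8

# Rows where the mask is True, restricted to the "country" column
huge_countries = brics.loc[is_huge, "country"]
print(huge_countries.tolist())  # ['Brazil', 'Russia', 'China']
```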


Now let's combine conditions with Boolean operators. Because Pandas is built on NumPy, we can use NumPy's logical functions, such as `np.logical_and`, on the comparison results. For example, to get the countries with an area larger than 8 but smaller than 10:

In [2]:
import numpy as np
huge = np.logical_and(brics["area"] > 8, brics["area"] < 10)
print(brics[huge])

# One liner
brics[np.logical_and(brics["area"] > 8, brics["area"] < 10)]

   country   capital   area  population
BR  Brazil  Brasilia  8.516       200.4
CH   China   Beijing  9.597      1357.0


   country   capital   area  population
BR  Brazil  Brasilia  8.516       200.4
CH   China   Beijing  9.597      1357.0
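Pandas Series also support the element-wise operators `&` (and), `|` (or), and `~` (not) directly, which is a common alternative to `np.logical_and`. The sketch below assumes the "brics" data is rebuilt inline rather than read from `brics.csv`; the names `mask_np`, `mask_op`, `mid_sized`, and `others` are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the brics data inline (assumption: same values as brics.csv)
brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.100, 3.286, 9.597, 1.221],
        "population": [200.40, 143.50, 1252.00, 1357.00, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# NumPy version, as in the cell above
mask_np = np.logical_and(brics["area"] > 8, brics["area"] < 10)

# Operator version: each comparison needs its own parentheses,
# because & binds more tightly than > and <
mask_op = (brics["area"] > 8) & (brics["area"] < 10)

print(mask_np.equals(mask_op))  # True

mid_sized = brics[mask_op]
print(mid_sized.index.tolist())  # ['BR', 'CH']

# ~ negates the mask: every country outside the 8 to 10 range
others = brics[~mask_op]
print(others.index.tolist())  # ['RU', 'IN', 'SA']
```

Python's plain `and` and `or` keywords do not work here: they try to reduce each Series to a single truth value, which Pandas rejects with a `ValueError`.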
