# Filtering the pandas.DataFrame with boolean arrays (masks)

[**See `filter` instructions**](https://datons.craft.me/h3f5pSQSE7l6RW) to complete the following exercises.

## Data

```python
import pandas as pd

df_countries = pd.read_csv('data/gapminder.csv', index_col=0)
df_countries
```

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Albania | Europe | 2007 | 76.423 | 3600523 | 5937.029526 | ALB | 8 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Zambia | Africa | 2007 | 42.384 | 11746035 | 1271.211593 | ZMB | 894 |
| Zimbabwe | Africa | 2007 | 43.487 | 12311143 | 469.709298 | ZWE | 716 |


## Single condition

**Exercise**: Filter countries from `Asia`.

### Categorical

#### Create mask

```python
mask = df['column'] == 'value'
```

```
country
Afghanistan     True
Albania        False
               ...  
Zambia         False
Zimbabwe       False
Name: continent, Length: 142, dtype: bool
```

#### Filter DataFrame with mask

```python
df[mask]
```

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Bahrain | Asia | 2007 | 75.635 | 708573 | 29796.048340 | BHR | 48 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| West Bank and Gaza | Asia | 2007 | 73.422 | 4018332 | 3025.349798 | PSE | 275 |
| Yemen, Rep. | Asia | 2007 | 62.698 | 22211743 | 2280.769906 | YEM | 887 |

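As a worked illustration of the two steps above, here is a minimal sketch. It assumes a small stand-in DataFrame (`df_sample`, a hypothetical name) built from four rows shown in this notebook, so it runs without `data/gapminder.csv`; with the full data you would use `df_countries` instead.

```python
import pandas as pd

# Hypothetical stand-in for df_countries, built from four rows shown above.
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia", "Africa"],
        "lifeExp": [43.828, 76.423, 75.635, 42.384],
    },
    index=pd.Index(["Afghanistan", "Albania", "Bahrain", "Zambia"], name="country"),
)

# Comparing a column with a value yields a boolean Series (the mask).
mask_asia = df_sample["continent"] == "Asia"

# Indexing with the mask keeps only the rows where the mask is True.
df_asia = df_sample[mask_asia]
print(df_asia)
```

The mask shares the DataFrame's index, which is how pandas aligns each `True`/`False` with its row.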

### Numerical

**Exercise**: Filter countries with `lifeExp` greater than `80` years.

#### Create mask

```python
mask = df['column'] > number
```

```
country
Afghanistan    False
Albania        False
               ...  
Zambia         False
Zimbabwe       False
Name: lifeExp, Length: 142, dtype: bool
```

#### Filter DataFrame with mask

```python
df[mask]
```

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.36744 | AUS | 36 |
| Canada | Americas | 2007 | 80.653 | 33390141 | 36319.23501 | CAN | 124 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Sweden | Europe | 2007 | 80.884 | 9031088 | 33859.74835 | SWE | 752 |
| Switzerland | Europe | 2007 | 81.701 | 7554661 | 37506.41907 | CHE | 756 |

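The numerical case works the same way, only with a comparison operator such as `>`. A minimal sketch, again assuming a hypothetical stand-in (`df_sample`) built from rows shown in this notebook:

```python
import pandas as pd

# Hypothetical stand-in for df_countries, built from four rows shown above.
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Oceania", "Americas"],
        "lifeExp": [43.828, 76.423, 81.235, 80.653],
    },
    index=pd.Index(["Afghanistan", "Albania", "Australia", "Canada"], name="country"),
)

# The comparison is applied element-wise to the whole column.
mask_over_80 = df_sample["lifeExp"] > 80

# Keep only the rows where life expectancy exceeds 80 years.
df_over_80 = df_sample[mask_over_80]
print(df_over_80)
```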

## Combine multiple conditions

**Exercise**: Filter countries from `Asia` with `lifeExp` greater than `80` years.

### Intersection

```python
mask = mask1 & mask2 # intersection (true on both conditions)
```

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Hong Kong, China | Asia | 2007 | 82.208 | 6980412 | 39724.97867 | HKG | 344 |
| Israel | Asia | 2007 | 80.745 | 6426679 | 25523.2771 | ISR | 376 |
| Japan | Asia | 2007 | 82.603 | 127467972 | 31656.06806 | JPN | 392 |

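A minimal sketch of the intersection, assuming a hypothetical stand-in (`df_sample`) built from rows shown in this notebook. It also shows the one-line form, where each condition needs its own parentheses because `&` binds more tightly than `==` and `>` in Python.

```python
import pandas as pd

# Hypothetical stand-in for df_countries, built from five rows shown above.
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Asia", "Oceania", "Europe"],
        "lifeExp": [82.603, 80.745, 43.828, 81.235, 76.423],
    },
    index=pd.Index(
        ["Japan", "Israel", "Afghanistan", "Australia", "Albania"], name="country"
    ),
)

mask1 = df_sample["continent"] == "Asia"
mask2 = df_sample["lifeExp"] > 80

# Element-wise AND: True only where both masks are True.
mask = mask1 & mask2
df_both = df_sample[mask]

# Equivalent one-liner; the parentheses around each condition are required.
df_both_inline = df_sample[(df_sample["continent"] == "Asia") & (df_sample["lifeExp"] > 80)]
print(df_both)
```

Python's `and` keyword does not work here: it tries to reduce each Series to a single truth value and raises a `ValueError`.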

### Union

```python
mask = mask1 | mask2 # union (true on at least one condition)
```

| country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
|---|---|---|---|---|---|---|---|
| Afghanistan | Asia | 2007 | 43.828 | 31889923 | 974.580338 | AFG | 4 |
| Australia | Oceania | 2007 | 81.235 | 20434176 | 34435.367440 | AUS | 36 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| West Bank and Gaza | Asia | 2007 | 73.422 | 4018332 | 3025.349798 | PSE | 275 |
| Yemen, Rep. | Asia | 2007 | 62.698 | 22211743 | 2280.769906 | YEM | 887 |

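A minimal sketch of the union, assuming a hypothetical stand-in (`df_sample`) built from rows shown in this notebook. A row is kept when it satisfies at least one of the two conditions.

```python
import pandas as pd

# Hypothetical stand-in for df_countries, built from five rows shown above.
df_sample = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", "Asia", "Oceania", "Europe"],
        "lifeExp": [82.603, 80.745, 43.828, 81.235, 76.423],
    },
    index=pd.Index(
        ["Japan", "Israel", "Afghanistan", "Australia", "Albania"], name="country"
    ),
)

mask1 = df_sample["continent"] == "Asia"
mask2 = df_sample["lifeExp"] > 80

# Element-wise OR: True where at least one mask is True.
mask = mask1 | mask2
df_either = df_sample[mask]
print(df_either)
```

Only Albania is dropped in this sample, since it is neither in Asia nor above 80 years.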

## Bonus: df[mask] vs df.query

**Exercise**: Filter the rows from the year `2021`.

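|  | technology | year | month | day | hour | generation_mwh |
|---|---|---|---|---|---|---|
| 0 | Carbon | 2019 | 1 | 1 | 0 | 1867.0 |
| 1 | Carbon | 2019 | 1 | 1 | 1 | 1618.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 420862 | Other Renewables | 2021 | 12 | 31 | 22 | 607.5 |
| 420863 | Other Renewables | 2021 | 12 | 31 | 23 | 591.6 |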


### Dummy

|  | technology | year | month | day | hour | generation_mwh |
|---|---|---|---|---|---|---|
| 17544 | Carbon | 2021 | 1 | 1 | 0 | 250.0 |
| 17545 | Carbon | 2021 | 1 | 1 | 1 | 250.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 420862 | Other Renewables | 2021 | 12 | 31 | 22 | 607.5 |
| 420863 | Other Renewables | 2021 | 12 | 31 | 23 | 591.6 |


### Proficient

|  | technology | year | month | day | hour | generation_mwh |
|---|---|---|---|---|---|---|
| 17544 | Carbon | 2021 | 1 | 1 | 0 | 250.0 |
| 17545 | Carbon | 2021 | 1 | 1 | 1 | 250.0 |
| ... | ... | ... | ... | ... | ... | ... |
| 420862 | Other Renewables | 2021 | 12 | 31 | 22 | 607.5 |
| 420863 | Other Renewables | 2021 | 12 | 31 | 23 | 591.6 |

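The two outputs above are identical, which suggests the same filter written two ways. The sketch below assumes that "Dummy" refers to the explicit boolean mask and "Proficient" to `DataFrame.query`; that mapping is a guess from the section title, not something the notebook states. The name `df_energy` and the four rows (copied from the tables above) are stand-ins for the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the generation dataset, built from rows shown above.
df_energy = pd.DataFrame(
    {
        "technology": ["Carbon", "Carbon", "Carbon", "Other Renewables"],
        "year": [2019, 2019, 2021, 2021],
        "generation_mwh": [1867.0, 1618.0, 250.0, 591.6],
    },
    index=[0, 1, 17544, 420863],
)

# Assumed "Dummy" approach: build the mask, then index with it.
mask_2021 = df_energy["year"] == 2021
via_mask = df_energy[mask_2021]

# Assumed "Proficient" approach: pass the condition as a string expression.
via_query = df_energy.query("year == 2021")

print(via_mask.equals(via_query))
```

Inside the `query` string, column names are written bare, without repeating the DataFrame name.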

## Query with multiple conditions

**Exercise**: Filter the rows from the year `2021` with `Eolic` technology.

### Dummy

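|  | technology | year | month | day | hour | generation_mwh |
|---|---|---|---|---|---|---|
| 122760 | Eolic | 2021 | 1 | 1 | 0 | 8557.5 |
| 122761 | Eolic | 2021 | 1 | 1 | 1 | 8661.6 |
| ... | ... | ... | ... | ... | ... | ... |
| 131518 | Eolic | 2021 | 12 | 31 | 22 | 6081.8 |
| 131519 | Eolic | 2021 | 12 | 31 | 23 | 6255.3 |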


### Proficient

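|  | technology | year | month | day | hour | generation_mwh |
|---|---|---|---|---|---|---|
| 122760 | Eolic | 2021 | 1 | 1 | 0 | 8557.5 |
| 122761 | Eolic | 2021 | 1 | 1 | 1 | 8661.6 |
| ... | ... | ... | ... | ... | ... | ... |
| 131518 | Eolic | 2021 | 12 | 31 | 22 | 6081.8 |
| 131519 | Eolic | 2021 | 12 | 31 | 23 | 6255.3 |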

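With two conditions, the contrast between the two styles becomes clearer. This is a minimal sketch under the same assumptions as before: "Dummy" is taken to mean combined boolean masks and "Proficient" to mean `DataFrame.query`, and `df_energy` is a hypothetical stand-in built from rows shown in this notebook.

```python
import pandas as pd

# Hypothetical stand-in for the generation dataset, built from rows shown above.
df_energy = pd.DataFrame(
    {
        "technology": ["Carbon", "Carbon", "Eolic", "Eolic"],
        "year": [2019, 2021, 2021, 2021],
        "generation_mwh": [1867.0, 250.0, 8557.5, 6255.3],
    },
    index=[0, 17544, 122760, 131519],
)

# Assumed "Dummy" approach: one mask per condition, combined with &.
mask_year = df_energy["year"] == 2021
mask_tech = df_energy["technology"] == "Eolic"
via_mask = df_energy[mask_year & mask_tech]

# Assumed "Proficient" approach: both conditions in one query string.
# String values are quoted inside the expression; `and` is allowed here.
via_query = df_energy.query("year == 2021 and technology == 'Eolic'")

print(via_query)
```

Both forms select the same rows; `query` saves repeating the DataFrame name and the parentheses around each condition.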