# Data Manipulation - Filters

## Learnings:

- rename columns in a DataFrame
- manipulate columns in a DataFrame (select, reorder, delete)
- filter a DataFrame
- assign to a column based on a condition

In [4]:
import pandas as pd

data = pd.read_csv('vehicles.csv')
data.head(2)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550


In [5]:
data.shape

(35952, 15)

In [6]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35952 entries, 0 to 35951
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Make                     35952 non-null  object 
 1   Model                    35952 non-null  object 
 2   Year                     35952 non-null  int64  
 3   Engine Displacement      35952 non-null  float64
 4   Cylinders                35952 non-null  float64
 5   Transmission             35952 non-null  object 
 6   Drivetrain               35952 non-null  object 
 7   Vehicle Class            35952 non-null  object 
 8   Fuel Type                35952 non-null  object 
 9   Fuel Barrels/Year        35952 non-null  float64
 10  City MPG                 35952 non-null  int64  
 11  Highway MPG              35952 non-null  int64  
 12  Combined MPG             35952 non-null  int64  
 13  CO2 Emission Grams/Mile  35952 non-null  float64
 14  Fuel Cost/Year        

## Checking the dataframe column names

There are two ways to rename columns:
- `data.columns` is an **attribute** of the DataFrame that holds the column names as a list-like `Index`
    - You can replace it with another list containing the names you want
    - Note that you have to replace the whole set of column names at once

- `data.rename()` is a **method** of a DataFrame that renames one or more columns at a time
    - You just need to pass a dictionary of the form `{'old_name': 'new_name'}`
    - By default, it changes the labels of the **index** (`axis=0`); specify `axis=1` (or use the `columns=` keyword) to change **column** names
    - With `inplace=True`, the DataFrame is modified directly and the method returns `None`; otherwise a new DataFrame is returned
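
As a minimal sketch of the two approaches, the example below uses a tiny hypothetical DataFrame (`df`, with made-up rows) rather than the vehicles data, so it can be run on its own:

```python
import pandas as pd

# Hypothetical two-column frame standing in for the vehicles data
df = pd.DataFrame({"Make": ["Ford", "Acura"], "Year": [1999, 1997]})

# Option 1: replace every column name at once (the list length must match)
df.columns = ["make", "year"]

# Option 2: rename only selected columns; this returns a new DataFrame
renamed = df.rename(columns={"make": "manufacturer"})

print(list(df.columns))       # df keeps the names assigned above
print(list(renamed.columns))  # only 'make' was renamed
```

Assigning to `.columns` is convenient when every name follows the same rule (for example lowercasing), while `.rename()` is usually clearer when only a few names change.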

In [8]:
%timeit data.loc[:, 'Make']

51.4 µs ± 5.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [9]:
%timeit data['Make']

3.25 µs ± 77 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


### Substituting `.columns` attribute

In [None]:
# say for example we want to convert all columns to lowercase!

In [10]:
data.columns = ['make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
               'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
               'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
               'CO2 Emission Grams/Mile', 'xxxxxxx']

In [11]:
data.head()

Unnamed: 0,make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,xxxxxxx
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [12]:
data.columns = ['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
               'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
               'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
               'CO2 Emission Grams/Mile', 'Fuel Cost/Year']

In [13]:
data.columns

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')

In [14]:
colnames = []
for col in data.columns:
    colnames.append(col.lower())

In [15]:
data.columns = [col.lower().replace(' ','_').replace('/','_') for col in data.columns]

In [16]:
data.head()

Unnamed: 0,make,model,year,engine_displacement,cylinders,transmission,drivetrain,vehicle_class,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [17]:
data.columns = ['manufacturer']

ValueError: Length mismatch: Expected axis has 15 elements, new values have 1 elements

### `.rename()` method

`.rename({'old_column':'new_column'})`

#### returning a new dataframe

In [22]:
data.rename({'make': 'manufacturer'}, axis=1)

Unnamed: 0,manufacturer,model,model_year,engine_displacement,cylinders,transmission,drivetrain,vehicle_class,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [19]:
data.rename(columns={'make': 'manufacturer', 'year':'model_year'})

Unnamed: 0,manufacturer,model,model_year,engine_displacement,cylinders,transmission,drivetrain,vehicle_class,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [25]:
data = data.rename(columns={'make': 'manufacturer', 'year':'model_year'})

In [26]:
y.head(2)

NameError: name 'y' is not defined

#### inplace

In [27]:
data.rename({'engine_displacement': 'engine_displacement2',
             'vehicle_class': 'vehicle_class2'}, axis=1, inplace=True)

In [28]:
# dataframe already changed
data.head()

Unnamed: 0,manufacturer,model,model_year,engine_displacement2,cylinders,transmission,drivetrain,vehicle_class2,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


If you assign the result of a `.rename()` call that uses `inplace=True` to a variable, check what happens:

In [29]:
data.rename({'year3': 'year10'}, axis=1)

Unnamed: 0,manufacturer,model,model_year,engine_displacement2,cylinders,transmission,drivetrain,vehicle_class2,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [30]:
y = data.rename({'year': 'year3'}, axis=1, inplace=True)

In [31]:
data.head()

Unnamed: 0,manufacturer,model,model_year,engine_displacement2,cylinders,transmission,drivetrain,vehicle_class2,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550


In [None]:
print(y)

Two options:
> 1. Store the result back in the variable `data`:

    data = data.rename(columns={'Make':'Manufacturer', 'Year':'ANO'})
> 2. Use the argument `inplace=True` to modify the DataFrame directly:

    data.rename(columns={'Make':'Manufacturer', 'Year':'ANO'}, inplace=True)
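
The difference between the two options can be sketched on a small hypothetical DataFrame (`df` and its values are made up for illustration). The point to notice is that a call with `inplace=True` returns `None`:

```python
import pandas as pd

# Hypothetical one-row frame used only for illustration
df = pd.DataFrame({"Make": ["Ford"], "Year": [1999]})

# Without inplace, rename returns a new DataFrame and leaves df untouched
new_df = df.rename(columns={"Year": "model_year"})

# With inplace=True, df itself is modified and the call returns None
result = df.rename(columns={"Make": "manufacturer"}, inplace=True)

print(list(new_df.columns))
print(list(df.columns))
print(result)
```

This is why writing `data = data.rename(..., inplace=True)` replaces the DataFrame with `None`.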
    

In [None]:
# You can also assign to a different variable, of course
renamed_data = data.rename(columns={'make':'Manufacturer', 'year3':'ANO'})

In [None]:
renamed_data.head(2)

In [None]:
data.head(2)

## Reordering columns in a dataframe

>    - Remember to pass a list of column names when selecting more than one column from a DataFrame

Just select the columns in a different order and overwrite the previous DataFrame.

In [32]:
data[['make','model']]

KeyError: "['make'] not in index"

In [33]:
data[['model', 'make']]

KeyError: "['make'] not in index"

In [34]:
data.columns

Index(['manufacturer', 'model', 'model_year', 'engine_displacement2',
       'cylinders', 'transmission', 'drivetrain', 'vehicle_class2',
       'fuel_type', 'fuel_barrels_year', 'city_mpg', 'highway_mpg',
       'combined_mpg', 'co2_emission_grams_mile', 'fuel_cost_year'],
      dtype='object')

In [35]:
data = data[['fuel_cost_year', 'make', 'model', 'year3', 'engine_displacement2', 'cylinders',
       'transmission', 'drivetrain', 'vehicle_class2', 'fuel_type',
       'fuel_barrels_year', 'city_mpg', 'highway_mpg', 'combined_mpg',
       'co2_emission_grams_mile']]

KeyError: "['make', 'year3'] not in index"

In [36]:
data

Unnamed: 0,manufacturer,model,model_year,engine_displacement2,cylinders,transmission,drivetrain,vehicle_class2,fuel_type,fuel_barrels_year,city_mpg,highway_mpg,combined_mpg,co2_emission_grams_mile,fuel_cost_year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [None]:
data.loc[:, 'model','make'] # WRONG: the column names are passed as two separate strings; wrap them in a list, e.g. data.loc[:, ['model', 'make']]

How can I get the `fuel cost/year` variable and put it at the beginning of the dataframe?

In [37]:
data.columns

Index(['manufacturer', 'model', 'model_year', 'engine_displacement2',
       'cylinders', 'transmission', 'drivetrain', 'vehicle_class2',
       'fuel_type', 'fuel_barrels_year', 'city_mpg', 'highway_mpg',
       'combined_mpg', 'co2_emission_grams_mile', 'fuel_cost_year'],
      dtype='object')

In [38]:
column_order = ['co2_emission_grams_mile', 'fuel_cost_year', 'make', 'model', 'year3', 'engine_displacement2',
       'cylinders', 'transmission', 'drivetrain', 'vehicle_class2',
       'fuel_type', 'fuel_barrels_year', 'city_mpg', 'highway_mpg',
       'combined_mpg', ]

data = data.loc[:, column_order]

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported, see https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#deprecate-loc-reindex-listlike'
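
The `KeyError` appears because the hand-written list contains labels that are not in the DataFrame. One way to avoid this, sketched here on a small hypothetical frame (`df`, `first` and `column_order` are illustrative names), is to build the new order from the existing column labels:

```python
import pandas as pd

# Hypothetical frame with three of the renamed columns
df = pd.DataFrame({
    "make": ["Ford"],
    "model": ["Aerostar Van"],
    "fuel_cost_year": [1750],
})

first = "fuel_cost_year"

# Build the order from the labels that actually exist, so none can be missing
column_order = [first] + [col for col in df.columns if col != first]

df = df.loc[:, column_order]
print(list(df.columns))
```

Because the list is derived from `df.columns`, it stays valid even after earlier renames.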

In [40]:
# problems you may run into

# overwriting the DataFrame with a subset of itself
data = data['manufacturer']

In [41]:
data.head(2)

0    AM General
1    AM General
Name: manufacturer, dtype: object

In [49]:
data = pd.read_csv('vehicles.csv')

In [50]:
# with inplace=True, do not assign the result back to data (the call returns None):
data.rename({'Year':'Model_Year'}, axis=1, inplace=True)

In [47]:
data.head(2)

AttributeError: 'NoneType' object has no attribute 'head'

In [51]:
print(data)

                   Make                Model  Model_Year  Engine Displacement  \
0            AM General    DJ Po Vehicle 2WD        1984                  2.5   
1            AM General     FJ8c Post Office        1984                  4.2   
2            AM General  Post Office DJ5 2WD        1985                  2.5   
3            AM General  Post Office DJ8 2WD        1985                  4.2   
4      ASC Incorporated                  GNX        1987                  3.8   
...                 ...                  ...         ...                  ...   
35947             smart         fortwo coupe        2013                  1.0   
35948             smart         fortwo coupe        2014                  1.0   
35949             smart         fortwo coupe        2015                  1.0   
35950             smart         fortwo coupe        2016                  0.9   
35951             smart         fortwo coupe        2016                  0.9   

       Cylinders     Transm

## Remove column (or row)

- The `.drop()` method removes rows or columns by label.
- By default (`axis=0`), `.drop()` drops a row given its index label; pass `axis=1` to drop a column.
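
A short sketch of both uses on a hypothetical three-row DataFrame (the rows are made up). The `columns=` keyword shown here is an alternative to `axis=1`:

```python
import pandas as pd

# Hypothetical frame used only for illustration
df = pd.DataFrame({"Make": ["Ford", "Acura", "Yugo"], "Year": [1999, 1997, 1987]})

no_year = df.drop("Year", axis=1)   # drop a column by label
same = df.drop(columns="Year")      # equivalent keyword form

no_row = df.drop(1)                 # drop the row whose index label is 1
reindexed = no_row.reset_index(drop=True)  # renumber the remaining rows

print(list(no_year.columns))
print(list(no_row.index))
print(list(reindexed.index))
```

None of these calls modifies `df`; each returns a new DataFrame.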

In [53]:
data = pd.read_csv('vehicles.csv')

In [54]:
data.drop('Year')

KeyError: "['Year'] not found in axis"

In [56]:
data.drop('Year', axis=1)

Unnamed: 0,Make,Model,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [57]:
data.drop(1)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [58]:
data.drop(1).reset_index(drop=True)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
2,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
3,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
4,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35946,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35947,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35948,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35949,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


# Filter records
>    - `mask` concept
>    - `.query()` method

This is really important for data wrangling.

In [None]:
data = pd.read_csv('data/vehicles.csv')

In [None]:
data.head(2)

## Simple example: starting with a NumPy array. How can I filter the values of an array?

In [71]:
import numpy as np

In [72]:
my_array = np.array([1,2,3,4,5,6,7,8,9,10])

In [73]:
my_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [74]:
my_array * 10

array([ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100])

In [75]:
my_array > 5

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

The result of `my_array > 5` is what is called **a mask**: an array containing the `True` and `False` results of an element-wise comparison.

In [76]:
my_array[5:]

array([ 6,  7,  8,  9, 10])

In [77]:
my_array[ [False, False, False, False, False,  True,  True,  True,  True, True] ]

array([ 6,  7,  8,  9, 10])

In [78]:
my_array[my_array < 8]

array([1, 2, 3, 4, 5, 6, 7])

Masks can be used as an index to select data!

In [79]:
my_array[ [False, False, False, False, False,  True,  True,  True,  True, True] ]

array([ 6,  7,  8,  9, 10])

In [80]:
my_array[ my_array > 5 ]

array([ 6,  7,  8,  9, 10])

After selecting, you can do anything with the result, for example assign to it. This is called a **vectorized** operation: it is applied to all selected elements at once.

In [81]:
my_array[my_array > 5] = 1000

In [82]:
my_array

array([   1,    2,    3,    4,    5, 1000, 1000, 1000, 1000, 1000])

In [83]:
my_matrix = np.random.randint(0, 10, size=(5,5))
my_matrix

array([[9, 6, 6, 3, 5],
       [9, 4, 5, 6, 7],
       [6, 6, 0, 0, 1],
       [5, 5, 7, 0, 8],
       [2, 4, 8, 9, 3]])

In [84]:
my_matrix > 5

array([[ True,  True,  True, False, False],
       [ True, False, False,  True,  True],
       [ True,  True, False, False, False],
       [False, False,  True, False,  True],
       [False, False,  True,  True, False]])

In [85]:
my_matrix[ my_matrix > 5 ] = -99999

In [86]:
my_matrix

array([[-99999, -99999, -99999,      3,      5],
       [-99999,      4,      5, -99999, -99999],
       [-99999, -99999,      0,      0,      1],
       [     5,      5, -99999,      0, -99999],
       [     2,      4, -99999, -99999,      3]])

In [87]:
my_array[ my_array > 5 ] = 10

In [88]:
my_array

array([ 1,  2,  3,  4,  5, 10, 10, 10, 10, 10])

You can also save the condition

In [89]:
my_array = np.array([1,2,3,4,5,6,7,8,9,10])

In [90]:
my_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [91]:
condition = my_array > 5 
condition

array([False, False, False, False, False,  True,  True,  True,  True,
        True])

In [92]:
my_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [93]:
my_array[ condition ]

array([ 6,  7,  8,  9, 10])

## Bitwise logical operators - Combining conditions

To combine more than one condition, you can use
- `&` - analogous to `and`
- `|` - analogous to `or`

For example, get all numbers from `my_array` that are greater than 3 and smaller than 8.

Let's do it in steps:
- get values greater than 3

In [94]:
my_array[my_array > 3]

array([ 4,  5,  6,  7,  8,  9, 10])

- get values smaller than 8

In [95]:
my_array[my_array < 8]

array([1, 2, 3, 4, 5, 6, 7])

- get values greater than 3 and smaller than 8

In [96]:
greater_than_3 = my_array > 3

In [97]:
smaller_than_8 = my_array < 8

In [98]:
(my_array > 3) & (my_array < 8)

array([False, False, False,  True,  True,  True,  True, False, False,
       False])

In [99]:
# (my_array > 3) or (my_array < 8)
(my_array > 3) | (my_array < 8)


array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])

In [100]:
(my_array > 3) & (my_array < 8)

array([False, False, False,  True,  True,  True,  True, False, False,
       False])

In [101]:
greater_than_3 & smaller_than_8

array([False, False, False,  True,  True,  True,  True, False, False,
       False])

## Now in a dataframe

Let's find the rows in which the `Cylinders` values are exactly 4.

In [102]:
data

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.437500,2550
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35947,smart,fortwo coupe,2013,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35948,smart,fortwo coupe,2014,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,243.000000,1100
35949,smart,fortwo coupe,2015,1.0,3.0,Auto(AM5),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,38,36,244.000000,1100
35950,smart,fortwo coupe,2016,0.9,3.0,Auto(AM6),Rear-Wheel Drive,Two Seaters,Premium,9.155833,34,39,36,246.000000,1100


In [107]:
data['Cylinders'] == 4

0         True
1        False
2         True
3        False
4        False
         ...  
35947    False
35948    False
35949    False
35950    False
35951    False
Name: Cylinders, Length: 35952, dtype: bool

In [109]:
data.loc[:, 'Cylinders']

0        4.0
1        6.0
2        4.0
3        6.0
4        6.0
        ... 
35947    3.0
35948    3.0
35949    3.0
35950    3.0
35951    3.0
Name: Cylinders, Length: 35952, dtype: float64

In [110]:
data.loc[data['Cylinders'] == 4, :]

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500
6,Acura,2.2CL/3.0CL,1997,2.2,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.733750,22,28,24,370.291667,1400
8,Acura,2.3CL/3.0CL,1998,2.3,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,19,27,22,403.954545,1500
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35927,Yugo,GV Plus/GV/Cabrio,1990,1.3,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.184400,23,28,25,355.480000,1350
35928,Yugo,GV/GVX,1987,1.1,4.0,Manual 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,12.677308,24,29,26,341.807692,1300
35929,Yugo,GV/GVX,1989,1.1,4.0,Manual 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,12.677308,24,29,26,341.807692,1300
35930,Yugo,GV/GVX,1989,1.3,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.184400,23,28,25,355.480000,1350


### Example

In [111]:
# create a column with all zeroes named - 'fl_city_car'

data['fl_city_car'] = 0

In [112]:
(data['City MPG']) > (data['Highway MPG'])

0         True
1        False
2        False
3        False
4        False
         ...  
35947    False
35948    False
35949    False
35950    False
35951    False
Length: 35952, dtype: bool

In [None]:
# assign 1 to 'fl_city_car' all cars that have 'City MPG' > 'Highway MPG'

data.loc[(data['City MPG']) > (data['Highway MPG']), 'fl_city_car'] = 1

In [None]:
data.loc[(data['City MPG']) > (data['Highway MPG']), 'fl_city_car'] = 10

In [None]:
data.loc[(data['City MPG']) > (data['Highway MPG']), 'fl_city_car']
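
The cells above have no recorded output, so here is a minimal sketch of the same pattern on a hypothetical three-row DataFrame (the MPG values are made up), showing which rows receive the flag:

```python
import pandas as pd

# Hypothetical MPG values used only for illustration
df = pd.DataFrame({"City MPG": [18, 13, 30], "Highway MPG": [17, 13, 28]})

# Start with a column of zeroes
df["fl_city_car"] = 0

# Boolean mask: True where city mileage beats highway mileage
mask = df["City MPG"] > df["Highway MPG"]

# Assign 1 only to the rows selected by the mask
df.loc[mask, "fl_city_car"] = 1

print(df["fl_city_car"].tolist())
```

Rows where the mask is `False` keep their original value of 0.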

## You can combine conditions

Cars from `Ford` with 6 `Cylinders`

In [None]:
data.loc[:, :]

In [None]:
data['std'] = data[['City MPG','Highway MPG','Combined MPG']].std(axis=1)

In [None]:
data.loc[data['std'] != 0, :]

In [None]:
data.loc[(data['Cylinders'] == 6) & (data['Make'] == 'Ford'), :]

In [None]:
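# careful with operator precedence: `&` binds tighter than `==`,
# so each condition must be wrapped in parentheses

data.loc[data['Make']=='Ford' & data['Cylinders']==6, :] # WRONG!!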
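
To see why the parentheses matter, the sketch below runs both forms on a hypothetical three-row DataFrame (the rows are made up). In Python, `&` has higher precedence than `==`, so the unparenthesized expression is not evaluated as two separate comparisons and raises an error:

```python
import pandas as pd

# Hypothetical frame used only for illustration
df = pd.DataFrame({"Make": ["Ford", "Ford", "Acura"], "Cylinders": [6.0, 4.0, 6.0]})

# Without parentheses, Python parses this as
#   df["Make"] == ("Ford" & df["Cylinders"]) == 6
# which fails instead of filtering
try:
    df.loc[df["Make"] == "Ford" & df["Cylinders"] == 6, :]
    failed = False
except (TypeError, ValueError):
    failed = True

# With parentheses, each comparison is evaluated first, then combined
ford_6 = df.loc[(df["Make"] == "Ford") & (df["Cylinders"] == 6), :]

print(failed)
print(ford_6)
```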

## You can put the conditions in variables as well

In [None]:
condition1 = (data['Make']=='Ford')
condition2 = (data['Cylinders']==6)
condition3 = (data['Combined MPG'] < 18)

In [None]:
data.loc[condition1 & condition2 & condition3, :]

## Another way to do the same thing.

* using the method `query`

The `query` method takes a string that describes your condition. Important things:
- `.query()` is a method of your dataframe
- `.query()` receives a string
- Every word inside the string that is not `quoted` is treated as a name to look up, normally a column of your dataframe. For example, `.query('Year == 1999')` looks for the column `Year`. If you run `.query('Make == Ford')`, it looks for both a column named `Make` and a column named `Ford`. To match the values of the column `Make` against the **string** Ford, you have to run `.query('Make == "Ford"')`.
- If your column name has spaces, you have to wrap it in backticks, as in **.query('\`Engine Displacement\` < 4')**.
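
The rules in the list above can be checked on a small made-up frame before running them on the full dataset. The rows in `toy` are hypothetical; only the column names come from `vehicles.csv`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Make': ['Ford', 'Ford', 'Acura'],
    'Year': [1999, 2005, 1999],
    'Engine Displacement': [4.6, 2.3, 3.0],
})

# String values must be quoted inside the query string
fords = toy.query('Make == "Ford"')

# Column names with spaces go between backticks
small = toy.query('`Engine Displacement` < 4')

# Conditions can be joined with `and` / `or`
ford_1999 = toy.query('Make == "Ford" and Year == 1999')

print(fords.index.tolist())      # [0, 1]
print(small.index.tolist())      # [1, 2]
print(ford_1999.index.tolist())  # [0]
```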

In [113]:
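# build a list of index labels with the string 'Make' in front
indexes = list(data.index)
indexes.insert(0, 'Make')

In [114]:
# drop the original label 0 so the list length matches the number of rows
indexes.remove(0)

In [115]:
# the first row is now labelled 'Make' instead of 0
data.index = indexes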

In [116]:
data.query('Make == "Ford"')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
11440,Ford,Aerostar Van,1986,2.3,4.0,Automatic 4-spd,Rear-Wheel Drive,Vans,Regular,17.347895,18,22,19,467.736842,1750,0
11441,Ford,Aerostar Van,1986,2.3,4.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,13.733750,23,26,24,370.291667,1400,0
11442,Ford,Aerostar Van,1986,2.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Vans,Regular,19.388824,15,21,17,522.764706,1950,0
11443,Ford,Aerostar Van,1986,2.8,6.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,18.311667,16,22,18,493.722222,1850,0
11444,Ford,Aerostar Van,1986,3.0,6.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,17.347895,17,22,19,467.736842,1750,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14381,Ford,Windstar FWD Wagon,2000,3.0,6.0,Automatic 4-spd,Front-Wheel Drive,Minivan - 2WD,Regular,19.388824,15,21,17,522.764706,1950,0
14382,Ford,Windstar FWD Wagon,2000,3.8,6.0,Automatic 4-spd,Front-Wheel Drive,Minivan - 2WD,Regular,19.388824,15,21,17,522.764706,1950,0
14383,Ford,Windstar FWD Wagon,2001,3.8,6.0,Automatic 4-spd,Front-Wheel Drive,Minivan - 2WD,Regular,18.311667,16,22,18,493.722222,1850,0
14384,Ford,Windstar FWD Wagon,2002,3.8,6.0,Automatic 4-spd,Front-Wheel Drive,Minivan - 2WD,Regular,18.311667,16,21,18,493.722222,1850,0


In [117]:
data.query('Cylinders == 4 and Make == "Ford"')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
11440,Ford,Aerostar Van,1986,2.3,4.0,Automatic 4-spd,Rear-Wheel Drive,Vans,Regular,17.347895,18,22,19,467.736842,1750,0
11441,Ford,Aerostar Van,1986,2.3,4.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,13.733750,23,26,24,370.291667,1400,0
11446,Ford,Aerostar Van,1987,2.3,4.0,Automatic 4-spd,Rear-Wheel Drive,Vans,Regular,16.480500,18,24,20,444.350000,1650,0
11447,Ford,Aerostar Van,1987,2.3,4.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,13.733750,23,26,24,370.291667,1400,0
11477,Ford,Aerostar Wagon,1986,2.3,4.0,Manual 5-spd,Rear-Wheel Drive,Vans,Regular,14.982273,20,25,22,403.954545,1500,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14345,Ford,Transit Connect Wagon LWB FFV,2017,2.5,4.0,Automatic (S6),Front-Wheel Drive,Special Purpose Vehicle 2WD,Gasoline or E85,14.982273,19,27,22,407.000000,1500,0
14346,Ford,Transit Connect Wagon LWB FWD,2014,2.5,4.0,Automatic (S6),Front-Wheel Drive,Special Purpose Vehicle 2WD,Regular,14.330870,20,28,23,391.000000,1450,0
14347,Ford,Transit Connect Wagon LWB FWD,2015,2.5,4.0,Automatic (S6),Front-Wheel Drive,Special Purpose Vehicle 2WD,Regular,14.330870,20,28,23,391.000000,1450,0
14348,Ford,Transit Connect Wagon LWB FWD,2016,2.5,4.0,Automatic (S6),Front-Wheel Drive,Special Purpose Vehicle 2WD,Regular,14.982273,19,27,22,405.000000,1500,0


In [118]:
data.query('`City MPG` > `Highway MPG`')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
Make,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950,0
47,Acura,ILX Hybrid,2013,1.5,4.0,Auto(AV-S7),Front-Wheel Drive,Compact Cars,Premium,8.673947,39,38,38,228.000000,1050,0
48,Acura,ILX Hybrid,2014,1.5,4.0,Auto(AV-S7),Front-Wheel Drive,Compact Cars,Premium,8.673947,39,38,38,228.000000,1050,0
3069,BMW,i3 REX,2014,0.6,2.0,Auto(A1),Rear-Wheel Drive,Subcompact Cars,Premium Gas or Electricity,1.563190,41,37,39,40.000000,1050,0
3070,BMW,i3 REX,2015,0.6,2.0,Automatic (A1),Rear-Wheel Drive,Subcompact Cars,Premium Gas or Electricity,1.563190,41,37,39,40.000000,1050,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33293,Toyota,Prius v,2015,1.8,4.0,Automatic (variable gear ratios),Front-Wheel Drive,Midsize Station Wagons,Regular,7.847857,44,40,42,211.000000,800,0
33294,Toyota,Prius v,2016,1.8,4.0,Automatic (variable gear ratios),Front-Wheel Drive,Midsize Station Wagons,Regular,7.847857,44,40,42,211.000000,800,0
33295,Toyota,Prius v,2017,1.8,4.0,Automatic (variable gear ratios),Front-Wheel Drive,Midsize Station Wagons,Regular,8.039268,43,39,41,217.000000,800,0
33374,Toyota,RAV4 Hybrid AWD,2016,2.5,4.0,Auto(AV-S6),All-Wheel Drive,Small Sport Utility Vehicle 4WD,Regular,9.988182,34,31,33,270.000000,1000,0


In [119]:
data.query('Cylinders == 4')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
Make,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950,0
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.437500,2100,0
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500,0
6,Acura,2.2CL/3.0CL,1997,2.2,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.733750,22,28,24,370.291667,1400,0
8,Acura,2.3CL/3.0CL,1998,2.3,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,19,27,22,403.954545,1500,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
35927,Yugo,GV Plus/GV/Cabrio,1990,1.3,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.184400,23,28,25,355.480000,1350,0
35928,Yugo,GV/GVX,1987,1.1,4.0,Manual 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,12.677308,24,29,26,341.807692,1300,0
35929,Yugo,GV/GVX,1989,1.1,4.0,Manual 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,12.677308,24,29,26,341.807692,1300,0
35930,Yugo,GV/GVX,1989,1.3,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.184400,23,28,25,355.480000,1350,0


In [120]:
numero_cilindros = 6
data.query(f'Make == "Acura" and Cylinders == {numero_cilindros}')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
7,Acura,2.2CL/3.0CL,1997,3.0,6.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,16.480500,18,26,20,444.350000,1650,0
10,Acura,2.3CL/3.0CL,1998,3.0,6.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,16.480500,17,26,20,444.350000,1650,0
13,Acura,2.3CL/3.0CL,1999,3.0,6.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,16.480500,17,26,20,444.350000,1650,0
16,Acura,2.5TL/3.2TL,1996,3.2,6.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,17.347895,17,22,19,467.736842,2150,0
18,Acura,2.5TL/3.2TL,1997,3.2,6.0,Automatic 4-spd,Front-Wheel Drive,Compact Cars,Premium,17.347895,17,22,19,467.736842,2150,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
292,Acura,TSX,2014,3.5,6.0,Automatic (S5),Front-Wheel Drive,Compact Cars,Premium,14.330870,19,28,23,392.000000,1750,0
303,Acura,ZDX 4WD,2010,3.7,6.0,Automatic (S6),All-Wheel Drive,Sport Utility Vehicle - 4WD,Premium,17.347895,16,23,19,467.736842,2150,0
304,Acura,ZDX 4WD,2011,3.7,6.0,Automatic (S6),All-Wheel Drive,Sport Utility Vehicle - 4WD,Premium,17.347895,16,23,19,467.736842,2150,0
305,Acura,ZDX 4WD,2012,3.7,6.0,Automatic (S6),All-Wheel Drive,Sport Utility Vehicle - 4WD,Premium,17.347895,16,23,19,467.736842,2150,0


In [122]:
numero_cilindros = 4
data.query(f'Make == "Acura" and Cylinders == {numero_cilindros}')

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year,fl_city_car
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500,0
6,Acura,2.2CL/3.0CL,1997,2.2,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.733750,22,28,24,370.291667,1400,0
8,Acura,2.3CL/3.0CL,1998,2.3,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,19,27,22,403.954545,1500,0
9,Acura,2.3CL/3.0CL,1998,2.3,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.733750,21,29,24,370.291667,1400,0
11,Acura,2.3CL/3.0CL,1999,2.3,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,27,22,403.954545,1500,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
291,Acura,TSX,2014,2.4,4.0,Manual 6-spd,Front-Wheel Drive,Compact Cars,Premium,13.733750,21,29,24,370.000000,1700,0
293,Acura,TSX Wagon,2011,2.4,4.0,Automatic (S5),Front-Wheel Drive,Small Station Wagons,Premium,13.184400,22,30,25,355.480000,1600,0
294,Acura,TSX Wagon,2012,2.4,4.0,Automatic (S5),Front-Wheel Drive,Small Station Wagons,Premium,13.184400,22,30,25,355.480000,1600,0
295,Acura,TSX Wagon,2013,2.4,4.0,Automatic (S5),Front-Wheel Drive,Small Station Wagons,Premium,13.184400,22,30,25,358.000000,1600,0
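
As an alternative to building the string with an f-string, `query` can read a Python variable directly when its name is prefixed with `@`. A minimal sketch on made-up data; `toy` and `n_cylinders` are illustrative names, not part of the notebook:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Make': ['Acura', 'Acura', 'Ford'],
    'Cylinders': [6.0, 4.0, 6.0],
})

n_cylinders = 6

# f-string: the value is pasted into the query text
with_fstring = toy.query(f'Make == "Acura" and Cylinders == {n_cylinders}')

# @name: pandas looks up the Python variable when evaluating the query
with_at = toy.query('Make == "Acura" and Cylinders == @n_cylinders')

print(with_at.index.tolist())  # [0]
```

The `@` form avoids having to add quotes by hand when the variable holds a string.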
