# Using Pandas

In [97]:
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 200)
## Allow a single cell to display multiple outputs
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<b>Load the data from the vehicles.csv file into a pandas DataFrame.</b>

In [98]:
## Your Code here
df = pd.read_csv("data/vehicles.csv")
print(df)
df.head(9)

                   Make                Model  Year  Engine Displacement  \
0            AM General    DJ Po Vehicle 2WD  1984                  2.5   
1            AM General     FJ8c Post Office  1984                  4.2   
2            AM General  Post Office DJ5 2WD  1985                  2.5   
3            AM General  Post Office DJ8 2WD  1985                  4.2   
4      ASC Incorporated                  GNX  1987                  3.8   
...                 ...                  ...   ...                  ...   
35947             smart         fortwo coupe  2013                  1.0   
35948             smart         fortwo coupe  2014                  1.0   
35949             smart         fortwo coupe  2015                  1.0   
35950             smart         fortwo coupe  2016                  0.9   
35951             smart         fortwo coupe  2016                  0.9   

       Cylinders     Transmission        Drivetrain  \
0            4.0  Automatic 3-spd     2-Whee

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,14,21,16,555.4375,2550
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,20,26,22,403.954545,1500
6,Acura,2.2CL/3.0CL,1997,2.2,4.0,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.73375,22,28,24,370.291667,1400
7,Acura,2.2CL/3.0CL,1997,3.0,6.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,16.4805,18,26,20,444.35,1650
8,Acura,2.3CL/3.0CL,1998,2.3,4.0,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,19,27,22,403.954545,1500


First exploration of the dataset:

- How many observations does it have?
- Look at all the columns: do you understand what they mean?
- Look at the raw data: do you see anything weird?
- Look at the data types: are they the expected ones for the information the column contains?

In [99]:
## how many observations?
#you reversed the Combined MPG
obs = df.shape[0]  # observations are rows (35952 here), not rows * columns
print(df.columns)
df["Highway MPG"].describe()

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')


count    35952.000000
mean        23.880646
std          5.890876
min          9.000000
25%         20.000000
50%         24.000000
75%         27.000000
max         61.000000
Name: Highway MPG, dtype: float64
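
The exploration questions above can be checked with a few attribute calls. This is a minimal sketch on three rows copied from the table shown earlier; `sample` is an illustrative name and stands in for the full dataset.

```python
import pandas as pd

# Three rows copied from the table above; a stand-in for the full dataset.
sample = pd.DataFrame({
    "Make": ["AM General", "AM General", "ASC Incorporated"],
    "Year": [1984, 1984, 1987],
    "Cylinders": [4.0, 6.0, 6.0],
    "Highway MPG": [17, 13, 21],
})

n_rows, n_cols = sample.shape   # observations are rows, columns are variables
print(n_rows, n_cols)
print(sample.dtypes)            # check that each column has the expected type
print(sample.isna().sum())      # count missing values per column
```

On the real data, the same three calls on `df` would show whether a column such as `Cylinders` is stored as a float because of missing values.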

### Cleaning and wrangling data

- Some car brand names refer to the same brand. Replace all brand names that contain the word "Dutton" with simply "Dutton". If you find similar examples, clean their names too. Use `loc` with boolean indexing.

- Convert CO2 Emissions from Grams/Mile to Grams/Km

- Create a binary column that solely indicates if the transmission of a car is automatic or manual. Use `pandas.Series.str.startswith` and .

- Convert the MPG columns to kilometers per liter (km/L).

Note:
<br>Converting Grams/Mile to Grams/Km

1 Mile = 1.60934 Km

Converting Gallons to Liters

1 Gallon = 3.78541 Liters
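
The two factors in the note combine as follows: grams per mile is divided by the kilometers in a mile, while miles per gallon is multiplied by kilometers per mile and divided by liters per gallon. A small sketch, with function and constant names chosen here for illustration:

```python
KM_PER_MILE = 1.60934
LITERS_PER_GALLON = 3.78541

def grams_per_mile_to_grams_per_km(value):
    # One mile covers more distance than one km, so emissions per km are smaller.
    return value / KM_PER_MILE

def mpg_to_km_per_liter(value):
    # miles/gallon * (km/mile) / (liters/gallon) = km/liter
    return value * KM_PER_MILE / LITERS_PER_GALLON

# Values taken from the first row of the dataset shown earlier.
print(round(grams_per_mile_to_grams_per_km(522.764706), 3))
print(round(mpg_to_km_per_liter(17), 3))
```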



In [100]:
## Your Code here

# Grams per mile -> grams per km: divide by the number of km in a mile
df["CO2 Emission Grams/Mile"] = df["CO2 Emission Grams/Mile"] / 1.60934
df = df.rename(columns={"CO2 Emission Grams/Mile": "CO2 Emission Grams/Km"})
# Miles per gallon -> km per liter: multiply by km per mile, divide by liters per gallon
df[["City MPG", "Combined MPG", "Highway MPG"]] *= (1.60934/3.78541)
df = df.rename(columns={"City MPG": "City Km per L", "Combined MPG": "Combined Km per L", "Highway MPG": "Highway Km per L"})


In [101]:
index_Dutton = df["Make"].str.contains("Dutton")
df.loc[index_Dutton, "Make"] = "Dutton"
df["Make"].unique()

array(['AM General', 'ASC Incorporated', 'Acura', 'Alfa Romeo',
       'American Motors Corporation', 'Aston Martin', 'Audi',
       'Aurora Cars Ltd', 'Autokraft Limited', 'BMW', 'BMW Alpina',
       'Bentley', 'Bertone', 'Bill Dovell Motor Car Company',
       'Bitter Gmbh and Co. Kg', 'Bugatti', 'Buick', 'CCC Engineering',
       'CX Automotive', 'Cadillac', 'Chevrolet', 'Chrysler',
       'Consulier Industries Inc', 'Dabryan Coach Builders Inc', 'Dacia',
       'Daewoo', 'Daihatsu', 'Dodge', 'Dutton', 'Eagle',
       'Environmental Rsch and Devp Corp', 'Evans Automobiles',
       'Excalibur Autos', 'Federal Coach', 'Ferrari', 'Fiat', 'Fisker',
       'Ford', 'GMC', 'General Motors', 'Genesis', 'Geo', 'Goldacre',
       'Grumman Allied Industries', 'Grumman Olson', 'Honda', 'Hummer',
       'Hyundai', 'Import Foreign Auto Sales Inc',
       'Import Trade Services', 'Infiniti', 'Isis Imports Ltd', 'Isuzu',
       'J.K. Motors', 'JBA Motorcars, Inc.', 'Jaguar', 'Jeep', 'Kia',
       '
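
The same `loc` pattern can be reused for any other brand that appears under several spellings. A minimal sketch on hypothetical make names (the variants below are invented for illustration and are not taken from the dataset):

```python
import pandas as pd

# Hypothetical make names, for illustration only.
makes = pd.DataFrame({"Make": ["Dutton Cars Ltd", "Dutton Sierra", "Ford", None]})

# na=False treats missing names as "no match", so the mask stays strictly boolean.
is_dutton = makes["Make"].str.contains("Dutton", na=False)

# Overwrite only the matching rows, leaving every other make untouched.
makes.loc[is_dutton, "Make"] = "Dutton"
print(makes["Make"].value_counts())
```

Scanning the sorted output of `unique()` for near-duplicate names is one way to spot further candidates before applying the same replacement.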

In [102]:
df.insert(5, "Transmission Manual", df["Transmission"].str.contains("Man"))
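
The exercise asks for `pandas.Series.str.startswith`, while the cell above uses `str.contains`. A hedged sketch of the `startswith` variant, using transmission strings that appear in outputs elsewhere in this notebook (`trans` and `is_manual` are illustrative names):

```python
import pandas as pd

# Transmission strings taken from outputs shown in this notebook.
trans = pd.Series(
    ["Automatic 3-spd", "Manual 5-spd", "Auto (AV-S6)", "Automatic (AM6)"]
)

# True only when the description begins with "Manual".
is_manual = trans.str.startswith("Manual")

# Optional: store the flag as 0/1 instead of False/True.
manual_flag = is_manual.astype(int)
print(pd.DataFrame({"Transmission": trans, "Manual": is_manual, "Flag": manual_flag}))
```

Anchoring the match at the start of the string avoids accidental hits if "Man" ever appears in the middle of a description.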


In [107]:
df.head(100)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission Manual,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City Km per L,Highway Km per L,Combined Km per L,CO2 Emission Grams/Km,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,False,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,7.652571,7.227428,7.227428,324.831736,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,False,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,5.526857,5.526857,5.526857,424.779962,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,False,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,6.802286,7.227428,6.802286,345.133719,2100
3,AM General,Post Office DJ8 2WD,1985,4.2,6.0,False,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,5.526857,5.526857,5.526857,424.779962,2550
4,ASC Incorporated,GNX,1987,3.8,6.0,False,Automatic 4-spd,Rear-Wheel Drive,Midsize Cars,Premium,20.600625,5.952,8.928,6.802286,345.133719,2550
5,Acura,2.2CL/3.0CL,1997,2.2,4.0,False,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,8.502857,11.053714,9.353143,251.006341,1500
6,Acura,2.2CL/3.0CL,1997,2.2,4.0,True,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.73375,9.353143,11.904,10.203428,230.089146,1400
7,Acura,2.2CL/3.0CL,1997,3.0,6.0,False,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,16.4805,7.652571,11.053714,8.502857,276.106976,1650
8,Acura,2.3CL/3.0CL,1998,2.3,4.0,False,Automatic 4-spd,Front-Wheel Drive,Subcompact Cars,Regular,14.982273,8.077714,11.478857,9.353143,251.006341,1500
9,Acura,2.3CL/3.0CL,1998,2.3,4.0,True,Manual 5-spd,Front-Wheel Drive,Subcompact Cars,Regular,13.73375,8.928,12.329143,10.203428,230.089146,1400


### Gathering insights:

- How many car makers are there? How many models? Which car maker has the most cars in the dataset?

- When were these cars made? How big is the engine of these cars?

- What's the frequency of different transmissions, drivetrains and fuel types?

- What's the car that consumes the least/most fuel?

In [115]:
# Your Code here
car_makers = len(df["Make"].unique())
car_makers
models = len(df["Model"].unique())
models
maker_with_most_cars = df["Make"].value_counts()
maker_with_most_cars

125

3608

Chevrolet                           3643
Ford                                2946
Dodge                               2360
GMC                                 2347
Toyota                              1836
BMW                                 1677
Mercedes-Benz                       1284
Nissan                              1253
Volkswagen                          1047
Mitsubishi                           950
Mazda                                915
Audi                                 890
Porsche                              862
Honda                                836
Jeep                                 829
Pontiac                              784
Subaru                               781
Volvo                                717
Hyundai                              662
Chrysler                             641
Buick                                537
Mercury                              532
Suzuki                               512
Cadillac                             508
Kia             
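
Because `value_counts()` sorts in descending order, the top maker can also be read off directly rather than by inspecting the printed series. A small sketch on hypothetical rows (the makes and models below are example values, not the dataset):

```python
import pandas as pd

# Hypothetical rows for illustration.
cars = pd.DataFrame({
    "Make": ["Chevrolet", "Chevrolet", "Ford", "Toyota", "Chevrolet"],
    "Model": ["Cruze", "Malibu", "Focus", "Prius Eco", "Cruze"],
})

n_makes = cars["Make"].nunique()     # distinct makers (ignores missing values)
n_models = cars["Model"].nunique()   # distinct model names
counts = cars["Make"].value_counts()
top_make = counts.idxmax()           # label with the highest count
print(n_makes, n_models, top_make, counts.max())
```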

In [118]:
chevrolet = df["Make"].str.contains("Chevrolet")
df.loc[chevrolet, "Year"]
df.loc[chevrolet, "Engine Displacement"]

4275    1985
4276    1985
4277    1985
4278    1985
4279    1985
        ... 
7913    2013
7914    2014
7915    2015
7916    2016
7917    2017
Name: Year, Length: 3643, dtype: int64

4275    2.5
4276    4.3
4277    4.3
4278    4.3
4279    2.5
       ... 
7913    1.4
7914    1.4
7915    1.4
7916    1.5
7917    1.5
Name: Engine Displacement, Length: 3643, dtype: float64
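
The cell above looks only at Chevrolet. To answer the two questions for every car, `describe()` on both columns gives the range and spread at once. A sketch on five rows copied from the table near the top of the notebook:

```python
import pandas as pd

# Year and displacement values copied from rows shown earlier in the notebook.
cars = pd.DataFrame({
    "Year": [1984, 1985, 1987, 1997, 1998],
    "Engine Displacement": [2.5, 4.2, 3.8, 2.2, 2.3],
})

# Count, mean, min, quartiles and max for both columns in one table.
summary = cars[["Year", "Engine Displacement"]].describe()
print(summary)

# Earliest and latest model year in the sample.
year_range = (cars["Year"].min(), cars["Year"].max())
print(year_range)
```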

In [116]:
# Higher km per liter means less fuel consumed, so the maximum marks the most economical cars
least_fuel = df.loc[df['Combined Km per L'] == df["Combined Km per L"].max()]
least_fuel
most_fuel = df.loc[df['Combined Km per L'] == df["Combined Km per L"].min()]
most_fuel

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission Manual,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City Km per L,Highway Km per L,Combined Km per L,CO2 Emission Grams/Km,Fuel Cost/Year
33279,Toyota,Prius Eco,2016,1.8,4.0,False,Automatic (variable gear ratios),Front-Wheel Drive,Midsize Cars,Regular,5.885893,24.658285,22.532571,23.808,98.176892,600
33280,Toyota,Prius Eco,2017,1.8,4.0,False,Automatic (variable gear ratios),Front-Wheel Drive,Midsize Cars,Regular,5.885893,24.658285,22.532571,23.808,98.176892,600


Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission Manual,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City Km per L,Highway Km per L,Combined Km per L,CO2 Emission Grams/Km,Fuel Cost/Year
20894,Lamborghini,Countach,1986,5.2,12.0,True,Manual 5-spd,Rear-Wheel Drive,Two Seaters,Premium,47.087143,2.550857,4.251429,2.976,788.877073,5800
20895,Lamborghini,Countach,1987,5.2,12.0,True,Manual 5-spd,Rear-Wheel Drive,Two Seaters,Premium,47.087143,2.550857,4.251429,2.976,788.877073,5800
20896,Lamborghini,Countach,1988,5.2,12.0,True,Manual 5-spd,Rear-Wheel Drive,Two Seaters,Premium,47.087143,2.550857,4.251429,2.976,788.877073,5800
20897,Lamborghini,Countach,1989,5.2,12.0,True,Manual 5-spd,Rear-Wheel Drive,Two Seaters,Premium,47.087143,2.550857,4.251429,2.976,788.877073,5800
20898,Lamborghini,Countach,1990,5.2,12.0,True,Manual 5-spd,Rear-Wheel Drive,Two Seaters,Premium,47.087143,2.550857,4.251429,2.976,788.877073,5800


In [122]:
df[["Transmission", "Fuel Type", "Drivetrain"]].value_counts()

Transmission     Fuel Type       Drivetrain                
Automatic 4-spd  Regular         Rear-Wheel Drive              3253
                                 Front-Wheel Drive             3150
Manual 5-spd     Regular         Front-Wheel Drive             3049
Automatic 4-spd  Regular         4-Wheel or All-Wheel Drive    1872
Manual 5-spd     Regular         Rear-Wheel Drive              1715
                                                               ... 
Automatic (AM6)  Premium         All-Wheel Drive                  1
Auto (AV-S6)     Premium         Rear-Wheel Drive                 1
Automatic 4-spd  Premium         2-Wheel Drive                    1
Manual 4-spd     Regular         4-Wheel Drive                    1
Automatic 7-spd  Premium or E85  4-Wheel or All-Wheel Drive       1
Length: 324, dtype: int64
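
Calling `value_counts()` on three columns together counts each combination of values. If the question is read as asking for each column separately, one count per column may be closer to the intent. A sketch on four rows copied from the table near the top of the notebook:

```python
import pandas as pd

# Rows copied from the dataset preview shown earlier.
cars = pd.DataFrame({
    "Transmission": ["Automatic 3-spd", "Automatic 3-spd", "Automatic 4-spd", "Manual 5-spd"],
    "Drivetrain": ["2-Wheel Drive", "Rear-Wheel Drive", "Rear-Wheel Drive", "Front-Wheel Drive"],
    "Fuel Type": ["Regular", "Regular", "Premium", "Regular"],
})

# One frequency table per column instead of one table of combinations.
freqs = {col: cars[col].value_counts() for col in cars.columns}
for col, counts in freqs.items():
    print(counts, end="\n\n")
```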

<b>(Optional)</b>

Which brand has the worst CO2 emissions on average?

Hint: use the function `sort_values()`

In [104]:
## your Code here
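
One possible approach, following the hint: group by make, average the emissions, and sort. This sketch uses a handful of rows copied from outputs above, so the ranking it prints describes only this sample and not the full dataset.

```python
import pandas as pd

# A few rows copied from outputs shown earlier; not the full dataset.
cars = pd.DataFrame({
    "Make": ["AM General", "AM General", "Acura", "Acura", "Lamborghini", "Toyota"],
    "CO2 Emission Grams/Km": [
        324.831736, 424.779962, 251.006341, 230.089146, 788.877073, 98.176892,
    ],
})

# Mean emissions per make, highest (worst) first.
mean_co2 = (
    cars.groupby("Make")["CO2 Emission Grams/Km"]
    .mean()
    .sort_values(ascending=False)
)
worst_make = mean_co2.index[0]
print(mean_co2)
print(worst_make)
```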


Do cars with automatic transmission consume more fuel than cars with manual transmission on average?

In [105]:
## Your Code here
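
A possible sketch for this comparison: group by the binary `Transmission Manual` column and average `Combined Km per L`. A lower km per liter means more fuel consumed. The rows are copied from outputs above, so the printed means describe only this sample and say nothing about the full dataset.

```python
import pandas as pd

# A few rows copied from outputs shown earlier; not the full dataset.
cars = pd.DataFrame({
    "Transmission Manual": [False, False, False, True, True],
    "Combined Km per L": [7.227428, 5.526857, 9.353143, 10.203428, 2.976],
})

# Average fuel economy for automatic (False) and manual (True) cars.
means = cars.groupby("Transmission Manual")["Combined Km per L"].mean()
print(means)
```

On the real `df`, the same `groupby` gives one mean per transmission type; the group with the lower value consumes more fuel on average.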
