# Using Pandas

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 200)
## Display the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

<b>Load the data from the vehicles.csv file into a pandas DataFrame</b>

In [2]:
## Your Code here
cars_df = pd.read_csv('data/vehicles.csv')

First exploration of the dataset:

- How many observations does it have?
- Look at all the columns: do you understand what they mean?
- Look at the raw data: do you see anything weird?
- Look at the data types: are they the expected ones for the information the column contains?

#### - How many observations does it have?


In [3]:
cars_df.shape[0]

35952

#### - Look at all the columns: do you understand what they mean?


In [4]:
cars_df.columns # attributes/features of a car

Index(['Make', 'Model', 'Year', 'Engine Displacement', 'Cylinders',
       'Transmission', 'Drivetrain', 'Vehicle Class', 'Fuel Type',
       'Fuel Barrels/Year', 'City MPG', 'Highway MPG', 'Combined MPG',
       'CO2 Emission Grams/Mile', 'Fuel Cost/Year'],
      dtype='object')

#### - Look at the raw data: do you see anything weird?

In [5]:
cars_df.head(3)

Unnamed: 0,Make,Model,Year,Engine Displacement,Cylinders,Transmission,Drivetrain,Vehicle Class,Fuel Type,Fuel Barrels/Year,City MPG,Highway MPG,Combined MPG,CO2 Emission Grams/Mile,Fuel Cost/Year
0,AM General,DJ Po Vehicle 2WD,1984,2.5,4.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,19.388824,18,17,17,522.764706,1950
1,AM General,FJ8c Post Office,1984,4.2,6.0,Automatic 3-spd,2-Wheel Drive,Special Purpose Vehicle 2WD,Regular,25.354615,13,13,13,683.615385,2550
2,AM General,Post Office DJ5 2WD,1985,2.5,4.0,Automatic 3-spd,Rear-Wheel Drive,Special Purpose Vehicle 2WD,Regular,20.600625,16,17,16,555.4375,2100
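Beyond eyeballing the first rows, a couple of quick checks can surface oddities such as missing values or repeated rows. The sketch below runs on a small made-up frame (the values are illustrative, not taken from vehicles.csv); the same two calls could be applied to `cars_df`.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only (not from vehicles.csv)
toy = pd.DataFrame({
    "Make": ["A", "A", "B", "B"],
    "Cylinders": [4.0, 4.0, np.nan, 6.0],
    "City MPG": [18, 18, 13, 16],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Number of rows that exactly repeat an earlier row
duplicate_rows = toy.duplicated().sum()

print(missing_per_column)
print("Duplicated rows:", duplicate_rows)
```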


#### - Look at the data types: are they the expected ones for the information the column contains?

In [6]:
cars_df.dtypes # data type pandas inferred for each column

### Cleaning and wrangling data

- Some car brand names refer to the same brand. Replace every brand name that contains the word "Dutton" with simply "Dutton". If you find similar examples, clean their names too. Use `loc` with boolean indexing.

- Convert CO2 Emissions from Grams/Mile to Grams/Km

- Create a binary column that indicates only whether the transmission of a car is automatic or manual. Use `pandas.Series.str.startswith`.

- Convert the MPG columns to km per liter

Note:
<br>Converting Grams/Mile to Grams/Km

1 Mile = 1.60934 Km

Converting Gallons to Liters

1 Gallon = 3.78541 Liters
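The two conversions in the note can be sketched as small helper functions. The function and constant names below are hypothetical (they do not appear in the notebook); only the two factors come from the note above.

```python
MILE_IN_KM = 1.60934        # 1 mile = 1.60934 km
GALLON_IN_LITERS = 3.78541  # 1 gallon = 3.78541 liters


def grams_per_mile_to_grams_per_km(g_per_mile):
    """A km is shorter than a mile, so fewer grams are emitted per km: divide."""
    return g_per_mile / MILE_IN_KM


def mpg_to_km_per_liter(mpg):
    """Convert miles to km (multiply) and gallons to liters (divide)."""
    return mpg * MILE_IN_KM / GALLON_IN_LITERS


# Values from the first row of the dataset preview
print(round(grams_per_mile_to_grams_per_km(522.764706), 2))
print(round(mpg_to_km_per_liter(18), 2))
```

Note the direction of each operation: emissions per distance shrink when the distance unit shrinks, while distance per volume is scaled by both factors.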



#### Replace all brand names that contain the word "Dutton" with simply "Dutton".

In [7]:

cars_df.loc[cars_df['Make'].str.contains('Dutton'), 'Make'] = 'Dutton'
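To see what the boolean mask does, here is the same pattern on a made-up three-row frame (the maker names are invented for illustration). Passing `na=False` is an optional safeguard, assumed here, so that missing names count as non-matches instead of producing a mask with missing values.

```python
import pandas as pd

# Toy data for illustration only (invented maker names)
toy = pd.DataFrame({"Make": ["Dutton Motors", "Dutton Coachworks Ltd", "Honda"]})

# True for rows whose 'Make' contains the substring 'Dutton'
mask = toy["Make"].str.contains("Dutton", na=False)

# Overwrite 'Make' only in the rows selected by the mask
toy.loc[mask, "Make"] = "Dutton"

print(toy["Make"].tolist())
```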


#### Convert CO2 Emissions from Grams/Mile to Grams/Km

In [8]:
cars_df['CO2 Emission Grams/Km'] = cars_df['CO2 Emission Grams/Mile']/1.60934

#### Create a binary column that solely indicates if the transmission of a car is automatic or manual

In [9]:
# Label rows by the prefix of 'Transmission' ('Manual ...' vs 'Auto...').
# Assigning through .loc only touches the masked rows; chaining a second '=' onto
# cars_df['binary'] would overwrite the whole column with a single value.
cars_df.loc[cars_df['Transmission'].str.startswith('Manual'), 'binary'] = 'Manual'
cars_df.loc[cars_df['Transmission'].str.startswith('Auto'), 'binary'] = 'Automatic'


In [10]:
cars_df['binary'].value_counts()
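An alternative way to build the column in one step is `numpy.where`, which picks one of two labels per row from a boolean mask. This is a sketch on made-up rows, and it assumes every transmission that does not start with "Manual" is automatic.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Transmission": ["Automatic 4-spd", "Manual 5-spd", "Auto(AM7)", "Manual 6-spd"]
})

# True where the transmission string starts with 'Manual'
is_manual = toy["Transmission"].str.startswith("Manual")

# 'Manual' where the mask is True, 'Automatic' everywhere else
toy["binary"] = np.where(is_manual, "Manual", "Automatic")

counts = toy["binary"].value_counts()
print(counts)
```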

#### Convert MPG columns to km_per_liter

In [11]:
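# Convert miles per gallon to km per liter: miles -> km (multiply), gallons -> liters (divide).
# The columns are overwritten in place, so they keep their 'MPG' names; run this cell only once.
cars_df['City MPG'] = cars_df['City MPG']*1.60934/3.78541
cars_df['Highway MPG'] = cars_df['Highway MPG']*1.60934/3.78541
cars_df['Combined MPG'] = cars_df['Combined MPG']*1.60934/3.78541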


### Gathering insights:

- How many car makers are there? How many models? Which car maker has the most cars in the dataset?

- When were these cars made? How big is the engine of these cars?

- What's the frequency of different transmissions, drivetrains and fuel types?

- What's the car that consumes the least/most fuel?

#### - How many car makers are there? How many models? Which car maker has the most cars in the dataset?


In [18]:
makers = cars_df['Make'].nunique()
print("Number of car makers:", makers)

models = cars_df['Model'].nunique()
print("Number of car models:", models)

# mode() returns a Series because ties are possible; take the first entry
most_cars = cars_df['Make'].mode()
print("Maker with the most cars:", most_cars[0])


Number of car makers: 125
Number of car models: 3608
Maker with the most cars: Chevrolet
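If the number of cars behind the top maker is also of interest, `value_counts()` gives both the name and the count. A minimal sketch on invented rows:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Make": ["Ford", "Ford", "Honda", "Ford", "Honda", "Kia"],
    "Model": ["F150", "Focus", "Civic", "F150", "Accord", "Rio"],
})

n_makers = toy["Make"].nunique()    # distinct makers
n_models = toy["Model"].nunique()   # distinct model names

per_maker = toy["Make"].value_counts()  # rows per maker, sorted descending
top_maker = per_maker.idxmax()          # label with the highest count
top_count = per_maker.max()             # the count itself

print(n_makers, n_models, top_maker, top_count)
```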


#### - When were these cars made? How big is the engine of these cars?


In [24]:
print(f"Cars were built between {cars_df['Year'].min()} and {cars_df['Year'].max()}")

print(f"Engines range between {cars_df['Engine Displacement'].min()} and {cars_df['Engine Displacement'].max()}")



Cars were built between 1984 and 2017
Engines range between 0.6 and 8.4


#### - What's the frequency of different transmissions, drivetrains and fuel types?


In [27]:
print(f"Frequency of different transmissions are: {cars_df['Transmission'].value_counts()}")

print(f"Frequency of different drivetrains are: {cars_df['Drivetrain'].value_counts()}")

print(f"Frequency of different fuel type are: {cars_df['Fuel Type'].value_counts()}")

Frequency of different transmissions are: Automatic 4-spd                     10585
Manual 5-spd                         7787
Automatic (S6)                       2631
Automatic 3-spd                      2597
Manual 6-spd                         2423
Automatic 5-spd                      2171
Automatic 6-spd                      1432
Manual 4-spd                         1306
Automatic (S8)                        960
Automatic (S5)                        822
Automatic (variable gear ratios)      675
Automatic 7-spd                       662
Automatic (S7)                        261
Auto(AM-S7)                           256
Automatic 8-spd                       243
Automatic (S4)                        229
Auto(AM7)                             157
Auto(AV-S6)                           145
Auto(AM6)                             110
Auto(AM-S6)                            92
Automatic 9-spd                        90
Manual 3-spd                           74
Manual 7-spd                      

#### - What's the car that consumes the least/most fuel?

In [36]:


# idxmax()/idxmin() return the row label of the extreme value; select 'Make' by name
print(f"Car that consumes the most is {cars_df.loc[cars_df['Fuel Barrels/Year'].idxmax(), 'Make']}")
print(f"Car that consumes the least is {cars_df.loc[cars_df['Fuel Barrels/Year'].idxmin(), 'Make']}")


Car that consumes the most is Lamborghini
Car that consumes the least is Honda
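The lookup above reports only the maker. To also see the model, several columns can be selected at once from the row found by `idxmax()` or `idxmin()`. A sketch on invented rows:

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Make": ["Maker A", "Maker B", "Maker C"],
    "Model": ["Thirsty", "Frugal", "Average"],
    "Fuel Barrels/Year": [47.1, 0.06, 15.7],
})

# Row labels of the largest and smallest consumption
most_idx = toy["Fuel Barrels/Year"].idxmax()
least_idx = toy["Fuel Barrels/Year"].idxmin()

# Pull both identifying columns from those rows
most = toy.loc[most_idx, ["Make", "Model"]]
least = toy.loc[least_idx, ["Make", "Model"]]

print("Most:", most["Make"], most["Model"])
print("Least:", least["Make"], least["Model"])
```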


<b>(Optional)</b>

What brand has the worst CO2 emissions on average?

Hint: use the function `sort_values()`

In [None]:
## your Code here
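One possible approach, sketched on invented numbers and not run against vehicles.csv: group by maker, average the emissions column, and sort in descending order so the worst brand comes first.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Make": ["A", "A", "B", "B", "C"],
    "CO2 Emission Grams/Mile": [500.0, 600.0, 300.0, 350.0, 700.0],
})

# Mean emissions per maker, highest first
avg_co2 = (
    toy.groupby("Make")["CO2 Emission Grams/Mile"]
    .mean()
    .sort_values(ascending=False)
)

worst_brand = avg_co2.index[0]
print(avg_co2)
print("Worst on average:", worst_brand)
```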


Do cars with automatic transmission consume more fuel than cars with manual transmission on average?

In [None]:
## Your code here
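A possible starting point, assuming a column like `binary` that labels each car as "Automatic" or "Manual": compare the group means of `Fuel Barrels/Year`. The numbers below are invented, so they say nothing about the real answer.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "binary": ["Automatic", "Manual", "Automatic", "Manual"],
    "Fuel Barrels/Year": [20.0, 15.0, 18.0, 17.0],
})

# Average yearly fuel consumption per transmission type
avg_fuel = toy.groupby("binary")["Fuel Barrels/Year"].mean()

automatic_uses_more = avg_fuel["Automatic"] > avg_fuel["Manual"]
print(avg_fuel)
print("Automatic consumes more on average:", automatic_uses_more)
```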
