## Scikit-Learn Machine Learning with car data

Author: tygithub18

Date: 12/13/2018

A lot of people have been wondering: what is a simple, free example that uses Machine Learning/Artificial Intelligence (ML/AI)?

Here is a basic machine learning model I created in Python with the scikit-learn library. I asked a small group of volunteers how much they value certain vehicle attributes on a scale of 1-10, with 1 being the lowest and 10 the highest. Then they chose what type of vehicle it would be: Sedan, SUV, or Truck. Using this data, I trained a basic K-nearest-neighbors (KNN) model. Lastly, I asked some new volunteers for their values so I could test whether my model's prediction was correct. The result can be used as a prediction evaluation tool or as a recommendation system.

In [1]:
#First import pandas so that we can see the data in a dataframe view.
import pandas as pd

In [2]:
#Read the CSV file
#Use the first column (Record_ID) as the index
mydata = pd.read_csv('Car_Data.csv', index_col=0)

In [3]:
#Preview the data. This is a small dataset, so I can show all of it.
mydata

Record_ID,MPG,Power,Storage_Capacity,Towing,Type
1,9,6,6,8,SUV
2,9,7,6,5,Sedan
3,10,5,7,1,Sedan
4,5,1,6,1,Sedan
5,7,8,9,2,Sedan
6,9,6,10,8,Truck
7,7,7,7,3,Sedan
8,6,8,8,5,SUV
9,7,5,8,3,Truck
10,8,7,7,1,SUV


In [4]:
#Assign the X values
X = mydata[['MPG', 'Power', 'Storage_Capacity', 'Towing']]

In [5]:
#Preview our data
#Now we have our X data ready to Fit into the model
X.head()

Record_ID,MPG,Power,Storage_Capacity,Towing
1,9,6,6,8
2,9,7,6,5
3,10,5,7,1
4,5,1,6,1
5,7,8,9,2


In [6]:
#Check the data type of our X data
#Check the shape of our X data
print (type(X))
print (X.shape)

<class 'pandas.core.frame.DataFrame'>
(10, 4)


In [7]:
#Assign the y values
#Alt way is: y = mydata['Type'] #This also returns a pandas Series
y = mydata.Type #Attribute access; returns a pandas Series, not a numpy array

In [8]:
#Preview our data
#Now we have our y data ready to Fit into the model
y.head()

Record_ID
1      SUV
2    Sedan
3    Sedan
4    Sedan
5    Sedan
Name: Type, dtype: object

In [9]:
#Check the data type of our y data
#Check the shape of our y data
print (type(y))
print (y.shape)

<class 'pandas.core.series.Series'>
(10,)
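
Both ways of selecting a column give the same pandas Series, which is what the `type(y)` output above reports. Here is a minimal check, using a three-row stand-in for the data built inline so the snippet runs without `Car_Data.csv` (the names `demo`, `y_attr`, `y_key`, and `y_array` are illustrative only):

```python
import pandas as pd

# Small stand-in for the car data (first three records from the table above)
demo = pd.DataFrame(
    {"MPG": [9, 9, 10], "Type": ["SUV", "Sedan", "Sedan"]},
    index=pd.Index([1, 2, 3], name="Record_ID"),
)

y_attr = demo.Type      # attribute-style access
y_key = demo["Type"]    # bracket-style access

# Both are pandas Series with identical contents
print(type(y_attr), type(y_key))
print(y_attr.equals(y_key))

# A numpy array is available explicitly if one is ever needed
y_array = y_attr.to_numpy()
print(type(y_array))
```

scikit-learn accepts either a Series or a numpy array for `y`, so no conversion is needed before fitting.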


First, let's try it the way the classic Iris examples do: with K-nearest neighbors (KNN).

In [10]:
#Step 1: Import the class you plan to use. We are using nearest neighbor here.
from sklearn.neighbors import KNeighborsClassifier

In [11]:
#Step 2: "Instantiate" the "estimator"
#"Estimator" is scikit-learn's term for model
#"Instantiate" means "make an instance of"

knn = KNeighborsClassifier(n_neighbors=1)

In [12]:
print (knn)
#You can see all of the default parameters

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=1, p=2,
           weights='uniform')


In [13]:
#Step 3: Fit the model with data (aka "model training")
#The model is learning the relationship between X and y
#This occurs in-place, so you don't need to assign the result to a new object.
knn.fit(X,y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=1, p=2,
           weights='uniform')

The following people are out-of-sample: they were not in the training data and serve as test data instead.

In [14]:
#Here is an example of one Out-of-Sample observation we can test
knn.predict([[3,5,8,6]])

array(['SUV'], dtype=object)
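
To see where that answer comes from: with `n_neighbors=1` and the default Euclidean metric shown in the printout above (`metric='minkowski'`, `p=2`), the prediction is simply the label of the closest training record. The sketch below redoes that calculation by hand in plain Python. The `records` dict and `squared_distance` helper are illustrative names, not part of scikit-learn, and the rows are copied from the table above.

```python
# Training records copied from the table above: Record_ID -> (features, Type)
# Feature order: MPG, Power, Storage_Capacity, Towing
records = {
    1: ((9, 6, 6, 8), "SUV"),
    2: ((9, 7, 6, 5), "Sedan"),
    3: ((10, 5, 7, 1), "Sedan"),
    4: ((5, 1, 6, 1), "Sedan"),
    5: ((7, 8, 9, 2), "Sedan"),
    6: ((9, 6, 10, 8), "Truck"),
    7: ((7, 7, 7, 3), "Sedan"),
    8: ((6, 8, 8, 5), "SUV"),
    9: ((7, 5, 8, 3), "Truck"),
    10: ((8, 7, 7, 1), "SUV"),
}


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks neighbors the same as the true distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = (3, 5, 8, 6)

# Pick the Record_ID whose features are closest to the query
nearest_id = min(records, key=lambda rid: squared_distance(records[rid][0], query))
nearest_label = records[nearest_id][1]
print(nearest_id, nearest_label)  # 8 SUV
```

Record 8 is the closest row, and its type is SUV, which matches the model's prediction.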

In [15]:
#You can also predict for multiple observations at once
X_new = [[9,6,6,8], [9,7,6,5],[4,6,10,10]]
#I copied Record 1 and Record 2, then added my own numbers

#Run multiple ones:
knn.predict(X_new)

array(['SUV', 'Sedan', 'Truck'], dtype=object)
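
If you want to know which training record drove each prediction, a fitted `KNeighborsClassifier` has a `kneighbors` method that returns the distances to, and row positions of, the nearest training rows. Here is a minimal sketch that rebuilds the ten records inline so it runs without `Car_Data.csv` (the variable names are illustrative):

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

cols = ["MPG", "Power", "Storage_Capacity", "Towing"]
rows = [
    [9, 6, 6, 8], [9, 7, 6, 5], [10, 5, 7, 1], [5, 1, 6, 1], [7, 8, 9, 2],
    [9, 6, 10, 8], [7, 7, 7, 3], [6, 8, 8, 5], [7, 5, 8, 3], [8, 7, 7, 1],
]
types = ["SUV", "Sedan", "Sedan", "Sedan", "Sedan",
         "Truck", "Sedan", "SUV", "Truck", "SUV"]

record_ids = pd.Index(range(1, 11), name="Record_ID")
train = pd.DataFrame(rows, columns=cols, index=record_ids)
labels = pd.Series(types, index=record_ids, name="Type")

model = KNeighborsClassifier(n_neighbors=1).fit(train, labels)

# Same three observations as X_new, wrapped in a DataFrame so column names match
new_people = pd.DataFrame(
    [[9, 6, 6, 8], [9, 7, 6, 5], [4, 6, 10, 10]], columns=cols
)

distances, positions = model.kneighbors(new_people)

# positions holds 0-based row positions; map them back to Record_ID values
nearest_ids = train.index[positions[:, 0]]
for rid, dist in zip(nearest_ids, distances[:, 0]):
    print(rid, labels.loc[rid], round(float(dist), 2))
```

The first two observations sit at distance 0 from Record 1 and Record 2 because they are exact copies, so only the third one is a real test of the model.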

Here are some more people:

In [16]:
#B
knn.predict([[9,4,5,1]])

array(['Sedan'], dtype=object)

In [17]:
#J
knn.predict([[8,9,2,1]])

array(['SUV'], dtype=object)

In [18]:
#M
knn.predict([[10,5,6,1]])

array(['Sedan'], dtype=object)
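
With only ten records, one way to get a rough sense of how well the model generalizes is leave-one-out cross-validation: hold out each record in turn, fit on the other nine, and check the held-out prediction. The following is a sketch using scikit-learn's `cross_val_score` and `LeaveOneOut`. It was not part of the original notebook run, and the data is rebuilt inline so it works without `Car_Data.csv`.

```python
import pandas as pd
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

cols = ["MPG", "Power", "Storage_Capacity", "Towing"]
rows = [
    [9, 6, 6, 8], [9, 7, 6, 5], [10, 5, 7, 1], [5, 1, 6, 1], [7, 8, 9, 2],
    [9, 6, 10, 8], [7, 7, 7, 3], [6, 8, 8, 5], [7, 5, 8, 3], [8, 7, 7, 1],
]
types = ["SUV", "Sedan", "Sedan", "Sedan", "Sedan",
         "Truck", "Sedan", "SUV", "Truck", "SUV"]

train = pd.DataFrame(rows, columns=cols)
labels = pd.Series(types, name="Type")

# Each fold trains on nine records and scores the single held-out record (0.0 or 1.0)
scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=1), train, labels, cv=LeaveOneOut()
)

print(len(scores))    # one score per held-out record
print(scores.mean())  # fraction of held-out records predicted correctly
```

With so few records the estimate is very noisy, so treat it as a sanity check rather than a benchmark.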