# Classification Problem (Cylinders)

Make a copy of the data before mutating it, so the built-in mtcars dataset stays untouched.

Let's take a closer look at the data by opening its help page.

In [1]:
data <- mtcars
?mtcars

mtcars {datasets}, R Documentation

| Column | Variable | Description |
|---|---|---|
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | V/S |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |


### Make factors
Convert the columns that take discrete values to factors, so the model treats them as categories rather than as numbers.

In [2]:
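# NOTE: current ?mtcars documents vs as 0 = V-shaped, 1 = straight, so these two
# labels may be reversed; they are kept to match the coefficient names in the output.
data$vs <- factor(as.integer(data$vs), levels = c(0, 1), labels = c('straight', 'v'))
data$am <- factor(as.integer(data$am), levels = c(0, 1), labels = c('automatic', 'manual'))
data$gear <- factor(as.integer(data$gear), levels = 3:5, labels = c('three','four','five'))
# carb only takes the values 1, 2, 3, 4, 6 and 8 in mtcars, so 'five' and 'seven' stay empty
data$carb <- factor(as.integer(data$carb), levels = 1:8, labels = c('one', 'two', 'three','four','five','six','seven','eight'))

# cyl is the outcome to classify
data$cyl <- factor(as.integer(data$cyl), levels = c(4,6,8), labels = c('Four Cyl.', 'Six Cyl.', 'Eight Cyl.'))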
summary(data$cyl)
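As a quick sanity check on the conversion, the sketch below cross-tabulates the raw cylinder values against the new labels. It uses only base R; the names `cyl_factor` and `check` are illustrative and not part of the notebook. If the levels and labels line up, every count sits on the diagonal.

```r
# Illustrative check: cross-tabulate raw cylinder counts against factor labels
cyl_factor <- factor(as.integer(mtcars$cyl),
                     levels = c(4, 6, 8),
                     labels = c('Four Cyl.', 'Six Cyl.', 'Eight Cyl.'))

# Rows are the raw numeric values, columns are the factor labels
check <- table(raw = mtcars$cyl, labelled = cyl_factor)
print(check)

# Values missing from `levels` would silently become NA; confirm there are none
stopifnot(!anyNA(cyl_factor))
```

The same pattern can be applied to vs, am, gear and carb to confirm that no level was mislabelled or dropped.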

### Split the test and train data

In [3]:
library(caret)
set.seed(1024)
index.train <- createDataPartition(data$cyl, p = 0.7, list = FALSE)
train <- data[index.train, ]
test  <- data[-index.train, ]

Loading required package: lattice
Loading required package: ggplot2
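createDataPartition samples within each level of the outcome, so the class mix of the training and test rows should stay close to that of the full data. The sketch below is one way to check this; it assumes caret is installed, and `cyl`, `idx` and `proportions` are illustrative names.

```r
library(caret)

cyl <- factor(mtcars$cyl)

set.seed(1024)
# list = FALSE returns a one-column matrix of row indices; keep it as a plain vector
idx <- createDataPartition(cyl, p = 0.7, list = FALSE)[, 1]

# Class proportions in the full data, the training rows and the held-out rows
proportions <- rbind(full  = prop.table(table(cyl)),
                     train = prop.table(table(cyl[idx])),
                     test  = prop.table(table(cyl[-idx])))
round(proportions, 2)
```

With only 32 cars the proportions cannot match exactly, but each class should appear in both parts.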


### Train

In [4]:
# 21 candidate decay values: 0.75, 0.775, ..., 1.25
tg <- expand.grid(.decay = (30:50) / 40)

model <- train(cyl ~ .,
               data = train,
               method = "multinom",
               tuneGrid = tg)

Loading required package: nnet


# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 5.147944
iter  20 value 1.715915
iter  30 value 1.469172
iter  40 value 1.434773
iter  50 value 1.428635
final  value 1.428635 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 5.182126
iter  20 value 1.741123
iter  30 value 1.495813
iter  40 value 1.462491
iter  50 value 1.456875
final  value 1.456875 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 5.215354
iter  20 value 1.766533
iter  30 value 1.522043
iter  40 value 1.489666
iter  50 value 1.484661
final  value 1.484661 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 5.247672
iter  20 value 1.791990
iter  30 value 1.547866
iter  40 value 1.516235
iter  50 value 1.512013
iter  50 value 1.512012
iter  50 value 1.512012
final  value 1.512012 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 5.279119
iter  20 value 1.817478
iter  30 value 1.
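The tuning grid used in the training call can be inspected on its own, which makes it easier to see which decay values are being compared. This sketch uses only base R; `grid` is an illustrative name.

```r
# `:` binds tighter than `/`, so 30:50/40 is the same as (30:50) / 40
grid <- expand.grid(decay = (30:50) / 40)

nrow(grid)           # number of candidate decay values
range(grid$decay)    # smallest and largest candidate
diff(grid$decay)[1]  # spacing between neighbouring candidates
```

An equivalent and arguably more readable spelling is `seq(0.75, 1.25, by = 0.025)`.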

### View Model

In [5]:
model
summary(model)

Penalized Multinomial Regression 

23 samples
10 predictors
 3 classes: 'Four Cyl.', 'Six Cyl.', 'Eight Cyl.' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 23, 23, 23, 23, 23, 23, ... 
Resampling results across tuning parameters:

  decay  Accuracy   Kappa    
  0.750  0.8825974  0.8070509
  0.775  0.8825974  0.8070509
  0.800  0.8825974  0.8070509
  0.825  0.8825974  0.8070509
  0.850  0.8825974  0.8070509
  0.875  0.8825974  0.8070509
  0.900  0.8825974  0.8070509
  0.925  0.8825974  0.8070509
  0.950  0.8825974  0.8070509
  0.975  0.8825974  0.8070509
  1.000  0.8825974  0.8070509
  1.025  0.8825974  0.8070509
  1.050  0.8825974  0.8070509
  1.075  0.8825974  0.8070509
  1.100  0.8825974  0.8070509
  1.125  0.8775974  0.7990509
  1.150  0.8775974  0.7990509
  1.175  0.8775974  0.7990509
  1.200  0.8775974  0.7990509
  1.225  0.8775974  0.7990509
  1.250  0.8775974  0.7990509

Accuracy was used to select the optimal model using  the largest value.
T

In sqrt(diag(vc)): NaNs produced

Call:
multinom(formula = .outcome ~ ., data = dat, decay = param$decay)

Coefficients:
           (Intercept)        mpg       disp         hp          drat
Six Cyl.   -0.00130478 -0.3550865 0.06488467 0.03569670  0.0000851133
Eight Cyl. -0.02804028 -0.6472887 0.09411661 0.07280423 -0.1112152025
                    wt       qsec         vsv    ammanual    gearfour
Six Cyl.    0.01863915 -0.2964708 -0.02689415  0.06103815  0.07488444
Eight Cyl. -0.07225172 -0.6333116 -0.07087587 -0.01321111 -0.04341249
                gearfive     carbtwo   carbthree    carbfour carbfive carbsix
Six Cyl.   -2.018099e-02 -0.10630494 -0.03369311  0.15765896        0       0
Eight Cyl.  2.194335e-05  0.02224372  0.03369488 -0.04328997        0       0
           carbseven     carbeight
Six Cyl.           0 -6.870608e-06
Eight Cyl.         0  7.191551e-06

Std. Errors:
           (Intercept)      mpg      disp        hp      drat       wt     qsec
Six Cyl.      1.518649 1.622676 0.2527733 0.2326170 11.24150
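The long iteration log printed during training comes from nnet::multinom. A minimal sketch of a quieter fit follows, assuming caret and nnet are installed. It is a simplified setup rather than the notebook's exact one: only cyl is converted to a factor, the grid has three decay values, and `d` and `fit` are illustrative names.

```r
library(caret)

# Simplified setup for illustration: only the outcome is converted to a factor
d <- mtcars
d$cyl <- factor(d$cyl, levels = c(4, 6, 8),
                labels = c('Four Cyl.', 'Six Cyl.', 'Eight Cyl.'))

set.seed(1024)
fit <- train(cyl ~ .,
             data = d,
             method = "multinom",
             tuneGrid = data.frame(decay = c(0.75, 1.00, 1.25)),
             trace = FALSE)  # forwarded to nnet::multinom to silence the log

fit$bestTune  # the decay value that train() selected
fit$results   # resampled Accuracy and Kappa for each candidate
```

Looking at `fit$results` directly is often easier than reading the printed model when several candidates tie on accuracy, as they do in the table above.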

### Predict

In [6]:
test$predicted <- predict(model, test)

### View Results

In [7]:
test[, c('cyl', 'predicted')]

| | cyl | predicted |
|---|---|---|
| Hornet 4 Drive | Six Cyl. | Six Cyl. |
| Duster 360 | Eight Cyl. | Eight Cyl. |
| Merc 450SLC | Eight Cyl. | Eight Cyl. |
| Cadillac Fleetwood | Eight Cyl. | Eight Cyl. |
| Chrysler Imperial | Eight Cyl. | Eight Cyl. |
| Toyota Corolla | Four Cyl. | Four Cyl. |
| Fiat X1-9 | Four Cyl. | Four Cyl. |
| Ferrari Dino | Six Cyl. | Six Cyl. |
| Volvo 142E | Four Cyl. | Four Cyl. |
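The listing above can be summarised by hand before reaching for a helper function. The sketch below copies the nine actual labels from that listing (every prediction matches, so the predicted vector is identical) and computes the accuracy and the cross-tabulation with base R. The names `actual`, `predicted`, `accuracy` and `counts` are illustrative.

```r
# Actual labels copied from the nine test rows listed above
actual <- c('Six Cyl.', 'Eight Cyl.', 'Eight Cyl.', 'Eight Cyl.', 'Eight Cyl.',
            'Four Cyl.', 'Four Cyl.', 'Six Cyl.', 'Four Cyl.')
predicted <- actual  # each prediction in the listing equals its actual class

# Use a shared level order so the table rows and columns line up
lv <- c('Four Cyl.', 'Six Cyl.', 'Eight Cyl.')
actual    <- factor(actual, levels = lv)
predicted <- factor(predicted, levels = lv)

accuracy <- mean(predicted == actual)  # share of correct predictions
counts   <- table(Prediction = predicted, Reference = actual)

accuracy
counts
```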


### Confusion Matrix

In [8]:
confusionMatrix(test$predicted, test$cyl)

Confusion Matrix and Statistics

            Reference
Prediction   Four Cyl. Six Cyl. Eight Cyl.
  Four Cyl.          3        0          0
  Six Cyl.           0        2          0
  Eight Cyl.         0        0          4

Overall Statistics
                                     
               Accuracy : 1          
                 95% CI : (0.6637, 1)
    No Information Rate : 0.4444     
    P-Value [Acc > NIR] : 0.0006766  
                                     
                  Kappa : 1          
 Mcnemar's Test P-Value : NA         
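The interval and p-value reported above can be reproduced with base R's binom.test, which is, as far as I can tell, what confusionMatrix relies on for these figures; treat this as a sketch under that assumption. The counts come from the confusion matrix, and the variable names are illustrative.

```r
correct <- 9      # diagonal of the confusion matrix: 3 + 2 + 4
total   <- 9      # all test rows
nir     <- 4 / 9  # no-information rate: share of the largest reference class

# Exact (Clopper-Pearson) 95% interval for the accuracy
ci <- binom.test(correct, total)$conf.int

# One-sided test that the accuracy exceeds the no-information rate
p_value <- binom.test(correct, total, p = nir, alternative = "greater")$p.value

round(ci, 4)
signif(p_value, 4)
```

With only nine test rows the interval is wide even at perfect accuracy, so the result says less about generalisation than the headline figure suggests.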

Statistics by Class:

                     Class: Four Cyl. Class: Six Cyl. Class: Eight Cyl.
Sensitivity                    1.0000          1.0000            1.0000
Specificity                    1.0000          1.0000            1.0000
Pos Pred Value                 1.0000          1.0000            1.0000
Neg Pred Value                 1.0000          1.0000            1.0000
Prevalence                     0.3333          0.2222            0