# Classification Problem (Cylinders)

Make a copy of the data before mutating it, then pull up the dataset's documentation to see what each column means.

In [2]:
data <- mtcars
?mtcars

mtcars {datasets}, R Documentation

| Column | Name | Description |
|--------|------|-------------|
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | V/S |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |


### Make factors
Convert the discrete variables to factors so the model treats them as categories rather than as numeric quantities.

In [3]:
# vs is coded 0 = V-shaped, 1 = straight (per ?mtcars)
data$vs <- factor(as.integer(data$vs), levels = c(0, 1), labels = c('v', 'straight'))
data$am <- factor(as.integer(data$am), levels = c(0, 1), labels = c('automatic', 'manual'))
data$gear <- factor(as.integer(data$gear), levels = 3:5, labels = c('three','four','five'))
# carb only takes the values 1, 2, 3, 4, 6 and 8 in mtcars, so 'five' and 'seven' are empty levels
data$carb <- factor(as.integer(data$carb), levels = 1:8, labels = c('one', 'two', 'three','four','five','six','seven','eight'))

data$cyl <- factor(as.integer(data$cyl), levels = c(4,6,8), labels = c('Four Cyl.', 'Six Cyl.', 'Eight Cyl.'))
summary(data$cyl)
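
To make the `levels` and `labels` arguments concrete, here is a minimal sketch on a toy vector (the name `gears` and the extra level `six` are made up for illustration, not taken from the notebook). A level that is listed but never occurs stays in the factor with a count of zero, which is likely why unused levels such as `carbfive` can show up as all-zero coefficients in a fitted model.

```r
# Toy vector standing in for a numeric column such as mtcars$gear
gears <- c(3, 4, 4, 5, 3)

# levels: the raw values to recognise; labels: the names shown in their place
gear_f <- factor(gears, levels = 3:6, labels = c("three", "four", "five", "six"))

levels(gear_f)  # all four labels, including the unused "six"
table(gear_f)   # counts per level; "six" never occurs

# droplevels() removes levels that never occur
levels(droplevels(gear_f))
```

If empty levels are not wanted as predictors, calling `droplevels()` on the column before modelling is one option.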

### Split the test and train data
No `set.seed()` call is made here, so the partition is random and the results change each time the notebook is rerun.

In [4]:
library(caret)
index.train <- createDataPartition(data$cyl, p = 0.7, list = FALSE)
train <- data[index.train, ]
test  <- data[-index.train, ]

Loading required package: lattice
Loading required package: ggplot2
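
For a repeatable split, `set.seed()` can be called before `createDataPartition()`. The idea of a stratified 70/30 split can also be sketched in base R, as below. This is an illustration under stated assumptions, not caret's actual algorithm: `split_idx`, `train_demo` and `test_demo` are hypothetical names, and the seed value is arbitrary.

```r
set.seed(42)  # arbitrary seed, chosen only so the sketch is repeatable

cyl <- factor(mtcars$cyl)

# Hypothetical helper: sample roughly a share p of the row indices within each class
split_idx <- function(classes, p = 0.7) {
  by_class <- split(seq_along(classes), classes)
  picked <- lapply(by_class, function(i) {
    i[sample.int(length(i), ceiling(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

idx <- split_idx(cyl)
train_demo <- mtcars[idx, ]
test_demo  <- mtcars[-idx, ]

# Class counts in each part; every cylinder class appears in both
table(factor(train_demo$cyl))
table(factor(test_demo$cyl))
```

Sampling within each class keeps the class proportions of the training and test sets close to those of the full data, which matters with only 32 rows.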


### Train

In [5]:
# Candidate weight-decay values: 0.75 to 1.25 in steps of 0.025
tg <- data.frame(.decay = 30:50/40)

model <- train(cyl ~ .,
               data = train,
               method = "multinom",
               tuneGrid = tg)

Loading required package: nnet


# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.303786
iter  20 value 1.817357
iter  30 value 1.696879
iter  40 value 1.693368
final  value 1.693360 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.331354
iter  20 value 1.851435
iter  30 value 1.730369
iter  40 value 1.727105
final  value 1.727099 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.357846
iter  20 value 1.884702
iter  30 value 1.763265
iter  40 value 1.760305
final  value 1.760300 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.383336
iter  20 value 1.917420
iter  30 value 1.795561
iter  40 value 1.792990
final  value 1.792987 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.407890
iter  20 value 1.950740
iter  30 value 1.827611
iter  40 value 1.825181
final  value 1.825179 
converged
# weights:  57 (36 variable)
initial  value 25.268083 
iter  10 value 4.431570
[output truncated]
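
The log above is printed by `nnet::multinom()`, which caret calls once per fit. The sketch below calls it directly to show what the `decay` argument does; it assumes the `nnet` package is installed (it is usually bundled with R), and the predictor subset and the two decay values are picked only for illustration. Passing `trace = FALSE` silences the iteration log, and the same argument can typically be forwarded through `train()`.

```r
library(nnet)  # provides multinom()

# Small illustrative subset of mtcars; not the notebook's full predictor set
d <- mtcars[, c("cyl", "mpg", "disp", "hp")]
d$cyl <- factor(d$cyl)

# trace = FALSE suppresses the per-iteration output
fit_small <- multinom(cyl ~ ., data = d, decay = 0.01, trace = FALSE)
fit_large <- multinom(cyl ~ ., data = d, decay = 1.00, trace = FALSE)

# A larger decay should pull the coefficients toward zero
sum(coef(fit_small)^2)
sum(coef(fit_large)^2)

# Class predictions and accuracy on the data the model was fitted to
pred <- predict(fit_large, newdata = d, type = "class")
mean(pred == d$cyl)
```

Accuracy on the fitting data is optimistic; it is shown here only to confirm that the fitted object predicts class labels.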

### View Model

In [6]:
model
summary(model)

Penalized Multinomial Regression 

23 samples
10 predictors
 3 classes: 'Four Cyl.', 'Six Cyl.', 'Eight Cyl.' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 23, 23, 23, 23, 23, 23, ... 
Resampling results across tuning parameters:

  decay  Accuracy   Kappa    
  0.750  0.9200317  0.8670166
  0.775  0.9280317  0.8851984
  0.800  0.9280317  0.8851984
  0.825  0.9280317  0.8851984
  0.850  0.9280317  0.8851984
  0.875  0.9280317  0.8851984
  0.900  0.9280317  0.8851984
  0.925  0.9280317  0.8851984
  0.950  0.9280317  0.8851984
  0.975  0.9280317  0.8851984
  1.000  0.9280317  0.8851984
  1.025  0.9280317  0.8851984
  1.050  0.9223175  0.8786984
  1.075  0.9223175  0.8786984
  1.100  0.9223175  0.8786984
  1.125  0.9223175  0.8786984
  1.150  0.9223175  0.8786984
  1.175  0.9223175  0.8786984
  1.200  0.9223175  0.8786984
  1.225  0.9223175  0.8786984
  1.250  0.9223175  0.8786984

Accuracy was used to select the optimal model using  the largest value.
[output truncated]

In sqrt(diag(vc)): NaNs produced

Call:
multinom(formula = .outcome ~ ., data = dat, decay = param$decay)

Coefficients:
            (Intercept)        mpg       disp         hp        drat
Six Cyl.   -0.006614684 -0.3095601 0.06918312 0.02816928 -0.05247101
Eight Cyl. -0.027422022 -0.7672511 0.10368023 0.05962985 -0.09634168
                    wt       qsec         vsv    ammanual     gearfour
Six Cyl.   -0.00943847 -0.3470927 -0.03498673  0.03234385  0.003269168
Eight Cyl. -0.04587240 -0.5974356 -0.07401557 -0.02776589 -0.008559361
              gearfive     carbtwo   carbthree     carbfour carbfive
Six Cyl.    0.02275405 -0.14312036 -0.05240263  0.118094603        0
Eight Cyl. -0.02584514  0.02176331  0.05240242 -0.008354062        0
               carbsix carbseven     carbeight
Six Cyl.    0.04147918         0 -2.223699e-05
Eight Cyl. -0.02592667         0  2.282958e-05

Std. Errors:
           (Intercept)      mpg      disp        hp      drat        wt
Six Cyl.      3.842492 1.774055 0.1816180 0.1685894  9.5969

### Predict

In [7]:
test$predicted <- predict(model, test)

### View Results

In [8]:
test[, c('cyl', 'predicted')]

|  | cyl | predicted |
|---|---|---|
| Mazda RX4 | Six Cyl. | Six Cyl. |
| Merc 280C | Six Cyl. | Six Cyl. |
| Merc 450SLC | Eight Cyl. | Eight Cyl. |
| Cadillac Fleetwood | Eight Cyl. | Eight Cyl. |
| Lincoln Continental | Eight Cyl. | Eight Cyl. |
| Chrysler Imperial | Eight Cyl. | Eight Cyl. |
| Honda Civic | Four Cyl. | Four Cyl. |
| Toyota Corolla | Four Cyl. | Four Cyl. |
| Fiat X1-9 | Four Cyl. | Four Cyl. |


### Confusion Matrix

In [9]:
confusionMatrix(test$predicted, test$cyl)

Confusion Matrix and Statistics

            Reference
Prediction   Four Cyl. Six Cyl. Eight Cyl.
  Four Cyl.          3        0          0
  Six Cyl.           0        2          0
  Eight Cyl.         0        0          4

Overall Statistics
                                     
               Accuracy : 1          
                 95% CI : (0.6637, 1)
    No Information Rate : 0.4444     
    P-Value [Acc > NIR] : 0.0006766  
                                     
                  Kappa : 1          
 Mcnemar's Test P-Value : NA         

Statistics by Class:

                     Class: Four Cyl. Class: Six Cyl. Class: Eight Cyl.
Sensitivity                    1.0000          1.0000            1.0000
Specificity                    1.0000          1.0000            1.0000
Pos Pred Value                 1.0000          1.0000            1.0000
Neg Pred Value                 1.0000          1.0000            1.0000
Prevalence                     0.3333          0.2222            0
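
The overall statistics reported above can be reproduced from the nine test-set counts with base R. This is a sketch for checking the numbers by hand, not caret's internal code; the variable names are illustrative.

```r
# Counts copied from the confusion matrix above (rows: prediction, columns: reference)
classes <- c("Four Cyl.", "Six Cyl.", "Eight Cyl.")
cm <- matrix(c(3, 0, 0,
               0, 2, 0,
               0, 0, 4),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)
correct <- sum(diag(cm))
accuracy <- correct / n

# No-information rate: share of the largest reference class
nir <- max(colSums(cm)) / n

# Cohen's kappa: agreement beyond what the row and column totals give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

# Exact binomial interval for accuracy, and a one-sided test against the NIR
ci <- binom.test(correct, n)$conf.int
p_value <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

round(c(accuracy = accuracy, nir = nir, kappa = kappa_hat,
        ci_lower = ci[1], ci_upper = ci[2]), 4)
signif(p_value, 4)
```

With only nine test rows the lower confidence bound stays near 0.66 even though every prediction is correct, so the perfect accuracy should be read with that interval in mind.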