---
layout: post
title: Java ML - Smile
categories: [Java Spring]
permalink: /smile
menu: /nav/ml_teach.html
---

## What is Smile?
- Smile stands for Statistical Machine Intelligence and Learning Engine
- It is a Java-based machine learning library designed for fast performance, with algorithms for classification, regression, clustering, and more

### Downloading and Importing SMILE Libraries

In [2]:
%maven com.github.haifengl:smile-data:2.6.0
%maven com.github.haifengl:smile-math:2.6.0
%maven com.github.haifengl:smile-io:2.6.0
%maven org.slf4j:slf4j-nop:2.0.7
%maven com.github.haifengl:smile-core:2.6.0

## Loading the Dataset

In [3]:
import smile.data.DataFrame;
import smile.data.formula.Formula;
import smile.io.Read;
import smile.classification.LogisticRegression;
import smile.data.vector.IntVector;
import org.apache.commons.csv.CSVFormat;
import smile.validation.metric.Accuracy;
import java.util.HashMap;
import java.util.Map;

// Load the Iris dataset, treating the first CSV row as the header
String url = "https://gist.githubusercontent.com/netj/8836201/raw/iris.csv";
DataFrame iris = Read.csv(url, CSVFormat.DEFAULT.withFirstRecordAsHeader());

System.out.println(iris.structure());
System.out.println(iris.summary());


SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.


[Column: String, Type: DataType, Measure: Measure]
+------------+------+-------+
|      Column|  Type|Measure|
+------------+------+-------+
|sepal.length|double|   null|
| sepal.width|double|   null|
|petal.length|double|   null|
| petal.width|double|   null|
|     variety|String|   null|
+------------+------+-------+

[column: String, count: long, min: double, avg: double, max: double]
+------------+-----+---+--------+---+
|      column|count|min|     avg|max|
+------------+-----+---+--------+---+
|sepal.length|  150|4.3|5.843333|7.9|
| sepal.width|  150|  2|3.057333|4.4|
|petal.length|  150|  1|   3.758|6.9|
| petal.width|  150|0.1|1.199333|2.5|
+------------+-----+---+--------+---+



## Training Logistic Regression with "variety" as the Target Column

In [None]:
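// Pull the class names out of the "variety" column
String[] classes = iris.stringVector("variety").toArray();

// Assign each distinct class name an integer code, in order of first appearance
Map<String, Integer> classToInt = new HashMap<>();
int labelCounter = 0;
int[] labels = new int[classes.length];

for (int i = 0; i < classes.length; i++) {
    if (!classToInt.containsKey(classes[i])) {
        classToInt.put(classes[i], labelCounter++);
    }
    labels[i] = classToInt.get(classes[i]);
}

// Append the integer codes to the DataFrame as a new "label" column
iris = iris.merge(IntVector.of("label", labels));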

System.out.println(iris.structure());
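
The encoding loop above can also be tried outside the notebook without Smile. The following is a minimal sketch in plain Java; the class `LabelEncodingSketch`, its method names, and the five sample class names are hypothetical and only illustrate the first-seen numbering used in the cell.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Smile: first-seen label encoding.
class LabelEncodingSketch {
    // Assigns 0, 1, 2, ... to class names in order of first appearance.
    static Map<String, Integer> buildMapping(String[] classes) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String c : classes) {
            // size() is the next unused code when c has not been seen yet
            mapping.putIfAbsent(c, mapping.size());
        }
        return mapping;
    }

    // Converts each class name to its integer code.
    static int[] encode(String[] classes, Map<String, Integer> mapping) {
        int[] labels = new int[classes.length];
        for (int i = 0; i < classes.length; i++) {
            labels[i] = mapping.get(classes[i]);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Illustrative sample, not the full dataset
        String[] classes = {"Setosa", "Setosa", "Versicolor", "Virginica", "Versicolor"};
        Map<String, Integer> mapping = buildMapping(classes);
        System.out.println(mapping);                                   // {Setosa=0, Versicolor=1, Virginica=2}
        System.out.println(Arrays.toString(encode(classes, mapping))); // [0, 0, 1, 2, 1]
    }
}
```

A `LinkedHashMap` is used here so the printed mapping keeps insertion order; the notebook's `HashMap` produces the same codes but may print them in a different order.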


In [None]:
// Drop the string column "variety" so it is not used as a feature.
// The formula names "label" as the target; all remaining columns become predictors.
DataFrame features = iris.drop("variety");

Formula formula = Formula.lhs("label");
LogisticRegression model = LogisticRegression.fit(formula, features);

System.out.println("Model trained.");
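
For intuition about what a fitted multiclass logistic regression does with a feature vector, here is a rough sketch in plain Java. It is not Smile's internal code, and Smile's parameterization may differ; the class `SoftmaxSketch` and the weights, biases, and input values are made up for illustration. The idea is a linear score per class, a softmax to turn scores into probabilities, and an argmax to pick the label.

```java
import java.util.Arrays;

// Hypothetical illustration of multiclass logistic prediction, not Smile's implementation.
class SoftmaxSketch {
    // Linear score for each class k: bias[k] + dot(weights[k], x)
    static double[] scores(double[][] weights, double[] bias, double[] x) {
        double[] s = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            s[k] = bias[k];
            for (int j = 0; j < x.length; j++) {
                s[k] += weights[k][j] * x[j];
            }
        }
        return s;
    }

    // Turns raw scores into probabilities that sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            p[k] = Math.exp(scores[k] - max); // subtract the max for numerical stability
            sum += p[k];
        }
        for (int k = 0; k < p.length; k++) p[k] /= sum;
        return p;
    }

    // Index of the largest probability, i.e. the predicted class label.
    static int argmax(double[] p) {
        int best = 0;
        for (int k = 1; k < p.length; k++) {
            if (p[k] > p[best]) best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up parameters for 3 classes and 4 features
        double[][] weights = {
            { 0.4,  1.2, -1.8, -0.9},
            { 0.3, -0.2,  0.1, -0.6},
            {-0.7, -1.0,  1.7,  1.5}
        };
        double[] bias = {0.2, 0.5, -0.7};
        double[] x = {5.1, 3.5, 1.4, 0.2}; // one flower's four measurements

        double[] p = softmax(scores(weights, bias, x));
        System.out.println("Probabilities: " + Arrays.toString(p));
        System.out.println("Predicted label: " + argmax(p));
    }
}
```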


In [None]:
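// Keep only the four numeric feature columns
DataFrame features = iris.drop("variety").drop("label");

// Copy the first row into a plain array, the input type predict() expects
double[] sample = new double[features.ncols()];
for (int i = 0; i < features.ncols(); i++) {
    sample[i] = features.getDouble(0, i);
}

// Predict the integer class label for that row and print it
int pred = model.predict(sample);
System.out.println("Predicted label: " + pred);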
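
The prediction is an integer code, so it helps to translate it back into a class name. The sketch below inverts a name-to-code map like `classToInt` in plain Java; the class `LabelDecodingSketch` and its `invert` method are hypothetical, and the code assumes the codes run from 0 to n-1 without gaps, as they do in the encoding cell.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Smile: maps integer codes back to class names.
class LabelDecodingSketch {
    // Builds an array where names[code] is the class name for that code.
    // Assumes the codes are exactly 0..n-1.
    static String[] invert(Map<String, Integer> classToInt) {
        String[] names = new String[classToInt.size()];
        for (Map.Entry<String, Integer> e : classToInt.entrySet()) {
            names[e.getValue()] = e.getKey();
        }
        return names;
    }

    public static void main(String[] args) {
        // Stand-in for the notebook's classToInt map
        Map<String, Integer> classToInt = new HashMap<>();
        classToInt.put("Setosa", 0);
        classToInt.put("Versicolor", 1);
        classToInt.put("Virginica", 2);

        String[] names = invert(classToInt);
        int pred = 0; // stand-in for model.predict(sample)
        System.out.println("Predicted class: " + names[pred]); // Predicted class: Setosa
    }
}
```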


In [None]:
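// Keep only the four numeric feature columns
DataFrame features = iris.drop("variety").drop("label");

// Ground-truth labels and a buffer for the model's predictions
int[] trueLabels = iris.intVector("label").toIntArray();
int[] predictedLabels = new int[iris.size()];

// Predict every row of the training data, one feature vector at a time
for (int i = 0; i < iris.size(); i++) {
    double[] x = new double[features.ncols()];
    for (int j = 0; j < features.ncols(); j++) {
        x[j] = features.getDouble(i, j);
    }
    predictedLabels[i] = model.predict(x);
}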

double accuracy = Accuracy.of(trueLabels, predictedLabels);
System.out.printf("Training Accuracy: %.2f%%\n", accuracy * 100);
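
Accuracy here is simply the fraction of rows whose predicted label matches the true label. A minimal plain-Java sketch of that calculation follows; the class `AccuracySketch` and the two small arrays are hypothetical and stand in for `trueLabels` and `predictedLabels`.

```java
// Hypothetical helper, not part of Smile: accuracy as the share of matching labels.
class AccuracySketch {
    static double accuracy(int[] truth, int[] prediction) {
        if (truth.length != prediction.length) {
            throw new IllegalArgumentException("Arrays must have the same length");
        }
        if (truth.length == 0) {
            throw new IllegalArgumentException("Arrays must not be empty");
        }
        int correct = 0;
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] == prediction[i]) {
                correct++;
            }
        }
        return (double) correct / truth.length;
    }

    public static void main(String[] args) {
        int[] truth      = {0, 0, 1, 2, 1}; // illustrative values
        int[] prediction = {0, 0, 1, 1, 1}; // one mistake at index 3
        double acc = accuracy(truth, prediction); // 4 of 5 correct
        System.out.println("Accuracy: " + acc);
    }
}
```

Because the notebook scores the model on the same rows it was trained on, the reported figure is a training accuracy and will usually be higher than accuracy on unseen data.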
