## How to Set Up Java in a Google Colab Notebook


You can set up Java to work in a Jupyter notebook, including Google Colab. This short article explains how to set up IJava in a notebook:

https://medium.com/@gmsharpe/jupyter-java-and-google-colab-7a2f7fb08808

The script below initializes the Java environment. Be sure to click the top-right menu, choose "Connect to a hosted runtime," and select "java."

In [1]:
%%bash
#!/usr/bin/env bash

echo "Update environment..."
apt update -q  &> /dev/null 

echo "Install Java..." 
# -y answers the confirmation prompt, which a non-interactive cell cannot do
apt-get install -y -q openjdk-11-jdk-headless &> /dev/null

echo "Install Jupyter java kernel..."
curl -L https://github.com/SpencerPark/IJava/releases/download/v1.3.0/ijava-1.3.0.zip \
 -o ijava-kernel.zip &> /dev/null

unzip -q ijava-kernel.zip -d ijava-kernel \
 && cd ijava-kernel \
 && python3 install.py --sys-prefix &> /dev/null
 
echo "Install proxy for the java kernel"
# NOTE: required after changes to Google Colab defaults in Dec. 2022
# See https://stackoverflow.com/questions/74674688/google-colab-notebook-using-ijava-stuck-at-connecting-after-installation-ref/74821762#74821762

wget -qO- https://gist.github.com/SpencerPark/e2732061ad19c1afa4a33a58cb8f18a9/archive/b6cff2bf09b6832344e576ea1e4731f0fb3df10c.tar.gz | tar xvz --strip-components=1
python install_ipc_proxy_kernel.py --kernel=java --implementation=ipc_proxy_kernel.py

Update environment...
Install Java...
Install Jupyter java kernel...
Install proxy for the java kernel
e2732061ad19c1afa4a33a58cb8f18a9-b6cff2bf09b6832344e576ea1e4731f0fb3df10c/install_ipc_proxy_kernel.py
e2732061ad19c1afa4a33a58cb8f18a9-b6cff2bf09b6832344e576ea1e4731f0fb3df10c/ipc_proxy_kernel.py
Moving java kernel from /usr/share/jupyter/kernels/java...
Wrote modified kernel.json for java_tcp in /usr/share/jupyter/kernels/java_tcp/kernel.json
Installing the proxy kernel in place of java in /usr/share/jupyter/kernels/java
Installed proxy kernelspec: {"argv": ["/usr/bin/python3", "/usr/share/jupyter/kernels/java/ipc_proxy_kernel.py", "{connection_file}", "--kernel=java_tcp"], "env": {}, "display_name": "Java", "language": "java", "interrupt_mode": "message", "metadata": {}}
Proxy kernel installed. Go to 'Runtime > Change runtime type' and select 'java'


You should now have your Java environment set up! 

In [3]:
System.out.println("Hello IJava")

Hello IJava


Use these magic commands to install Java library dependencies. 

In [5]:
%maven org.knowm.xchart:xchart:3.5.2
%maven com.github.haifengl:smile-core:3.0.1
%maven org.apache.commons:commons-io:1.3.2
%maven org.apache.commons:commons-csv:1.10.0
%maven org.apache.commons:commons-math3:3.6.1

In [10]:
import org.apache.commons.math3.stat.regression.SimpleRegression;
import org.apache.commons.io.FileUtils;
import java.util.Arrays;
import org.apache.commons.math3.stat.regression.OLSMultipleLinearRegression;
import smile.io.*;
import smile.classification.*;
import smile.data.formula.Formula;
import org.apache.commons.csv.CSVFormat;

## Linear Regression

Let's bring in a small dataset and perform a simple linear regression. First, download the dataset and load it into a SMILE `DataFrame`.

In [12]:
var url = "https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/regression/single_independent_variable_linear_small.csv";

FileUtils.copyURLToFile(new URL(url), new File("linear_regression_data.csv"));

var df = Read.csv("linear_regression_data.csv", CSVFormat.DEFAULT.withHeader());
df;

SLF4J: No SLF4J providers were found.
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See https://www.slf4j.org/codes.html#noProviders for further details.


[x: int, y: int]
+---+---+
|  x|  y|
+---+---+
|  1|  5|
|  2| 10|
|  3| 10|
|  4| 15|
|  5| 14|
|  6| 15|
|  7| 19|
|  8| 18|
|  9| 25|
| 10| 23|
+---+---+


Next, let's use Apache Commons Math to do a simple linear regression. After fitting to the data, we can read off both the slope and the intercept.

In [14]:
SimpleRegression regression = new SimpleRegression(true); // pass true to include intercept

var data = df.toArray("x","y");

regression.addData(data);

System.out.println(regression.getIntercept());
// displays intercept of regression line

System.out.println(regression.getSlope());
// displays slope of regression line

4.7333333333333325
1.9393939393939394
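
To make the fit less of a black box, here is a minimal plain-Java sketch of the closed-form least-squares formulas for a line. It is not how `SimpleRegression` is implemented internally; the class name `LeastSquaresSketch` and the method `fit` are hypothetical names chosen for illustration, and the arrays are typed in by hand from the ten rows shown above.

```java
public class LeastSquaresSketch {

    // Returns {intercept, slope} of the least-squares line through the points (x[i], y[i]).
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        // slope = (n * Σxy - Σx * Σy) / (n * Σx² - (Σx)²)
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        // intercept = mean(y) - slope * mean(x)
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        // The ten (x, y) rows from the DataFrame printed above
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] y = {5, 10, 10, 15, 14, 15, 19, 18, 25, 23};

        double[] line = fit(x, y);
        System.out.println("intercept = " + line[0]);
        System.out.println("slope = " + line[1]);
    }
}
```

Run on the same ten points, the intercept and slope should agree with the values printed by `SimpleRegression`, up to floating-point rounding.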


> To learn how to chart in a Jupyter notebook with Java, check out XChart here: 
https://github.com/knowm/XChart

## Multiple Linear Regression

Next, let's apply a multiple linear regression to a dataset with three columns, where `x1` and `x2` are input variables and `y` is the output variable.

In [16]:
var url = "https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/regression/multiple_independent_variable_linear.csv";
FileUtils.copyURLToFile(new URL(url), new File("multiple_independent_variable_linear.csv"));

var df = Read.csv("multiple_independent_variable_linear.csv", CSVFormat.DEFAULT.withHeader());
df;

[x1: int, x2: int, y: int]
+---+---+---+
| x1| x2|  y|
+---+---+---+
|  0| 22| 88|
|  1| 13| 62|
|  1| 15| 67|
|  1| 14| 62|
|  2| 18| 77|
|  2| 13| 65|
|  2| 11| 56|
|  3|  2| 34|
|  3| 13| 66|
|  4| 16| 75|
+---+---+---+
59 more rows...


Let's use `OLSMultipleLinearRegression` to fit `x1` and `x2` as inputs to a function that outputs `y`. We need to solve for $ \beta_0 $, $ \beta_1 $, and $ \beta_2 $, which are the intercept and the two slopes, respectively.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2
$$

We can then use Apache Commons Math to solve for the coefficients $ \beta_0 $, $ \beta_1 $, and $ \beta_2 $.

In [18]:
var inputData = df.toArray("x1","x2");
var outputData = df.intVector("y").toDoubleArray();

// pass data to multiple linear regression
var multiRegression = new OLSMultipleLinearRegression();

multiRegression.newSampleData(outputData, inputData);

// estimate the coefficients 
var betas = multiRegression.estimateRegressionParameters();

// print out coefficients for b0, b1, and b2
System.out.println(Arrays.toString(betas));

[20.109432820036012, 2.0067264725128053, 3.0020379766466903]


This results in the following fitted function: 

$$
y = 20.1094 + 2.0067x_1 + 3.0020x_2
$$
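
As a quick sanity check, the fitted function can be evaluated by hand. The sketch below is plain Java with the coefficients copied from the output above; `FittedPlaneSketch` and `predict` are hypothetical names for illustration and are not part of Apache Commons Math.

```java
public class FittedPlaneSketch {

    // Coefficients {b0, b1, b2} copied from the estimateRegressionParameters() output above
    static final double[] BETAS = {20.109432820036012, 2.0067264725128053, 3.0020379766466903};

    // Evaluates y = b0 + b1 * x1 + b2 * x2
    static double predict(double x1, double x2) {
        return BETAS[0] + BETAS[1] * x1 + BETAS[2] * x2;
    }

    public static void main(String[] args) {
        // First row of the dataset shown above: x1 = 0, x2 = 22, observed y = 88
        double predicted = predict(0, 22);
        System.out.println("predicted y = " + predicted);
        System.out.println("residual = " + (88 - predicted));
    }
}
```

The prediction for the first row lands close to, but not exactly on, the observed `y`, which is expected for a least-squares fit to noisy data.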

## Logistic Regression

In [20]:
var url = "https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/classification/simple_logistic_regression.csv";
FileUtils.copyURLToFile(new URL(url), new File("data.csv"));

var df = Read.csv("data.csv", CSVFormat.DEFAULT.withHeader());
df;

[x: double, y: int]
+---+---+
|  x|  y|
+---+---+
|  1|  0|
|1.5|  0|
|2.1|  0|
|2.4|  0|
|2.5|  1|
|3.1|  0|
|4.2|  0|
|4.4|  1|
|4.6|  1|
|4.9|  0|
+---+---+
11 more rows...


In [22]:
// extract out input and output columns 
var x = df.toArray("x");
var y = df.intVector("y").array();

var model = LogisticRegression.fit(x, y);

// What's the prediction of showing symptoms after 11.2 hours of exposure? 
double[][] testInput = {{11.2}};

model.predict(testInput)[0];

1
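
For intuition about where the `1` comes from: a logistic regression passes a linear function of the input through the logistic (sigmoid) function to get a probability, then picks a class by thresholding it. The sketch below is plain Java, not SMILE's implementation. The coefficients `b0 = -3.0` and `b1 = 0.7` are made-up values for illustration only, not the ones the model above learned, and the 0.5 threshold is an assumption.

```java
public class LogisticSketch {

    // Logistic (sigmoid) function: maps any real number into the interval (0, 1)
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability of class 1 for input x, given intercept b0 and slope b1
    static double probability(double b0, double b1, double x) {
        return sigmoid(b0 + b1 * x);
    }

    // Predicts class 1 when the probability reaches the assumed 0.5 threshold, otherwise class 0
    static int classify(double b0, double b1, double x) {
        return probability(b0, b1, x) >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        // Hypothetical coefficients, NOT the fitted values from the model above
        double b0 = -3.0;
        double b1 = 0.7;

        System.out.println("P(y = 1 | x = 11.2) = " + probability(b0, b1, 11.2));
        System.out.println("predicted class = " + classify(b0, b1, 11.2));
    }
}
```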

## Multiple Logistic Regression

Let's bring in a dataset of 1,345 background colors, each labeled by whether it looks better with a light (0) or dark (1) font.

In [24]:
var url = "https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/classification/light_dark_font_training_set.csv";
FileUtils.copyURLToFile(new URL(url), new File("light_dark_font_training_set.csv"));

var df = Read.csv("light_dark_font_training_set.csv", CSVFormat.DEFAULT.withHeader());
df

[RED: int, GREEN: int, BLUE: int, LIGHT_OR_DARK_FONT_IND: int]
+---+-----+----+----------------------+
|RED|GREEN|BLUE|LIGHT_OR_DARK_FONT_IND|
+---+-----+----+----------------------+
|  0|    0|   0|                     0|
|  0|    0| 128|                     0|
|  0|    0| 139|                     0|
|  0|    0| 205|                     0|
|  0|    0| 238|                     0|
|  0|    0| 255|                     0|
|  0|  100|   0|                     0|
|  0|  104| 139|                     0|
|  0|  128|   0|                     0|
|  0|  128| 128|                     0|
+---+-----+----+----------------------+
1335 more rows...


We can use a logistic regression in the same manner. Let's even test some colors of our own using their R,G,B values. 

In [None]:
// extract out input and output columns 
var x = df.toArray("RED", "GREEN", "BLUE");
var y = df.intVector("LIGHT_OR_DARK_FONT_IND").array();

var model = LogisticRegression.fit(x, y);

// Test whether a given RGB background color calls for a LIGHT (0) or DARK (1) font
double[][] testInput = {{255,255,255}};

model.predict(testInput)[0];

1
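
Colors are often written as hex strings rather than R,G,B triples. The following plain-Java sketch converts a hex string into the `double[]` layout used for the test input above; `HexColorSketch` and `toRgb` are hypothetical names for illustration and are not part of SMILE.

```java
import java.util.Arrays;

public class HexColorSketch {

    // Converts a color such as "#FFC0CB" (leading '#' optional) into {red, green, blue}
    static double[] toRgb(String hex) {
        String digits = hex.startsWith("#") ? hex.substring(1) : hex;
        if (digits.length() != 6) {
            throw new IllegalArgumentException("Expected six hex digits: " + hex);
        }
        int value = Integer.parseInt(digits, 16);
        return new double[] {
            (value >> 16) & 0xFF, // red
            (value >> 8) & 0xFF,  // green
            value & 0xFF          // blue
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toRgb("#FFC0CB")));
    }
}
```

The result can be wrapped in a one-row array, for example `double[][] testInput = {HexColorSketch.toRgb("#FFC0CB")};`, and passed to `model.predict` as in the cell above.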

## Decision Trees

In [None]:
var url = "https://raw.githubusercontent.com/thomasnield/machine-learning-demo-data/master/classification/good_weather_classification.csv";
FileUtils.copyURLToFile(new URL(url), new File("good_weather_classification.csv"));

var df = Read.csv("good_weather_classification.csv", CSVFormat.DEFAULT.withHeader());
df

[RAIN: int, LIGHTNING: int, CLOUDY: int, TEMPERATURE: int, GOOD_WEATHER_IND: int]
+----+---------+------+-----------+----------------+
|RAIN|LIGHTNING|CLOUDY|TEMPERATURE|GOOD_WEATHER_IND|
+----+---------+------+-----------+----------------+
|   0|        1|     1|         74|               0|
|   0|        0|     0|         69|               1|
|   1|        0|     1|         58|               0|
|   0|        0|     0|         71|               1|
|   0|        0|     0|         73|               1|
|   0|        1|     1|         80|               0|
|   0|        1|     1|         74|               0|
|   0|        0|     0|         73|               1|
|   1|        0|     1|         79|               0|
|   0|        0|     1|         72|               1|
+----+---------+------+-----------+----------------+
40 more rows...


In [None]:
import smile.classification.DecisionTree;
import smile.data.formula.Formula;
import smile.data.Tuple;

var formula = Formula.lhs("GOOD_WEATHER_IND");

var decisionTree = DecisionTree.fit(formula, df);



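// Input values in column order: RAIN, LIGHTNING, CLOUDY, TEMPERATURE
double[] testInput = {0.0, 0.0, 0.0, 72.0};

// Tuple.of takes the values plus a schema, so pass the schema of the predictor columns
decisionTree.predict(Tuple.of(testInput, formula.x(df).schema()));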