Copyright 2023 RISC Zero, Inc.

 Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.

This notebook serves as a guide for training classifiers and regression models with the SmartCore crate.  Before training the model in Rust, process the data in Python and export the data and the classes as separate CSV files.
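To make the expected CSV layout concrete, here is a minimal, dependency-free sketch of reading an exported feature file into rows of numbers. The helper name `parse_numeric_csv` and its `has_header` flag are hypothetical and not part of SmartCore or Polars; in practice you would more likely use one of those crates' own CSV readers. The sketch assumes plain comma-separated numeric cells with no quoting.

```rust
/// Parse comma-separated numeric text into rows of f64 values.
/// Hypothetical helper for illustration; assumes no quoted fields.
fn parse_numeric_csv(text: &str, has_header: bool) -> Result<Vec<Vec<f64>>, String> {
    let mut rows: Vec<Vec<f64>> = Vec::new();
    // Ignore blank lines, such as a trailing newline at the end of the file.
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    if has_header {
        // Skip the column names written by the Python export.
        lines.next();
    }
    for (i, line) in lines.enumerate() {
        let row = line
            .split(',')
            .map(|cell| cell.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, _>>()
            .map_err(|e| format!("data row {}: {}", i + 1, e))?;
        // Every row needs the same number of columns to form a matrix.
        if let Some(first) = rows.first() {
            if first.len() != row.len() {
                return Err(format!(
                    "data row {}: expected {} columns, found {}",
                    i + 1,
                    first.len(),
                    row.len()
                ));
            }
        }
        rows.push(row);
    }
    Ok(rows)
}
```

The resulting `Vec<Vec<f64>>` has one inner vector per sample, which is the same row-major shape as the 2D arrays used in the example data later in this notebook.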

Start by importing the SmartCore and Polars crates as dependencies.  Outside of a Jupyter notebook environment, you can add these to your Cargo.toml file or run cargo add CRATE-NAME on the command line.

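Be sure to enable the serde feature for the smartcore crate; otherwise the SmartCore CSV readers will not work.  The same feature is what allows the trained model and the input matrix to be serialized with serde later in this notebook.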

In [None]:
:dep smartcore = {version = "0.3.2", features = ["serde"]}
:dep polars = "*"
:dep serde_json = "1.0"
:dep rmp-serde = "1.1.2"
:dep ndarray = "0.15"

In [None]:
use smartcore::linalg::basic::matrix::DenseMatrix;
use smartcore::ensemble::random_forest_regressor::*;
use smartcore::readers;
use ndarray::array;

use std::fs::File;
use std::io::{Read, Write};
use polars::prelude::*;
use serde_json;
use rmp_serde;

In [None]:
// Example data
let x = DenseMatrix::from_2d_array(&[
    &[1.0, 1.0],
    &[1.0, 2.0],
    &[2.0, 2.0],
    &[2.0, 3.0],
    &[3.0, 3.0],
    &[4.0, 4.0],
    &[6.0, 8.0],
]);
// Target values, one per row of x
let y = array![6,8,9,11,12,15,25];

let y_vec_i64: Vec<i64> = y.to_vec();

// The model below is fit on u32 targets, so cast each value
let y_vec_u32: Vec<u32> = y_vec_i64.iter().map(|x| *x as u32).collect();
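One caveat about the `as u32` cast above: it never fails, so a negative or oversized target would silently wrap to a different value. The example data is all small and positive, so the cast is safe here. For data exported from Python, a checked conversion may be preferable. The following is a minimal sketch using only the standard library; the helper name `labels_to_u32` is hypothetical.

```rust
use std::convert::TryFrom;

/// Convert signed targets to u32, returning an error instead of
/// wrapping when a value is negative or larger than u32::MAX.
/// Hypothetical helper for illustration.
fn labels_to_u32(labels: &[i64]) -> Result<Vec<u32>, String> {
    labels
        .iter()
        .map(|&v| u32::try_from(v).map_err(|_| format!("label {} does not fit in u32", v)))
        .collect()
}
```

With this helper, `labels_to_u32(&y_vec_i64)` would return the same values as the cast for the example data, and an `Err` for any input containing, say, `-1`.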

Now we can train the model using our desired estimator.  This example uses a random forest regressor.

In [None]:
let params = RandomForestRegressorParameters::default().with_n_trees(1).with_m(42);

let model = RandomForestRegressor::fit(&x, &y_vec_u32, params).unwrap();

We call predict() on the model in order to perform inference.

In [None]:
// Create DenseMatrix from the first element in the input array
let input = DenseMatrix::from_2d_array(
    &[
        &[1.0,1.0],
    ]
);

model.predict(
    &input
).unwrap()

Model training can be performed in the host code, but you can also import a serialized pre-trained model from a JSON, YAML, or ProtoBuf file.  

The code below lets you export the trained model and the input data as serialized JSON files, which can be imported into the host.

For use in the zkVM, serializing the model and input data as byte arrays is ideal.  The code below therefore encodes both with MessagePack (via rmp_serde) and writes the resulting byte arrays to JSON files.

In [None]:
let model_bytes = rmp_serde::to_vec(&model).unwrap();
let data_bytes = rmp_serde::to_vec(&input).unwrap();

let model_json = serde_json::to_string(&model_bytes)?;
let x_json = serde_json::to_string(&data_bytes)?;

let mut f = File::create("../../res/ml-model/te_regression_model_bytes.json").expect("unable to create file");
f.write_all(model_json.as_bytes()).expect("Unable to write data");

let mut f1 = File::create("../../res/input-data/te_regression_data_bytes.json").expect("unable to create file");
f1.write_all(x_json.as_bytes()).expect("Unable to write data");
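For reference, serde_json writes a `Vec<u8>` as a JSON array of numbers, so each exported file contains text of the form `[147,0,255,42]`. The sketch below mimics that encoding and its inverse using only the standard library, to show what the host has to undo when it reads the files. The function names are hypothetical; in the host you would more likely call `serde_json::from_str` to recover the bytes and then `rmp_serde::from_slice` to rebuild the model.

```rust
/// Render bytes as a compact JSON array of numbers, e.g. [147,0,255].
/// Hypothetical helper that mirrors how a byte vector appears in JSON.
fn bytes_to_json_array(bytes: &[u8]) -> String {
    let items: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
    format!("[{}]", items.join(","))
}

/// Parse a JSON array of integers in 0..=255 back into bytes.
/// Handles only flat numeric arrays, not general JSON.
fn json_array_to_bytes(json: &str) -> Result<Vec<u8>, String> {
    let inner = json
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| "expected a JSON array".to_string())?;
    if inner.trim().is_empty() {
        return Ok(Vec::new());
    }
    inner
        .split(',')
        .map(|item| {
            item.trim()
                .parse::<u8>()
                .map_err(|e| format!("bad byte '{}': {}", item.trim(), e))
        })
        .collect()
}
```

Storing bytes this way is larger on disk than a raw binary file, but it keeps the exported artifacts in a text format that is easy to inspect and to load with serde_json.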