# Data Analysis with Rust Notebooks

This is my "log" of learning from Dr. Shahin Rostami's (2020-02-29) introduction to [Data Analysis with Rust Notebooks](https://datacrayon.com/posts/programming/rust-notebooks/multidimensional-arrays-and-operations-with-ndarray/) by Data Crayon, mainly with the help of the crate [`ndarray`](https://docs.rs/ndarray/0.15.4/ndarray/) and the crate [`plotly`](https://docs.rs/plotly/0.7.0/plotly/).

My notes from studying [`ndarray`](rust_notes.ipynb#Multidimensional-Arrays-and-Operations-with-NDArray) and [`plotly`](rust_notes.ipynb#Plotting-with-Plotly) are in the notebook [Rust Notes](rust_notes.ipynb).


### NDArray Reading (from CSV)

The next step, following [Mr. Rostami](https://datacrayon.com/posts/programming/rust-notebooks/loading-datasets-from-csv-into-ndarray/), is to read data from a CSV file containing a real-world dataset, the [Iris Data Set](http://archive.ics.uci.edu/ml/datasets/Iris), "the best known database to be found in the pattern recognition literature".

Let us download the file provided by Mr. Rostami.

Even though the type `std::fs::File` implements the [trait `std::io::Write`](https://doc.rust-lang.org/std/fs/struct.File.html#impl-Write), the compiler cannot de-sugar the [method call expression](https://doc.rust-lang.org/reference/expressions/method-call-expr.html?highlight=dot#method-call-expressions) `file.write_all(res.as_bytes())` without a `use` declaration: the trait is not in scope, where scope "is the region of source text where a named entity may be referenced with that name." The cell below therefore calls the method with fully qualified syntax instead.
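A minimal, standalone sketch of the two options: bringing `std::io::Write` into scope with `use` so that method-call syntax resolves, or calling the trait method with fully qualified syntax as the notebook cell does. The helper names `write_with_use` and `write_fully_qualified` and the temporary file names are hypothetical, chosen only for illustration.

```rust
use std::fs::File;
use std::path::Path;

/// Hypothetical helper: writes `data` using method-call syntax. This compiles
/// only because the local `use` brings the trait `std::io::Write` into scope.
fn write_with_use(path: &Path, data: &str) -> std::io::Result<()> {
    use std::io::Write; // trait in scope: `file.write_all(...)` now resolves
    let mut file = File::create(path)?;
    file.write_all(data.as_bytes())
}

/// Hypothetical helper: writes `data` without importing the trait,
/// using fully qualified syntax `Trait::method(&mut receiver, ...)`.
fn write_fully_qualified(path: &Path, data: &str) -> std::io::Result<()> {
    let mut file = File::create(path)?;
    std::io::Write::write_all(&mut file, data.as_bytes())
}

fn main() -> std::io::Result<()> {
    let csv = "Id,SepalLengthCm\n1,5.1\n";
    let dir = std::env::temp_dir();
    let a = dir.join("iris_demo_use.csv");
    let b = dir.join("iris_demo_qualified.csv");
    write_with_use(&a, csv)?;
    write_fully_qualified(&b, csv)?;
    // Both calls produce identical file contents.
    assert_eq!(std::fs::read_to_string(&a)?, std::fs::read_to_string(&b)?);
    Ok(())
}
```

Both forms call the same trait method; the fully qualified form is handy in a notebook cell where adding a `use` line is not wanted.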

In [12]:
:dep ureq = {version = "0.11.4"}

let res = ureq::get("https://datacrayon.com/datasets/Iris.csv").call().into_string()?;
println!("The length of the dataset string {} \
          and print the first 275 characters\n\n{}"
    , res.len()
    , &res[..275]);

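let mut file = std::fs::File::create("Iris.csv")?;
// file.write_all(res.as_bytes()) does not compile here: trait `std::io::Write` is not in scope.
// Fully qualified syntax calls the trait method without a `use` declaration.
std::io::Write::write_all(&mut file, res.as_bytes())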

The length of the dataset string 5107 and print the first 275 characters

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
6,5.4,3.9,1.7,0.4,Iris-setosa
7,4.6,3.4,1.4,0.3,Iris-setosa



Ok(())
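The cell prints `&res[..275]`, which slices the string by bytes. That is fine for the Iris CSV, which judging by the output above is plain ASCII, but on text with multi-byte UTF-8 characters a byte index can fall inside a character, and the slice then panics. A standard-library-only sketch of a prefix that backs off to the previous character boundary; the helper name `prefix_at_char_boundary` is hypothetical. (`str::get(..n)` is a non-panicking alternative that returns `None` instead of shortening.)

```rust
/// Hypothetical helper: returns at most `max_bytes` bytes of `s`, cut back to
/// the nearest char boundary so the slice never panics on multi-byte UTF-8.
fn prefix_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1; // index 0 is always a char boundary, so this terminates
    }
    &s[..end]
}

fn main() {
    let text = "Id,Species\n1,Iris-setosa\n";
    println!("{}", prefix_at_char_boundary(text, 12));
    // "é" occupies bytes 1..3, so a 2-byte cut backs off to byte 1.
    println!("{}", prefix_at_char_boundary("héllo", 2));
}
```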

The crate `ndarray-csv` provides two traits, [`Array2Reader`](https://docs.rs/ndarray-csv/0.5.1/ndarray_csv/trait.Array2Reader.html) and [`Array2Writer`](https://docs.rs/ndarray-csv/0.5.1/ndarray_csv/trait.Array2Writer.html), which connect a two-dimensional `ndarray::Array2` with a [`csv::Reader`](https://docs.rs/csv/1.1.6/csv/struct.Reader.html) and a [`csv::Writer`](https://docs.rs/csv/1.1.6/csv/struct.Writer.html) respectively, using the crate `serde`.
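To make the shape requirement concrete, here is a small standard-library-only sketch of the bookkeeping that reading a CSV into a two-dimensional array involves: the records are flattened into a row-major buffer, and every record must have as many fields as the header. The `Table` type and `read_row_major` function are hypothetical and are not the crate's implementation (which goes through `csv` and `serde`); the buffer and `(rows, cols)` shape are, however, the inputs that `ndarray::Array2::from_shape_vec` expects. Quoted fields are not handled.

```rust
/// Hypothetical container: header names, a row-major buffer of fields,
/// and the (rows, cols) shape of the data part.
#[derive(Debug)]
struct Table {
    headers: Vec<String>,
    data: Vec<String>,
    shape: (usize, usize),
}

/// Hypothetical helper (not part of `ndarray-csv`): parses CSV text whose
/// first non-empty line is the header. Naive: splits on ',' and ignores quoting.
fn read_row_major(text: &str) -> Result<Table, String> {
    let mut lines = text.lines().filter(|line| !line.trim().is_empty());
    let headers: Vec<String> = lines
        .next()
        .ok_or("empty input")?
        .split(',')
        .map(String::from)
        .collect();
    let cols = headers.len();
    let mut data = Vec::new();
    let mut rows = 0;
    for line in lines {
        let fields: Vec<&str> = line.split(',').collect();
        // A ragged record cannot be placed in a rectangular array.
        if fields.len() != cols {
            return Err(format!(
                "record {} has {} fields, expected {}",
                rows + 1,
                fields.len(),
                cols
            ));
        }
        data.extend(fields.into_iter().map(String::from));
        rows += 1;
    }
    Ok(Table { headers, data, shape: (rows, cols) })
}

fn main() {
    // The first two records shown in the notebook output.
    let text = "Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species\n\
                1,5.1,3.5,1.4,0.2,Iris-setosa\n\
                2,4.9,3.0,1.4,0.2,Iris-setosa\n";
    match read_row_major(text) {
        Ok(table) => {
            let (rows, cols) = table.shape;
            let r = 1;
            // Element (r, c) sits at index r * cols + c in row-major order.
            println!("shape = ({}, {}), species of row {} = {}", rows, cols, r, table.data[r * cols + 5]);
            println!("headers = {:?}", table.headers);
        }
        Err(e) => eprintln!("{}", e),
    }
}
```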

In [13]:
:dep csv = {version = "1.1"}
:dep ndarray = {version = "0.13.1"}
:dep ndarray-csv = {version = "0.4.1"}
:dep darn = {version = "0.3.0"}

let mut csv_rdr = csv::Reader::from_path("Iris.csv")?;

// Collect the header names (the first CSV record) as owned strings.
let csv_headers: Vec<String> = csv_rdr.headers()?.iter().map(String::from).collect();

let csv_data: ndarray::Array2<String> = 
               ndarray_csv::Array2Reader::deserialize_array2_dynamic(&mut csv_rdr)?;

darn::show_frame(&csv_data, Some(&csv_headers));

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
"""1""","""5.1""","""3.5""","""1.4""","""0.2""","""Iris-setosa"""
"""2""","""4.9""","""3.0""","""1.4""","""0.2""","""Iris-setosa"""
"""3""","""4.7""","""3.2""","""1.3""","""0.2""","""Iris-setosa"""
"""4""","""4.6""","""3.1""","""1.5""","""0.2""","""Iris-setosa"""
"""5""","""5.0""","""3.6""","""1.4""","""0.2""","""Iris-setosa"""
...,...,...,...,...,...
"""146""","""6.7""","""3.0""","""5.2""","""2.3""","""Iris-virginica"""
"""147""","""6.3""","""2.5""","""5.0""","""1.9""","""Iris-virginica"""
"""148""","""6.5""","""3.0""","""5.2""","""2.0""","""Iris-virginica"""
"""149""","""6.2""","""3.4""","""5.4""","""2.3""","""Iris-virginica"""


### NDArray Applying

Declare dependencies. 

In [14]:
:dep csv = {version = "1.1"}
:dep ndarray = {version = "0.13.1"}
:dep ndarray-csv = {version = "0.4.1"}
:dep darn = {version = "0.3.0"}
:dep plotly = { version = ">=0.7.0" }


Read data from file.

In [15]:
let mut rdr = csv::Reader::from_path("Iris.csv")?;

// Collect the header names (the first CSV record) as owned strings.
let headers: Vec<String> = rdr.headers()?.iter().map(String::from).collect();

let iris_data: ndarray::Array2<String> = 
               ndarray_csv::Array2Reader::deserialize_array2_dynamic(&mut rdr)?;

darn::show_frame(&iris_data, Some(&headers));

Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
"""1""","""5.1""","""3.5""","""1.4""","""0.2""","""Iris-setosa"""
"""2""","""4.9""","""3.0""","""1.4""","""0.2""","""Iris-setosa"""
"""3""","""4.7""","""3.2""","""1.3""","""0.2""","""Iris-setosa"""
"""4""","""4.6""","""3.1""","""1.5""","""0.2""","""Iris-setosa"""
"""5""","""5.0""","""3.6""","""1.4""","""0.2""","""Iris-setosa"""
...,...,...,...,...,...
"""146""","""6.7""","""3.0""","""5.2""","""2.3""","""Iris-virginica"""
"""147""","""6.3""","""2.5""","""5.0""","""1.9""","""Iris-virginica"""
"""148""","""6.5""","""3.0""","""5.2""","""2.0""","""Iris-virginica"""
"""149""","""6.2""","""3.4""","""5.4""","""2.3""","""Iris-virginica"""


Create an `f32` data set from the CSV's string data.

In [16]:
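// Columns 1 to 4 hold the measurements; parse each string field as f32.
// `map` borrows each element, so, unlike `mapv`, no String is cloned.
let iris_features : ndarray::Array2<f32> = 
    iris_data.slice(ndarray::s![..,1..(4+1)]).map(
      |elem: &String| -> f32 { elem.parse::<f32>().unwrap() }
    );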

let headers_feature : Vec<String> = headers[1..(4+1)].to_vec();

darn::show_frame(&iris_features, Some(&headers_feature));

SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
4.7,3.2,1.3,0.2
4.6,3.1,1.5,0.2
5.0,3.6,1.4,0.2
...,...,...,...
6.7,3.0,5.2,2.3
6.3,2.5,5.0,1.9
6.5,3.0,5.2,2.0
6.2,3.4,5.4,2.3
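The cell above calls `unwrap()`, so a single malformed field would panic the whole cell. A standard-library-only sketch of the same conversion that reports which field failed instead; it assumes the string data is a row-major buffer with `n_cols` columns, and the helper name `parse_feature_columns` is hypothetical.

```rust
use std::ops::Range;

/// Hypothetical helper: converts columns `cols` of a row-major string table
/// with `n_cols` columns into a row-major `Vec<f32>`, returning an error for
/// the first field that is not a valid `f32` instead of panicking.
fn parse_feature_columns(
    data: &[String],
    n_cols: usize,
    cols: Range<usize>,
) -> Result<Vec<f32>, String> {
    if n_cols == 0 || cols.end > n_cols {
        return Err(format!("column range {:?} does not fit {} columns", cols, n_cols));
    }
    let mut out = Vec::new();
    for (r, row) in data.chunks(n_cols).enumerate() {
        for c in cols.clone() {
            let field = row.get(c).ok_or_else(|| format!("row {} is too short", r))?;
            // `str::parse::<f32>` is equivalent to `<f32 as FromStr>::from_str`.
            let value = field
                .parse::<f32>()
                .map_err(|e| format!("row {}, column {}: {:?} ({})", r, c, field, e))?;
            out.push(value);
        }
    }
    Ok(out)
}

fn main() {
    // One record from the notebook output, split into string fields.
    let data: Vec<String> = "1,5.1,3.5,1.4,0.2,Iris-setosa"
        .split(',')
        .map(String::from)
        .collect();
    match parse_feature_columns(&data, 6, 1..5) {
        Ok(features) => println!("{:?}", features),
        Err(e) => eprintln!("{}", e),
    }
}
```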


Display the data set: length against width, with one scatter trace for the sepals and one for the petals.

In [17]:
let layout = plotly::Layout::new()
    .x_axis(plotly::layout::Axis::new()
            .title(plotly::common::Title::new("Length (cm)")))
    .y_axis(plotly::layout::Axis::new()
            .title(plotly::common::Title::new("Width (cm)")));

let sepal = plotly::Scatter::new(iris_features.column(0).to_vec(), 
                                 iris_features.column(1).to_vec())
    .name("Sepal")
    .mode(plotly::common::Mode::Markers);
let petal = plotly::Scatter::new(iris_features.column(2).to_vec(), 
                                 iris_features.column(3).to_vec())
    .name("Petal")
    .mode(plotly::common::Mode::Markers);

let mut plot = plotly::Plot::new();

plot.set_layout(layout);
plot.add_trace(sepal);
plot.add_trace(petal);

plot.notebook_display();
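Each trace takes one `Vec` of x values and one of y values, so the cell copies two columns per trace out of the feature matrix with `column(i).to_vec()`. In a row-major matrix a column is a strided view: column `c` consists of the elements at `c`, `c + n_cols`, `c + 2 * n_cols`, and so on. A standard-library-only sketch of that copy; the helper `column` is hypothetical and only mirrors what the `ndarray` call does.

```rust
/// Hypothetical helper: copies column `c` out of a row-major matrix with
/// `n_cols` columns, analogous to `array.column(c).to_vec()` in ndarray.
fn column(data: &[f32], n_cols: usize, c: usize) -> Vec<f32> {
    assert!(c < n_cols, "column index out of bounds");
    // Column c holds the elements at c, c + n_cols, c + 2 * n_cols, ...
    data.iter().skip(c).step_by(n_cols).copied().collect()
}

fn main() {
    // Two rows of the f32 feature matrix shown above (4 columns, row-major).
    let features = vec![5.1f32, 3.5, 1.4, 0.2, 4.9, 3.0, 1.4, 0.2];
    let (sepal_x, sepal_y) = (column(&features, 4, 0), column(&features, 4, 1));
    let (petal_x, petal_y) = (column(&features, 4, 2), column(&features, 4, 3));
    println!("Sepal: x = {:?}, y = {:?}", sepal_x, sepal_y);
    println!("Petal: x = {:?}, y = {:?}", petal_x, petal_y);
}
```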

For the sake of completeness, let's extract the label of each data sample, i.e. of each row.

In [18]:
let labels_feature: ndarray::Array1<String> = 
                    iris_data.slice(ndarray::s![..,5])
                             .to_owned(); // Return a uniquely owned copy of the array.

assert_eq!(iris_features.len_of(ndarray::Axis(0)),
           labels_feature.len_of(ndarray::Axis(0)))

()
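`labels_feature` holds one species name per row, and the assertion checks that there is exactly one label per feature row. A possible next step is to tally the samples per species. A standard-library-only sketch, assuming the labels are available as a slice of `String`s (in the notebook, for example via `labels_feature.to_vec()`); the helper `count_labels` is hypothetical, and the `BTreeMap` keeps the species in sorted order.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: counts how many samples carry each label.
fn count_labels(labels: &[String]) -> BTreeMap<&str, usize> {
    let mut counts = BTreeMap::new();
    for label in labels {
        *counts.entry(label.as_str()).or_insert(0) += 1;
    }
    counts
}

fn main() {
    // A few labels taken from the rows shown above.
    let labels: Vec<String> = ["Iris-setosa", "Iris-setosa", "Iris-virginica"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Same kind of check as the cell: one label per feature row (3 rows assumed here).
    let n_feature_rows = 3;
    assert_eq!(n_feature_rows, labels.len());
    for (label, n) in count_labels(&labels) {
        println!("{}: {}", label, n);
    }
}
```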